selfclean_audio.datasets.gtzan
Members
GTZAN dataset wrapper with built-in access to known data quality issues.
- class selfclean_audio.datasets.gtzan.GTZANKnownIssuesDataset(root: str | Path, issue_type: str = 'duplicates', gt_duplicates_file: str | Path | None = None, gt_prep_file: str | Path | None = None, convert_mono: bool = True, sample_rate: int = 16000, target_duration_sec: float | None = 30.0, extensions: tuple[str, ...] = ('.wav', '.mp3', '.flac'))
GTZAN dataset wrapper with built-in access to known data quality issues.
- Exposes audio samples from a local GTZAN folder (genres/<class>/*.wav)
- Parses ground truth CSVs with known issues from external_code
- Provides get_errors() compatible with SelfClean evaluation:
  - For ISSUE_TYPE “duplicates”: returns (set of (idx_i, idx_j) pairs, [..labels..])
  - For ISSUE_TYPE “label_errors”: returns a list[int] of 0/1 per sample
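The two ground-truth shapes described for get_errors() could be consumed roughly as follows. This is a minimal, self-contained sketch that does not import selfclean_audio: the sample values, the helper names (normalize_pairs, pair_precision_recall, flagged_indices), and the assumption that a pair's index order is irrelevant are all illustrative and not part of the library. The second element of the "duplicates" return value is omitted because its contents are not described here.

```python
# Hypothetical ground truth, shaped like the two documented return formats.
# These values are made up for illustration; they are not real GTZAN annotations.
gt_duplicate_pairs = {(0, 3), (2, 5)}  # "duplicates": set of (idx_i, idx_j) pairs
gt_label_errors = [0, 1, 0, 0, 1, 0]   # "label_errors": one 0/1 flag per sample


def normalize_pairs(pairs):
    """Order each pair as (low, high) so (3, 0) and (0, 3) compare equal.

    Assumption: index order within a pair carries no meaning.
    """
    return {(min(i, j), max(i, j)) for i, j in pairs}


def pair_precision_recall(predicted, ground_truth):
    """Precision and recall of predicted duplicate pairs against ground truth."""
    predicted = normalize_pairs(predicted)
    ground_truth = normalize_pairs(ground_truth)
    hits = len(predicted & ground_truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(ground_truth) if ground_truth else 0.0
    return precision, recall


def flagged_indices(flags):
    """Indices of samples whose 0/1 flag marks a known label error."""
    return [idx for idx, flag in enumerate(flags) if flag == 1]


# Hypothetical detector output: one correct pair (reversed order), one wrong pair.
predicted_pairs = [(3, 0), (1, 4)]
precision, recall = pair_precision_recall(predicted_pairs, gt_duplicate_pairs)
label_error_indices = flagged_indices(gt_label_errors)
```

Normalizing pairs before comparison avoids counting (3, 0) and (0, 3) as different duplicates; if the dataset already guarantees an ordering, that step is redundant.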
- Provides