SelfClean for Audio

SelfClean-Audio is a Python library for automatically detecting common issues in audio datasets, such as near-duplicates, off-topic samples, and label errors. It provides strong baselines for each issue type, an end-to-end runner for experiments, and automatic aggregation of evaluation metrics.

This project aims to help researchers and practitioners improve the quality of their audio datasets by providing tools to identify and address potential problems.
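To make "near-duplicate" concrete, here is a minimal sketch. It assumes each audio clip has already been mapped to a fixed-size embedding vector, and this page does not say how SelfClean-Audio computes embeddings or scores. The helper `rank_near_duplicates` is hypothetical and not part of the library's API. It ranks every unordered pair of clips by cosine similarity so that the most similar pairs can be reviewed first.

```python
import numpy as np


def rank_near_duplicates(embeddings: np.ndarray, top_k: int = 5):
    """Return the top_k most similar sample pairs as (i, j, cosine_similarity).

    Hypothetical helper for illustration only; not SelfClean-Audio's API.
    `embeddings` has shape (n_samples, dim), e.g. one vector per audio clip.
    """
    # L2-normalise each row so that dot products equal cosine similarities.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Consider each unordered pair once: upper triangle, excluding the diagonal.
    i_idx, j_idx = np.triu_indices(len(embeddings), k=1)
    scores = sim[i_idx, j_idx]
    # Sort pairs from most to least similar and keep the top_k.
    order = np.argsort(-scores)[:top_k]
    return [(int(i_idx[o]), int(j_idx[o]), float(scores[o])) for o in order]


# Toy embeddings (made up for illustration): clips 0 and 2 are almost identical.
emb = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.99, 0.01, 0.0],
    [0.0, 0.0, 1.0],
])
print(rank_near_duplicates(emb, top_k=1))
```

A ranking rather than a hard threshold leaves the cut-off decision to a human reviewer. With these toy vectors, the top-ranked pair is clips 0 and 2.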


  • Source: https://github.com/hslu-aai/selfclean-audio

  • Docs: https://hslu-aai.github.io/selfclean-audio

  • License: Apache 2.0

How the documentation is structured

The documentation is split into four categories, each of which is also accessible from the links in the top bar.