# Decision Records

This section documents the key research and development decisions made during the project.

We track decisions directly on this page for now. When formal decision records
are added, they will live under `docs/explanations/decisions/`.

## Audio Issue Categorization

We categorize audio quality issues at two levels:

- File Level: issues affecting the entire audio file.
  - Out-of-distribution audio
  - Duplicate files
  - Silent audio

- Segment (Audio) Level: issues affecting parts of the audio.
  - Looping artifacts
  - Gaussian noise
  - White noise
  - Background noise from other sources

To simulate these issues, we mix noise into the original audio using a
Signal-to-Noise Ratio (SNR) mixer adapted from our internal experiments.
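The mixing step can be sketched as follows. This is a minimal illustration of SNR-based mixing in general, not the project's internal mixer: the function name `mix_at_snr`, its parameters, and the choice to tile short noise clips are all assumptions made for the example.

```python
import numpy as np


def mix_at_snr(signal: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Add `noise` to `signal` so that the mixture has the requested SNR in dB.

    Hypothetical helper for illustration; both inputs are 1-D float arrays.
    """
    # Tile or trim the noise so it matches the signal length (an assumption).
    if len(noise) < len(signal):
        repeats = int(np.ceil(len(signal) / len(noise)))
        noise = np.tile(noise, repeats)
    noise = noise[: len(signal)]

    signal_power = np.mean(signal**2)
    noise_power = np.mean(noise**2)
    if noise_power == 0:
        raise ValueError("noise must not be silent")

    # SNR_dB = 10 * log10(P_signal / P_noise); solve for the required noise gain.
    target_noise_power = signal_power / (10 ** (snr_db / 10))
    gain = np.sqrt(target_noise_power / noise_power)
    return signal + gain * noise


rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
clean = np.sin(2 * np.pi * 440 * t)  # 1 s tone at 440 Hz, sampled at 16 kHz
white = rng.standard_normal(8000)  # deliberately shorter than the signal
noisy = mix_at_snr(clean, white, snr_db=10.0)
```

Lower `snr_db` values give a louder disturbance relative to the original audio, so sweeping this parameter is one way to control how severe a simulated issue is.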

## Embedding Pooling Strategy

SelfClean processes each sample independently and does not account for
temporal structure, so our current approach treats any detected issue (e.g.,
white noise) as a file-level problem. Future work could explore
finer-grained localization of issues within segments.

To aggregate information across the token (time) axis, we implemented three
pooling strategies for embedding generation:

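```python
# emb has shape (batch, tokens, dim)
if pool == "CLS":  # Use the class token (first token)
    emb = emb[:, 0, :]
elif pool == "Mean":  # Average over the token axis
    emb = emb.mean(dim=1)
elif pool == "Reshape":  # Flatten all tokens, keeping the batch axis
    emb = emb.reshape(emb.shape[0], -1)
else:
    raise ValueError(f"Unknown pooling strategy: {pool!r}")
```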
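To make the trade-off between the strategies concrete, the sketch below compares their output shapes. It uses NumPy as a stand-in for the tensor library and a made-up array in place of a real encoder output; the sizes (2 clips, 5 tokens, 4 dimensions) are assumptions, and the "Reshape" branch is assumed to flatten per clip so the batch axis is kept.

```python
import numpy as np

# Hypothetical encoder output: 2 clips, 5 tokens each (class token first), 4 dims.
emb = np.arange(2 * 5 * 4, dtype=np.float32).reshape(2, 5, 4)

pooled = {
    "CLS": emb[:, 0, :],  # first token only
    "Mean": emb.mean(axis=1),  # average over the token axis
    "Reshape": emb.reshape(emb.shape[0], -1),  # concatenate all tokens per clip
}
shapes = {name: out.shape for name, out in pooled.items()}
```

"CLS" and "Mean" yield one vector of size `dim` per clip regardless of how many tokens the clip produced, whereas "Reshape" yields `tokens * dim` values, so its embedding size grows with clip length and is only comparable across clips with the same number of tokens.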

## Datasets Used

| Dataset     | Classes | Samples | Duration (s) | Sampling Rate (Hz) |
| ----------- | ------- | ------- | ------------ | ------------------ |
| ARCA23K     | --      | 17,979  | 7.92         | 44100              |
| AudioSet20K | 527     | 39,436  | 9.89         | 32000              |
| Pianos      | 8       | 668     | 4.86         | 16000              |
| WMMS        | 31      | 1,695   | 10.42        | 16000              |
| GTZAN       | 10      | 930     | 30.02        | 22050              |