Run LoRA Adaptation Grid

This guide runs a LoRA fine-tuning sweep (unsupervised SimCLR/VICReg) to adapt the backbone’s representations to the target dataset, then evaluates SelfClean metrics.

Prerequisites

Datasets configured in config/templates/file_template.py (ESC50_ROOT, ESC50_META, NOISE_ROOT).
Backbone weights available at paths in MODEL_PATHS.

Quick Start (Stage A)

Run the baselines and a coarse LoRA grid over BEATs and EAT, covering duplicates, off-topic samples, and label errors at 10% corruption:

scripts/finetuning/run_lora_grid.sh

What It Does

Writes runs to outputs/lora_grid/<model>_<issue>_frac<...>__....
Saves config.yaml and per-issue Score-*.csv compatible with scripts/collect_results.py.
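To confirm that a sweep produced the files listed above, you can scan the run directories. The following is a minimal sketch, assuming each run is an immediate subdirectory of outputs/lora_grid; the helper name list_incomplete_runs is hypothetical and not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository): print run directories
# that lack config.yaml or have no Score-*.csv file.
# Assumption: every run is an immediate subdirectory of the base directory.
list_incomplete_runs() {
  local base="${1:-outputs/lora_grid}" run
  for run in "$base"/*/; do
    [ -d "$run" ] || continue   # the glob matched nothing
    run="${run%/}"              # drop the trailing slash
    if [ ! -f "$run/config.yaml" ] || ! compgen -G "$run/Score-*.csv" > /dev/null; then
      echo "$run"
    fi
  done
}

list_incomplete_runs outputs/lora_grid
```

An empty result means every run directory contains both a config.yaml and at least one Score-*.csv file.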

Customize the Sweep (Environment Variables)

MODELS: space-separated backbones to try (default: "beats eat").
ISSUES: detection tasks (default: "duplicates off_topic_noise label_errors").
FRACS: corruption fractions (default: "0.1").
OBJECTIVES: self-supervised objectives to sweep (values: infonce, vicreg).
RS: LoRA ranks (default: "8 16").
ALPHAS: LoRA alpha values; if empty, each rank r is tried with alpha in {r, 2r}.
LRS: learning rates (Stage A default: "5e-5 1e-4 3e-4").
EPOCHS: adaptation epochs (Stage A: "1 3").
TEMPS: InfoNCE temperatures (Stage A: "0.07 0.2 0.5").
MAX_STEPS: cap on optimizer steps per run (default: 200).
EXTRA_OVERRIDES: additional dotlist overrides (the default enables strong augmentations).
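The ALPHAS fallback can be read as: for each rank r, try alpha = r and alpha = 2r. The sketch below illustrates that expansion. How run_lora_grid.sh actually implements it is an assumption here, and the function name alphas_for_rank is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper illustrating the documented fallback: when ALPHAS is
# empty, each rank r is paired with alpha in {r, 2r}; otherwise the
# user-supplied list is used unchanged.
alphas_for_rank() {
  local r="$1" alphas="${2:-}"
  if [ -z "$alphas" ]; then
    echo "$r $(( 2 * r ))"
  else
    echo "$alphas"
  fi
}

RS="${RS:-8 16}"        # documented default ranks
ALPHAS="${ALPHAS:-}"    # empty means "derive from the rank"
for r in $RS; do
  for a in $(alphas_for_rank "$r" "$ALPHAS"); do
    echo "r=$r alpha=$a"
  done
done
```

With the default ranks and an empty ALPHAS, this enumerates four (rank, alpha) pairs: (8, 8), (8, 16), (16, 16), and (16, 32).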

Examples

# Smaller dev sweep
MODELS="beats" ISSUES="duplicates off_topic_noise" EPOCHS="1" MAX_STEPS=100 \
  scripts/finetuning/run_lora_grid.sh

# VICReg only, higher rank
OBJECTIVES="vicreg" RS="16" LRS="3e-4" EPOCHS="3" \
  scripts/finetuning/run_lora_grid.sh

# Add/modify augmentations
EXTRA_OVERRIDES="selfclean_audio.adapt_strong_aug=true selfclean_audio.adapt_eq_prob=0.5" \
  scripts/finetuning/run_lora_grid.sh

Aggregate Results

python scripts/collect_results.py --base-dir outputs

Notes

The saved config.yaml records both the requested overrides and the initialization details actually used.
If PEFT is unavailable, the run falls back to a frozen backbone with only the projection head.
Use MAX_STEPS to keep wall-clock time manageable when comparing many configurations.
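Before launching a sweep, it can help to estimate how many runs it will produce. The following is a rough sketch that assumes the grid is a full Cartesian product of the lists you pass in. The real script may prune combinations (for example, temperatures may apply only to InfoNCE), so treat the result as an upper bound. The function name grid_size is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: multiply the number of entries in each
# space-separated list to estimate how many runs a sweep would launch.
# Assumption: the grid is a full Cartesian product of the given lists.
grid_size() {
  local total=1 list items
  for list in "$@"; do
    read -r -a items <<< "$list"          # split the list on whitespace
    total=$(( total * ${#items[@]} ))
  done
  echo "$total"
}

# Example: 2 models x 3 issues x 2 ranks x 3 learning rates
grid_size "beats eat" "duplicates off_topic_noise label_errors" "8 16" "5e-5 1e-4 3e-4"
```

Multiplying the estimate by MAX_STEPS gives a rough total of optimizer steps for the sweep.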