# Visualization: ROC/PR/PRG + Annotation Effort Saved

## Inputs
- Script: `scripts/visualize_full_curves.py`
- Input: run folders under `outputs/` that contain `config.yaml`, `Score-*.csv`, and `Ranking-*.csv` files.

## Generating Rankings
- Rankings are saved automatically when running the CLI after 2025-09-14.
- Each run writes `Ranking-near_duplicates.csv`, `Ranking-off_topic_samples.csv`, and `Ranking-label_errors.csv`, each containing a `target` column of 0/1 values ordered by the method's ranking.

## Create Figures
- Example (BEATS at alpha 0.05):
  `python scripts/visualize_full_curves.py --base-dir outputs --alpha 0.05 --model BEATS`

## Batch for All Models and Alphas
- Generate every figure the repo has results for:
  `python scripts/visualize_all_curves.py --base-dir outputs`
- Optional: `--format pdf` to save only PDFs (PNGs are always created by the per-combo script alongside the PDF of the effort plot).

## What It Produces
For every combination that has results for all three issue types, it saves:
- `curves__alpha_ND-<...>_OT-<...>_LE-<...>.png/.pdf` (ROC, PR, PRG, Effort Saved)
- `Annotation_Effort_Saving__alpha_... .pdf/.png` (single panel)

## Notes
- Only groups with results for all three issue types are visualized.
- If older runs are missing ranking CSVs, re-run the experiments to emit them.
- If multiple variants exist per issue, the script prefers combined variants when present (e.g., `off_topic_combined` over `off_topic_noise/external/corrupted`, `combined_duplicates` over other duplicate corruptions).
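To illustrate how a curve like "Annotation Effort Saved" can be derived from a ranking CSV's `target` column (0/1 values ordered by the method's ranking), here is a minimal sketch. The helper name `effort_curve` and the exact metric definition are illustrative assumptions, not the plotting script's actual implementation.

```python
import numpy as np

def effort_curve(targets):
    """Fraction of true issues found vs. fraction of samples annotated.

    `targets` is the 0/1 column from a Ranking-*.csv, already ordered
    by the method's ranking (best-ranked sample first).
    """
    targets = np.asarray(targets, dtype=float)
    # Recall after annotating the top-k ranked samples.
    found = np.cumsum(targets) / targets.sum()
    # Fraction of the dataset annotated so far.
    effort = np.arange(1, len(targets) + 1) / len(targets)
    return effort, found

# Example: a ranking that surfaces most issues early.
effort, found = effort_curve([1, 1, 0, 1, 0, 0, 0, 0])
print(effort[1], found[1])  # after annotating 25% of samples, 2/3 of issues found
```

A perfect ranking reaches `found == 1.0` after annotating only the fraction of samples that actually have issues; a random ranking follows the diagonal.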