What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction
Smirnova, Kopan, Makeev, Chernishev · ACL BEA 2026
A 98-category, Rozental-grounded diagnostic surfaces what aggregate F0.5
hides: synthetic-data fine-tuning drives subordinate-clause comma accuracy down
from 14% to 1% even as overall scores rise.
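
To make the masking effect concrete, here is a minimal sketch of how a pooled F0.5 can rise while a single rule category collapses. Everything in it is an assumption for illustration: the rule names, the (TP, FP, FN) counts, and the choice of F0.5 as the per-rule metric are hypothetical and are not taken from the paper, whose per-rule measure may be defined differently.

```python
def f_beta(tp: int, fp: int, fn: int, beta: float = 0.5) -> float:
    """F-beta from edit counts; beta=0.5 weights precision over recall."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def per_rule_report(counts: dict[str, tuple[int, int, int]]):
    """Return (per-rule F0.5 dict, micro-averaged F0.5 over all rules).

    counts maps a rule name to its (TP, FP, FN) edit counts.
    """
    per_rule = {rule: f_beta(*c) for rule, c in counts.items()}
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return per_rule, f_beta(tp, fp, fn)


# Hypothetical counts, invented for this sketch (NOT the paper's data).
before = {
    "subordinate_clause_comma": (14, 20, 86),
    "spelling": (300, 200, 300),
}
after = {
    "subordinate_clause_comma": (1, 5, 99),
    "spelling": (450, 100, 150),
}

before_rules, before_agg = per_rule_report(before)
after_rules, after_agg = per_rule_report(after)

for rule in before:
    print(f"{rule}: {before_rules[rule]:.3f} -> {after_rules[rule]:.3f}")
print(f"aggregate: {before_agg:.3f} -> {after_agg:.3f}")
```

Because the pooled score is dominated by the high-volume category, the drop in the rare category barely registers in it, which is why a per-rule breakdown is needed to see the regression.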