Papers

What Aggregate Scores Hide: Per-Rule Evaluation of Russian Grammatical Error Correction

Smirnova, Kopan, Makeev, Chernishev · ACL BEA 2026

A 98-category, Rozental-grounded diagnostic surfaces what aggregate F0.5 hides: synthetic-data fine-tuning drives subordinate-clause comma accuracy down from 14% to 1% even as overall scores rise.