Output formats¶

Synterr emits corrupted text in several formats. CLI selection is via the -f/--output-format flag on synterr generate.

CLI formats¶

Flag	When to use
`gector`	GECToR token-level edit tags (default). Compatible with the GECToR training format.
`tsv`	Parallel `src\ttgt` lines. Compatible with most seq2seq pipelines.
`jsonl`	Rich JSON per line: `src`, `tgt`, `errors[]` with type/category/schema_tag/schema_l2_tag.
`chat`	Instruction-tuning chat format (`messages: [...]`). For QLoRA / SFT fine-tuning of chat LLMs.
`sft`	Minimal `{src, tgt}` JSONL. Compatible with standard SFT trainers.

Python API¶

Beyond the CLI, the GeneratedResult object on pipeline.generate() exposes:

result.formatted — GECToR token-level tags (string).
result.to_tsv() — parallel src/tgt.
result.to_jsonl() — rich per-record JSON.
result.to_chat() — instruction-tuning chat format.
result.to_diff() — human-readable inline diff (CLI-unexposed).

Rule-targeted SFT¶

The separate synterr generate-targeted command writes {src, tgt, rule} JSONL with one line per LoRuGEC rule force-applied. A .dist.json sidecar records the per-rule generation count. See synterr.sft.generate_targeted for the Python API.