Reasoning chains: Qwen3.5-35B (think mode)
Reasoning chains from Qwen3.5-35B in think mode, evaluated on all 612 LoRuGEC test items. This is the larger sibling of the 9B view and is useful for direct same-prompt comparison.