EVO-5 · EVO · P2/M · Spec-level · PROPOSED

Proposer / verifier asymmetric co-evolution (PAIRED-style regret-guided scenario archive; evaluator is read-only)


Evaluation modality

Spec-level

A spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.

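A static benchmark cannot adapt its pressure to the **current system's weaknesses**. An evolvable source of **curriculum and environment-side pressure** is needed, but the evaluator must not be turned into an **online reward** (R-PE / R12).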

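1. The **scenario proposer** (which generates session challenge configurations) and the **read-only evaluator family** are **forbidden** from fine-tuning each other's base models. PROPOSED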
Not a runnable A/B candidate — evaluated by the path above, not ablation.
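2. Use **PAIRED**-style **regret** (or a simplified proxy: the distribution of failure modes on held-out data) to update the **sampling preference of the scenario archive**, steering it toward **high-regret** regions. PROPOSED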
3. Conceptually align with **AlphaGeometry 2**: **the verifier is hard, the proposer explores**. In implementation, keep **evaluator = readout**. PROPOSED
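
The regret-guided archive update in item 2 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposal's implementation: `ScenarioArchive`, `paired_regret`, the scenario names, the exponential moving average, and the softmax temperature are all hypothetical choices. PAIRED itself trains an adversary policy on regret, whereas this sketch only reweights sampling over a fixed archive. Evaluator scores enter as plain read-only numbers, and nothing here feeds them back to the evaluated system as a reward.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Scenario:
    name: str
    regret: float = 0.0  # running regret estimate (hypothetical field)
    visits: int = 0


def paired_regret(antagonist_score: float, protagonist_score: float) -> float:
    """PAIRED-style regret: antagonist score minus protagonist score
    on the same scenario, both taken from the read-only evaluator."""
    return antagonist_score - protagonist_score


class ScenarioArchive:
    """Archive whose sampling preference leans toward high-regret scenarios."""

    def __init__(self, names, temperature=0.5, ema=0.3, seed=0):
        self._items = {n: Scenario(n) for n in names}
        self._temperature = temperature  # assumption: softmax sharpness
        self._ema = ema                  # assumption: smoothing factor
        self._rng = random.Random(seed)

    def update(self, name: str, regret: float) -> None:
        # Exponential moving average of the observed regret.
        s = self._items[name]
        s.regret = (1 - self._ema) * s.regret + self._ema * regret
        s.visits += 1

    def sampling_probs(self) -> dict:
        # Numerically stable softmax over regret / temperature.
        names = list(self._items)
        logits = [self._items[n].regret / self._temperature for n in names]
        peak = max(logits)
        weights = [math.exp(x - peak) for x in logits]
        total = sum(weights)
        return {n: w / total for n, w in zip(names, weights)}

    def sample(self) -> str:
        probs = self.sampling_probs()
        names = list(probs)
        return self._rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


# Usage with made-up scores; only the archive's sampling weights change.
archive = ScenarioArchive(["long_context", "tool_misuse", "ambiguous_goal"])
archive.update("tool_misuse", paired_regret(0.9, 0.2))
archive.update("long_context", paired_regret(0.6, 0.5))
probs = archive.sampling_probs()
next_scenario = archive.sample()
```

In this sketch the scenario with the largest protagonist/antagonist gap gets the highest sampling probability. The held-out failure-mode proxy mentioned in item 2 could be substituted by passing a per-scenario failure rate to `update` in place of the regret value.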


- Produces **adaptive stress testing** on open-ended tasks without introducing a SIMA2-style **LLM-as-reward-generator** (which would conflict with R-PE).

**PAIRED**: Dennis et al., arXiv:2012.02096. **AlphaGeometry 2**: Chervonyi, Trinh et al., arXiv:2502.03544 (asymmetric proposer/verifier idea).