EVO-5 · EVO · P2/M · Spec-level · PROPOSED

Proposer / verifier asymmetric co-evolution (PAIRED-style regret-guided scenario archive; evaluator is read-only)

Evaluation modality

Spec-level

A spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.

Primary owner
Phase-A verdict
Shadow profile
Source papers
AlphaGeometry 2 + PAIRED UED
Specs
docs/specs/multi-timescale-learning.md
docs/specs/evaluation.md

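Blind spot (current state)

A static benchmark cannot adapt its pressure to **the system's current weaknesses**. The system needs an evolvable source of **curriculum- and environment-side pressure**, but the evaluator must not be turned into an **online reward** (R-PE / R12).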
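Since this entry is evaluated by spec review plus contract tests, the "evaluator must not become an online reward" constraint could be pinned down as a contract test. The following is a minimal sketch under stated assumptions, not the project's actual interface: `Readout`, `ReadOnlyEvaluator`, the threshold-based scoring, and the list of forbidden method names are all hypothetical.

```python
from dataclasses import dataclass, FrozenInstanceError
from types import MappingProxyType


@dataclass(frozen=True)
class Readout:
    """Hypothetical immutable evaluator output (illustrative name)."""
    scenario_id: str
    score: float


class ReadOnlyEvaluator:
    """Hypothetical evaluator: scores outcomes, exposes no update path."""

    def __init__(self, thresholds: dict[str, float]):
        # MappingProxyType is a read-only view over a private copy of the config.
        self._thresholds = MappingProxyType(dict(thresholds))

    def evaluate(self, scenario_id: str, outcome: float) -> Readout:
        # Assumed scoring rule for illustration: pass/fail against a threshold.
        passed = outcome >= self._thresholds[scenario_id]
        return Readout(scenario_id, 1.0 if passed else 0.0)


def test_evaluator_is_read_only() -> None:
    evaluator = ReadOnlyEvaluator({"s1": 0.5})
    readout = evaluator.evaluate("s1", 0.7)

    # Contract 1: a readout cannot be mutated after it is produced.
    try:
        readout.score = 0.0
        raise AssertionError("readout should be immutable")
    except FrozenInstanceError:
        pass

    # Contract 2: the evaluator exposes no training or update hook
    # (the method names checked here are an assumed convention).
    forbidden = {"fit", "train", "update", "fine_tune"}
    assert forbidden.isdisjoint(dir(evaluator))
```

A test like this only checks the interface shape. It does not prove that evaluator scores are never used as a reward elsewhere in the pipeline, which would still need spec review.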

Adoptable suggestions (actionable steps)

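  1. The **scenario proposer** (which generates session challenge configurations) and the **read-only evaluator family** are **forbidden** from fine-tuning each other's base models. (PROPOSED)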

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

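  2. Use **PAIRED**-style **regret** (or a simplified proxy: the distribution of failure modes on held-out data) to update the **sampling preference of the scenario archive**, steering it toward **high-regret** regions. (PROPOSED)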

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

  3. Align in spirit with **AlphaGeometry 2**: **the verifier is hard, the proposer explores**. In implementation, keep **evaluator = readout**. (PROPOSED)

    Not a runnable A/B candidate — evaluated by the path above, not ablation.
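Suggestion 2 can be illustrated with a small sketch of regret-weighted sampling over a scenario archive. This is one possible reading, not a specified design: `ScenarioArchive`, the softmax weighting, the `temperature` and `ema` parameters, and the clamping of regret at zero are all assumptions introduced here. Regret is estimated PAIRED-style as antagonist return minus protagonist return; a held-out failure rate could be passed in place of that difference as the simplified proxy.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class ScenarioArchive:
    """Hypothetical scenario archive keyed by scenario id (illustrative)."""
    temperature: float = 1.0
    regret: dict[str, float] = field(default_factory=dict)

    def update(self, scenario_id: str, protagonist: float, antagonist: float,
               ema: float = 0.5) -> None:
        # PAIRED-style estimate: antagonist return minus protagonist return.
        # Clamping at zero and EMA smoothing are assumptions of this sketch.
        observed = max(0.0, antagonist - protagonist)
        previous = self.regret.get(scenario_id, observed)
        self.regret[scenario_id] = (1 - ema) * previous + ema * observed

    def sampling_weights(self) -> dict[str, float]:
        # Softmax over regret: higher regret gives higher sampling probability.
        if not self.regret:
            return {}
        peak = max(self.regret.values())  # subtract the max for numerical stability
        exp = {k: math.exp((v - peak) / self.temperature)
               for k, v in self.regret.items()}
        total = sum(exp.values())
        return {k: v / total for k, v in exp.items()}

    def sample(self, rng: random.Random) -> str:
        weights = self.sampling_weights()
        if not weights:
            raise ValueError("archive is empty")
        ids = list(weights)
        return rng.choices(ids, weights=[weights[i] for i in ids], k=1)[0]


archive = ScenarioArchive(temperature=0.5)
archive.update("easy", protagonist=0.9, antagonist=1.0)
archive.update("hard", protagonist=0.2, antagonist=1.0)
next_scenario = archive.sample(random.Random(0))
```

In this sketch the regret signal only changes which scenarios are sampled next. It is never fed to the system under test as a reward, which is the sense in which the evaluator stays a readout.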

Traceability

No plugins / runs linked yet.

Expected benefit

- Builds **adaptive stress testing** on open-ended tasks without introducing a SIMA2-style **LLM-as-reward-generator** (which would conflict with R-PE).

Cited papers

**PAIRED**: Dennis et al., arXiv:2012.02096. **AlphaGeometry 2**: Trinh et al., arXiv:2502.03544 (source of the asymmetric proposer/verifier idea).