R13 Compression-reinforcement alternation as a first-class engineering program, not an implicit principle
Evaluation modality
Spec-level: a spec-motivation / governance borrow, evaluated by spec review + contract tests rather than A/B testing or ablation.
- Primary owner: (none)
- Phase-A verdict: (none)
- Shadow profile: (none)
- Source papers: NL 2025 + Algorithm Distillation 2022 + MesaNet 2025
- Specs: docs/specs/multi-timescale-learning.md, docs/specs/thinking-loop.md, docs/specs/evidence_program.md
Blind spot
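R13 is central to the design principles: alternating SSL compression with RL reinforcement. Yet most of the current 26 entries are local components, such as DM-5 imagination, DM-6 data value, and EVO-5 proposer/verifier. No direction yet turns the open questions into an engineering program: how to measure compression quality itself, when to reinforce, and whether reinforcement acts only on the compressed structure.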
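As a minimal sketch of what "alternation as an engineering program" could look like, the loop below alternates a compression phase with a reinforcement phase and gates reinforcement on a compression-quality readout. The gating rule (reinforce only when the readout beats the previous cycle by more than `min_gain`), `run_alternation`, and `CycleRecord` are hypothetical illustrations, not part of any spec; `compress` and `reinforce` are placeholders for the real SSL and RL phases.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CycleRecord:
    """Evidence for one compression -> reinforcement cycle (hypothetical schema)."""
    cycle: int
    readout_before: float  # readout from the previous cycle (-inf on the first cycle)
    readout_after: float   # readout produced by this cycle's compression phase
    reinforced: bool       # whether the gated reinforcement phase ran


def run_alternation(
    compress: Callable[[], float],
    reinforce: Callable[[], None],
    cycles: int,
    min_gain: float = 0.0,
) -> List[CycleRecord]:
    """Alternate a compression phase with a gated reinforcement phase.

    `compress` runs one SSL compression phase and returns a compression-quality
    readout (higher is better). `reinforce` runs one RL phase, and only when the
    readout beat the previous cycle's readout by more than `min_gain`.
    """
    records: List[CycleRecord] = []
    previous = float("-inf")
    for i in range(cycles):
        readout = compress()
        should_reinforce = readout - previous > min_gain
        if should_reinforce:
            reinforce()
        records.append(CycleRecord(i, previous, readout, should_reinforce))
        previous = readout
    return records


# Toy usage: fixed readouts stand in for real compression-quality measurements.
readouts = iter([0.50, 0.55, 0.54, 0.60])
rl_calls: List[str] = []
log = run_alternation(lambda: next(readouts), lambda: rl_calls.append("rl"), cycles=4, min_gain=0.01)
print([r.reinforced for r in log])  # [True, True, False, True]
```

Comparing against the previous cycle rather than the best-so-far readout is one arbitrary choice among several. The per-cycle `CycleRecord` list is the kind of trace that alternation evidence might be built from.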
Adoptable suggestions
- 1. Add a "compression-reinforcement alternation evidence" subsection to [`docs/specs/multi-timescale-learning.md`](../specs/multi-timescale-learning.md). PROPOSED
Not a runnable A/B candidate — evaluated by the path above, not ablation.
- 2. Treat the thinking-loop / snapshot replay export as an SSL compression artifact, and define compression-quality readouts: prediction improvement, owner snapshot stability, and held-out reconstruction of semantic state. PROPOSED
Not a runnable A/B candidate — evaluated by the path above, not ablation.
- 3. The reinforcement phase may act only on the controller, retention head, or owner-internal policy; it must not bypass them to reach Face tokens or substrate base weights. PROPOSED
Not a runnable A/B candidate — evaluated by the path above, not ablation.
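Suggestions 2 and 3 lend themselves to contract tests, the evaluation path this card names. A minimal sketch follows, assuming owner snapshots and semantic states can be flattened to float vectors. The metric definitions (loss difference, cosine similarity, mean squared error), the function names, and the parameter prefixes `controller.`, `retention_head.`, and `owner_policy.` are all hypothetical stand-ins. The spec does not define them.

```python
import math
from typing import Iterable, List, Sequence

# Hypothetical parameter-name prefixes for the groups suggestion 3 allows RL to update.
# Anything else (e.g. "face.*" token parameters or "substrate.*" base weights) is rejected.
ALLOWED_RL_PREFIXES = ("controller.", "retention_head.", "owner_policy.")


def prediction_improvement(loss_without: float, loss_with: float) -> float:
    """Positive when conditioning on the compressed export lowers prediction loss."""
    return loss_without - loss_with


def snapshot_stability(prev: Sequence[float], curr: Sequence[float]) -> float:
    """Cosine similarity between consecutive owner snapshots (1.0 = same direction)."""
    if len(prev) != len(curr):
        raise ValueError("snapshots must have the same dimensionality")
    dot = sum(a * b for a, b in zip(prev, curr))
    norm = math.sqrt(sum(a * a for a in prev)) * math.sqrt(sum(b * b for b in curr))
    return dot / norm if norm > 0 else 0.0


def heldout_reconstruction_error(
    targets: Sequence[Sequence[float]], recon: Sequence[Sequence[float]]
) -> float:
    """Mean squared error between held-out semantic states and their reconstructions."""
    if len(targets) != len(recon) or any(len(t) != len(r) for t, r in zip(targets, recon)):
        raise ValueError("targets and reconstructions must have matching shapes")
    errors = [(t - r) ** 2 for ts, rs in zip(targets, recon) for t, r in zip(ts, rs)]
    if not errors:
        raise ValueError("no held-out states to score")
    return sum(errors) / len(errors)


def check_reinforcement_scope(updated_params: Iterable[str]) -> List[str]:
    """Contract: an RL phase may update only allow-listed parameter groups."""
    names = list(updated_params)
    violations = [n for n in names if not n.startswith(ALLOWED_RL_PREFIXES)]
    if violations:
        raise ValueError(f"reinforcement touched out-of-scope parameters: {violations}")
    return names


print(prediction_improvement(2.0, 1.5))                          # 0.5
print(snapshot_stability([1.0, 0.0], [1.0, 0.0]))                # 1.0
print(heldout_reconstruction_error([[1.0, 2.0]], [[1.0, 3.0]]))  # 0.5
check_reinforcement_scope(["controller.gate", "retention_head.bias"])  # no exception
```

The scope check uses an allowlist rather than a denylist. As a result, Face tokens and substrate base weights fail without being enumerated, and any newly added parameter group stays out of scope until it is explicitly admitted.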
Traceability
No plugins / runs linked yet.
Expected benefits
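- Turns R13 from a principle into a pipeline with acceptance criteria.
- Makes background-slow reflection produce a measurable compressed state, not just a text summary.
- Prevents future learning work from degenerating into "train wherever there is a reward".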
Cited papers
**Nested Learning** (2025), **Algorithm Distillation** (2022), **MesaNet** (2025), **Mesa-Optimization in Transformers** (2023). For details, see [`research/core-author-paper-assessment-2026-05.md`](../../research/core-author-paper-assessment-2026-05.md) and T4 in [`research/probe/10_deep_synthesis_2026.md`](../../research/probe/10_deep_synthesis_2026.md).

---