OA-7 · OA · P1/M · Spec-level · PROPOSED

Add a chunky-routing self-check (SURF) to the derived layer

Evaluation modality

Spec-level

A spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.

Primary owner
Phase-A verdict
Shadow profile
Source papers
N5 Murray/Schulman 2026
Specs
docs/specs/continuum-memory.md

Blind spot (current state)

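Does the derived layer of [`docs/specs/continuum-memory.md`](../specs/continuum-memory.md) (aggregate indexes / knowledge graph / cross-stratum routing rules) currently have any self-check for whether routing is being misled by surface features? N5 shows empirically that incidental correlations carried in post-training data cause frontier LLMs (Claude 4.5 / GPT-5.1 / Grok 4.1 / Gemini 3) to exhibit behavioral failures such as rejecting true facts when they are presented in a particular question format. Our derived layer **has exactly the same failure mode**: if an owner's retrieval index is more likely to fire on specific keywords than on content relevance, misroutes follow, such as an emotionally distressed input being routed to task-completion.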
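The failure mode described above can be illustrated with a toy example. This is a minimal sketch, not the actual derived-layer router: `KEYWORD_RULES` and `route_by_keywords` are hypothetical names, and only the owner labels (`sympathy`, `task_completion`) are taken from this entry.

```python
# Hypothetical toy router; every name here is illustrative, not from the spec.
KEYWORD_RULES = {
    "task_completion": {"deadline", "finish", "report", "todo"},
    "sympathy": {"sad", "overwhelmed", "lonely", "exhausted"},
}


def route_by_keywords(text: str) -> str:
    """Pick the owner whose keyword set overlaps the input the most."""
    cleaned = text.lower().replace(",", " ").replace(".", " ")
    tokens = set(cleaned.split())
    scores = {owner: len(tokens & kws) for owner, kws in KEYWORD_RULES.items()}
    return max(scores, key=scores.get)


# The content is emotional distress, but task-flavored surface words dominate.
distressed = "I am exhausted, I cannot finish the report before the deadline."
print(route_by_keywords(distressed))
```

Here three task keywords outvote one distress keyword, so the input lands on `task_completion` even though a content-based rubric would send it to `sympathy`.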

Adoptable suggestions (actionable steps)

  1. Write a rubric for every routing rule in the VZ derived layer (e.g. "emotionally distressed input should route to sympathy, not task-completion"; "an obvious boundary trigger should route to the boundary_consent owner"). [PROPOSED]

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

  2. Use SURF-style attribute search to generate adversarial prompts automatically and check whether routing is misled by surface features. [PROPOSED]

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

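  3. If a misroute is found, apply the TURF idea to trace it back to the specific chunk in the derived layer (i.e. which index rule or which memory card caused the misroute). [PROPOSED]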

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

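  4. Maintain the resulting list of known routing weaknesses as a known-issues subsection of [`docs/specs/continuum-memory.md`](../specs/continuum-memory.md), and rerun it as a regression check after every change to the derived layer. [PROPOSED]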

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

  5. Tool script [`tools/derived_routing_audit.py`](../../tools/) (new file). [PROPOSED]

    Not a runnable A/B candidate — evaluated by the path above, not ablation.
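Suggestions 1 to 3 can be sketched together in a few lines. This is a hedged illustration of what an audit script might look like, not the design of `tools/derived_routing_audit.py`. It is SURF-style only in the loose sense of enumerating combinations of surface attributes around a fixed core message; the paper's actual pipeline is not reproduced here. The router interface (returning an owner plus the id of the rule that fired), `RubricCase`, `FRAMINGS`, `CASINGS`, and `toy_router` are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Tuple

# Assumed interface: a router returns (owner, rule_id), where rule_id names
# the derived-layer rule or memory card that fired.
Router = Callable[[str], Tuple[str, str]]


@dataclass(frozen=True)
class RubricCase:
    """One rubric entry: a core message and the owner it must reach."""
    core: str
    expected_owner: str


RUBRIC = [
    RubricCase("I feel overwhelmed and alone.", "sympathy"),
    RubricCase("Please stop asking me about this.", "boundary_consent"),
]

# Hypothetical surface attributes: wrappers that leave the core message intact.
FRAMINGS: Dict[str, Callable[[str], str]] = {
    "plain": lambda s: s,
    "todo_list": lambda s: f"TODO: {s}",
    "deadline_tail": lambda s: f"{s} The deadline is tomorrow.",
}
CASINGS: Dict[str, Callable[[str], str]] = {
    "as_is": lambda s: s,
    "upper": str.upper,
}


def audit(router: Router, rubric: List[RubricCase]) -> List[Dict[str, str]]:
    """Try every attribute combination and record rubric violations."""
    findings = []
    combos = product(rubric, FRAMINGS.items(), CASINGS.items())
    for case, (framing, frame), (casing, recase) in combos:
        prompt = recase(frame(case.core))
        owner, rule_id = router(prompt)
        if owner != case.expected_owner:
            findings.append({
                "prompt": prompt,
                "framing": framing,
                "casing": casing,
                "expected": case.expected_owner,
                "actual": owner,
                "rule_id": rule_id,  # trace-back handle for the TURF-style step
            })
    return findings


def toy_router(text: str) -> Tuple[str, str]:
    """Deliberately keyword-driven stand-in for the real derived layer."""
    lowered = text.lower()
    if "todo" in lowered or "deadline" in lowered:
        return "task_completion", "rule:task_keywords"
    if "stop" in lowered:
        return "boundary_consent", "rule:boundary_keywords"
    return "sympathy", "rule:default"


findings = audit(toy_router, RUBRIC)
print(Counter(f["framing"] for f in findings))  # which surface attribute misleads
print(Counter(f["rule_id"] for f in findings))  # which rule is responsible
```

Grouping the findings by `framing` shows which surface features mislead routing, and grouping by `rule_id` points at the rule to inspect. The same findings list could be serialized into the known-issues subsection from suggestion 4 and rerun after each derived-layer change.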

Traceability

No plugins / runs linked yet. Scaffold a suggestion to start.

Expected benefit

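- Turns the derived layer's assumption that "routing follows content" into a **testable, falsifiable** engineering practice.
- Catches routing failures early, so that users do not hit misroutes in boundary scenarios after the lifeform is deployed.
- Naturally complements DM-6 (meta-learned data value): DM-6 decides which episodic memories should be promoted, and OA-7 checks whether the derived layer has been contaminated by those cards after promotion.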

Cited paper

**N5. Murray S, Qi A, Qian T, Schulman J (Thinking Machines), Burns C, Price S. *Chunky Post-Training: Data Driven Failures of Generalization*. arXiv:2602.05910, 2026.**

- Document location: [`research/openai-frontier-2026/papers/N5_chunky_post_training.pdf`](../../research/openai-frontier-2026/papers/N5_chunky_post_training.pdf)
- Abstract (condensed):

  > LLM post-training involves many diverse datasets, each targeting a specific behavior. But these datasets encode incidental patterns alongside intended ones: correlations between formatting and content, narrow phrasings across diverse problems, and implicit associations arising from the discrete data curation process. **These patterns are often invisible to developers yet salient to models, producing behaviors that surprise their creators**, such as rejecting true facts presented in a particular question format. We call this chunky post-training: the model learns spurious correlations as a result of distinct chunks of post-training data. **We introduce SURF, a black-box pipeline which surfaces these unintended behaviors at run time, and TURF, a tool that traces these failures back to specific post-training data.** Applying these tools to frontier models (Claude 4.5, GPT-5.1, Grok 4.1, Gemini 3) and open models (Tülu 3), we show that chunky post-training produces miscalibrated behaviors.

- Key takeaway: **behavioral failures caused by incidental correlations in post-training data are a widespread phenomenon**; SURF and TURF are diagnostic tools already validated in the paper. By analogy, our derived layer plays the role of the LLM's post-training dataset, so it carries the same contamination risk and must be tested proactively.

---