OA-7 · OA · P1/M · Spec-level · PROPOSED

Add a chunky-routing self-check (SURF) to the derived layer

—

Evaluation modality

Spec-level

A spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.

Primary owner: —
Phase-A verdict: —
Shadow profile: —
Source papers: N5 Murray/Schulman 2026
Specs: docs/specs/continuum-memory.md

Blind spot (current-state gap)

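Does the derived layer of [`docs/specs/continuum-memory.md`](../specs/continuum-memory.md) (aggregate indexes / knowledge graph / cross-stratum routing rules) currently have any self-check for whether routing is being misled by surface features? N5 shows empirically that incidental correlations carried by post-training data cause frontier LLMs (Claude 4.5 / GPT-5.1 / Grok 4.1 / Gemini 3) to exhibit behavioral collapse, such as refusing to acknowledge true facts when they are presented in a particular question format. Our derived layer **has exactly the same failure mode**: if an owner's retrieval index is more likely to fire on specific keywords than on content relevance, misroutes such as "emotionally distressed input routed to task-completion" will occur.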

Adoptable suggestions (actionable steps)

1. Write a rubric for each routing rule in the VZ derived layer (e.g. "emotionally distressed input should route to sympathy, not task-completion"; "an obvious boundary trigger should route to the boundary_consent owner"). PROPOSED
2. Use SURF-style attribute search to auto-generate adversarial prompts and check whether routing is misled by surface features. PROPOSED
3. When a misroute is found, apply the TURF idea to trace it back to the specific chunk in the derived layer (i.e. which index rule or which memory card caused the misroute). PROPOSED
4. Maintain the resulting list of known routing weaknesses as a known-issues subsection of [`docs/specs/continuum-memory.md`](../specs/continuum-memory.md), and re-run it as a regression after every derived-layer change. PROPOSED
5. Add a tool script [`tools/derived_routing_audit.py`](../../tools/) (new). PROPOSED

None of the suggestions above is a runnable A/B candidate; each is evaluated by the path above, not ablation.
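The audit loop in suggestions 1 to 3 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the contents of `tools/derived_routing_audit.py`: `toy_router`, `RubricCase`, `SURFACE_WRAPPERS`, `audit`, `trace_to_chunks`, and the chunk ids are all hypothetical names. The fixed wrapper templates are a simple stand-in for SURF's attribute search, and the leave-one-out loop is a stand-in for TURF-style tracing. Neither reproduces the paper's pipelines.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical chunk shape: (chunk_id, trigger keyword, owner).
Chunk = Tuple[str, str, str]
Router = Callable[[str, List[Chunk]], str]


@dataclass(frozen=True)
class RubricCase:
    """One rubric entry: this content should reach `expected_owner`."""
    content: str
    expected_owner: str


def toy_router(text: str, chunks: List[Chunk]) -> str:
    """Deliberately naive stand-in router: the first chunk whose keyword appears wins."""
    lowered = text.lower()
    for _chunk_id, keyword, owner in chunks:
        if keyword in lowered:
            return owner
    return "default"


# Surface wrappers hold the content fixed and vary only the framing.
SURFACE_WRAPPERS = [
    "{content}",
    "TODO: {content}",
    "Quick task for you. {content}",
    "- [ ] {content}",
]


def audit(router: Router, chunks: List[Chunk], rubric: List[RubricCase]):
    """Return (case, prompt, actual_owner) for every framing that breaks the rubric."""
    failures = []
    for case in rubric:
        for wrapper in SURFACE_WRAPPERS:
            prompt = wrapper.format(content=case.content)
            got = router(prompt, chunks)
            if got != case.expected_owner:
                failures.append((case, prompt, got))
    return failures


def trace_to_chunks(router: Router, chunks: List[Chunk], prompt: str,
                    expected_owner: str) -> List[str]:
    """Leave-one-out: ids of chunks whose removal restores the expected route."""
    culprits = []
    for i, chunk in enumerate(chunks):
        reduced = chunks[:i] + chunks[i + 1:]
        if router(prompt, reduced) == expected_owner:
            culprits.append(chunk[0])
    return culprits


# Illustrative data only; real rules would come from the derived layer.
CHUNKS: List[Chunk] = [
    ("idx-task", "task", "task-completion"),
    ("idx-todo", "todo", "task-completion"),
    ("idx-distress", "overwhelmed", "sympathy"),
    ("idx-boundary", "stop", "boundary_consent"),
]
RUBRIC = [
    RubricCase("I feel overwhelmed and can't cope today.", "sympathy"),
    RubricCase("Please stop, I do not want to talk about this.", "boundary_consent"),
]

if __name__ == "__main__":
    for case, prompt, got in audit(toy_router, CHUNKS, RUBRIC):
        culprits = trace_to_chunks(toy_router, CHUNKS, prompt, case.expected_owner)
        print(f"MISROUTE {prompt!r}: got {got}, expected {case.expected_owner}, culprits {culprits}")
```

With this toy data, the plain and checkbox framings route correctly, while the "TODO:" and "Quick task for you." framings pull both rubric cases into task-completion. The leave-one-out trace then names the keyword chunk responsible. Each failure line is a candidate entry for the known-issues list in suggestion 4.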

Traceability

No plugins / runs linked yet. Scaffold a suggestion to start.

Expected benefit

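- Turns the derived layer's assumption that "routing follows content" into a **testable, falsifiable** engineering practice.
- Catches routing failures early, so that users do not run into misroutes in boundary scenarios after the lifeform is deployed.
- Naturally complements DM-6 (meta-learned data value): DM-6 decides which episodic memories should be promoted, and OA-7 checks whether the derived layer is contaminated by those cards after promotion.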

Cited paper

**N5. Murray S, Qi A, Qian T, Schulman J (Thinking Machines), Burns C, Price S. *Chunky Post-Training: Data Driven Failures of Generalization*. arXiv:2602.05910, 2026.**

- Document location: [`research/openai-frontier-2026/papers/N5_chunky_post_training.pdf`](../../research/openai-frontier-2026/papers/N5_chunky_post_training.pdf)
- Abstract (condensed):
  > LLM post-training involves many diverse datasets, each targeting a specific behavior. But these datasets encode incidental patterns alongside intended ones: correlations between formatting and content, narrow phrasings across diverse problems, and implicit associations arising from the discrete data curation process. **These patterns are often invisible to developers yet salient to models, producing behaviors that surprise their creators**, such as rejecting true facts presented in a particular question format. We call this chunky post-training: the model learns spurious correlations as a result of distinct chunks of post-training data. **We introduce SURF, a black-box pipeline which surfaces these unintended behaviors at run time, and TURF, a tool that traces these failures back to specific post-training data.** Applying these tools to frontier models (Claude 4.5, GPT-5.1, Grok 4.1, Gemini 3) and open models (Tülu 3), we show that chunky post-training produces miscalibrated behaviors.
- Key point: **behavioral collapse caused by incidental correlations in post-training data is a widespread phenomenon**; SURF/TURF are already-validated diagnostic tools. Our derived layer plays the same role as an LLM's post-training dataset, so it necessarily carries the same contamination risk and must be tested proactively.

---