DM-6 · DM · Spec-level · PROPOSED

Use meta-learned data value for episodic→persistent promotion

Evaluation modality

Spec-level

A spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.

Primary owner
Phase-A verdict
Shadow profile
Source papers
Calian/Schaul/Silver 2025, DataRater (A7)
Specs
docs/specs/continuum-memory.md, docs/specs/evidence_program.md

Blind spot

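Is the episodic→persistent promotion rule in [`docs/specs/continuum-memory.md`](../specs/continuum-memory.md) currently a hard-coded heuristic (e.g. "appears N or more times" or "PE exceeds a threshold"), or is it learnable? If it is hard-coded, memory quality cannot adapt automatically as the system gains experience. The ReflectionEngine writeback decision (which episodic cards get promoted and which get forgotten) is, at its core, a "data value estimation" problem.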
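To make the blind spot concrete, here is a minimal Python sketch, assuming a card carries an occurrence count and a scalar PE. `EpisodicCard`, `RetentionRule`, `hardcoded_rule`, and `value_head_rule` are hypothetical names, not part of continuum-memory.md. Both rules share one signature, so a tunable value head could replace the heuristic without touching callers; the linear score is only the simplest parametric stand-in for a "data value head".

```python
from dataclasses import dataclass
from typing import Callable, Literal, Sequence

Decision = Literal["promote", "keep", "forget"]


@dataclass(frozen=True)
class EpisodicCard:
    """Hypothetical episodic card; field names are illustrative only."""
    card_id: str
    occurrences: int          # times the episode has recurred
    prediction_error: float   # "PE" from the spec, assumed to be a scalar


# A retention rule is any function from a card to a promote/keep/forget decision.
RetentionRule = Callable[[EpisodicCard], Decision]


def hardcoded_rule(n_min: int = 3, pe_threshold: float = 0.5) -> RetentionRule:
    """Fixed heuristic: promote on 'N or more occurrences' or 'PE above threshold'.

    Note that this rule never forgets; forgetting would need yet another hand-set threshold.
    """
    def rule(card: EpisodicCard) -> Decision:
        if card.occurrences >= n_min or card.prediction_error > pe_threshold:
            return "promote"
        return "keep"
    return rule


def value_head_rule(weights: Sequence[float], bias: float,
                    promote_above: float, forget_below: float) -> RetentionRule:
    """Parametric value head: a linear score whose parameters can be tuned from experience."""
    def rule(card: EpisodicCard) -> Decision:
        features = (float(card.occurrences), card.prediction_error)
        value = bias + sum(w * x for w, x in zip(weights, features))
        if value > promote_above:
            return "promote"
        if value < forget_below:
            return "forget"
        return "keep"
    return rule
```

Once the decision is expressed as a parameterised function, "which cards to promote or forget" becomes a search over `weights`, `bias`, and the two cut-offs instead of a spec edit.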

Adoptable suggestions

  1. Add a "meta-learned promotion criterion" subsection to [`docs/specs/continuum-memory.md`](../specs/continuum-memory.md): treat "which episodic cards are promoted to persistent and which are forgotten" as a learnable data value head. PROPOSED

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

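  2. **Key constraint**: the original DataRater relies on end-to-end meta-gradients, which violates R2 (frozen substrate). We therefore need an R2-compatible path. Options: (a) bandit exploration over the retention rule; (b) offline RL on memory traces; (c) gradient-free CMA-ES over the retention head. PROPOSED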

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

  3. Evidence first: use the existing active-matched-control paradigm from [`docs/specs/evidence_program.md`](../specs/evidence_program.md) to run a "hard-coded heuristic vs. meta-learned head" comparison. PROPOSED

    Not a runnable A/B candidate — evaluated by the path above, not ablation.
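Option (a) in suggestion 2 can be sketched with a standard UCB1 bandit that chooses among a few discrete retention-rule configurations. This is a minimal sketch of only one of the three listed paths; `UCB1`, `tune_retention_rule`, the candidate configs, and the `evaluate` reward signal are all hypothetical. The property that matters for R2 is that the tuner only observes scalar outcomes, so nothing back-propagates through the frozen substrate.

```python
import math
from typing import Callable, List, Sequence


class UCB1:
    """UCB1 bandit over a fixed set of arms (here: candidate retention-rule configs)."""

    def __init__(self, n_arms: int) -> None:
        self.counts: List[int] = [0] * n_arms
        self.values: List[float] = [0.0] * n_arms  # running mean reward per arm
        self.total = 0

    def select(self) -> int:
        # Play every arm once before relying on the confidence bound.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        log_t = math.log(self.total)
        scores = [v + math.sqrt(2.0 * log_t / c) for v, c in zip(self.values, self.counts)]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.total += 1
        # Incremental mean update.
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def tune_retention_rule(candidates: Sequence[dict],
                        evaluate: Callable[[dict], float],
                        rounds: int) -> dict:
    """Return the candidate config with the highest mean observed reward.

    `evaluate` is a hypothetical stand-in for a downstream memory-quality signal;
    it is treated as a black box, so no gradient is required or taken.
    """
    bandit = UCB1(len(candidates))
    for _ in range(rounds):
        arm = bandit.select()
        bandit.update(arm, evaluate(candidates[arm]))
    best = max(range(len(candidates)), key=bandit.values.__getitem__)
    return candidates[best]
```

For example, `candidates` could be a small grid such as `{"n_min": n, "pe_threshold": t}` over a few values of each. Swapping UCB1 for another black-box tuner (for instance option (c), CMA-ES over a continuous head) changes only the search loop, not the retention-rule interface.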

Traceability

No plugins / runs linked yet.

Expected benefit

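- Memory quality adapts automatically with system experience instead of depending on hand-written heuristics.
- Gives the R5/R6 "memory continuum" an **adaptive stratum boundary**: the boundary itself is learned from experience rather than fixed in the spec.
- Fully compatible with R8 SSOT: the `continuum_memory` owner maintains the retention head internally and still exposes only a snapshot externally.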
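The R8 point can be illustrated with a small encapsulation sketch, assuming the owner is a single Python object and modelling promotion only, for brevity. `ContinuumMemoryOwner`, `MemorySnapshot`, and the `(content, occurrences) -> bool` head signature are hypothetical. The retention head lives in a private attribute and is applied only inside `consolidate()`; callers receive an immutable `MemorySnapshot`, so replacing the head never changes the external contract.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Callable, Dict, Mapping, Tuple


@dataclass(frozen=True)
class MemorySnapshot:
    """Read-only view that other modules are allowed to see."""
    version: int
    persistent: Mapping[str, str]   # card_id -> content (read-only proxy)
    episodic_ids: Tuple[str, ...]


class ContinuumMemoryOwner:
    """Hypothetical single owner (SSOT) of memory state; the retention head stays private."""

    def __init__(self, retention_head: Callable[[str, int], bool]) -> None:
        self._retention_head = retention_head            # internal, never exported
        self._episodic: Dict[str, Tuple[str, int]] = {}  # id -> (content, occurrences)
        self._persistent: Dict[str, str] = {}
        self._version = 0

    def observe(self, card_id: str, content: str) -> None:
        # Count recurrences; the latest content overwrites the stored one.
        _, count = self._episodic.get(card_id, (content, 0))
        self._episodic[card_id] = (content, count + 1)

    def consolidate(self) -> None:
        """Apply the private retention head and promote cards it accepts."""
        for card_id, (content, count) in list(self._episodic.items()):
            if self._retention_head(content, count):
                self._persistent[card_id] = content
                del self._episodic[card_id]
        self._version += 1

    def snapshot(self) -> MemorySnapshot:
        # Copy state so later mutations of the owner do not leak into old snapshots.
        return MemorySnapshot(
            version=self._version,
            persistent=MappingProxyType(dict(self._persistent)),
            episodic_ids=tuple(sorted(self._episodic)),
        )
```

A hard-coded rule and a learned head are interchangeable as `retention_head`; consumers of `snapshot()` cannot tell which one is in use.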

Cited paper

**A7. Calian D A, Farquhar G, Kemaev I, Zintgraf L M, Hessel M, Shar J, Oh J, György A, Schaul T, Dean J, van Hasselt H, Silver D (DeepMind). *DataRater: Meta-Learned Dataset Curation*. arXiv:2505.17895, 2025.**

- Document location: [`research/papers/dm/datarater-meta-learned-dataset-curation-2505.17895.pdf`](../../research/papers/dm/datarater-meta-learned-dataset-curation-2505.17895.pdf)
- Author weight: DeepMind's top-tier RL team (Schaul, van Hasselt, Silver, Junhyuk Oh).
- Original abstract (condensed):
  > The quality of foundation models depends heavily on their training data. Consequently, great efforts have been put into dataset curation. Yet most approaches rely on manual tuning of coarse-grained mixtures of large buckets of data, or filtering by hand-crafted heuristics. An approach that is ultimately more scalable (let alone more satisfying) is to learn which data is actually valuable for training. ... Our proposed DataRater is an instance of this idea. It estimates the value of training on any particular data point. This is done by meta-learning using 'meta-gradients', with the objective of improving training efficiency on held out data. In extensive experiments across a range of model scales and datasets, we find that using our DataRater to filter data is highly effective, resulting in significantly improved compute efficiency.
- Key takeaway: the value of data should be **learned**, not set by hand-crafted heuristics. "Let the data reveal its own value" is a very strong design philosophy and matches our principle of "emergence over hard-coding". We adopt the idea but not the implementation path, because end-to-end meta-gradients are incompatible with R2.

---