DM-3 · DM · Spec-level · PROPOSED

Replace hard-coded regime triggers with an interest function

Evaluation modality

Spec-level

A spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.

Primary owner
Phase-A verdict
Shadow profile
Source papers
Khetarpal/Precup AAAI 2020 (B9) + Chunduru/Precup 2022 (A9)
Specs
docs/specs/cognitive-regime.md, docs/specs/emergent-action-abstraction.md

Blind spot (current state)

Are the activation conditions for the regimes in [`docs/specs/cognitive-regime.md`](../specs/cognitive-regime.md) (casual social / acquaintance building / emotional support / guided exploration / problem solving / repair and de-escalation) currently hard-coded rules, or are they learnable? If they are hard-coded, they violate the "emergence over hard-coding" principle of [`.cursor/rules/no-keyword-matching-hacks.mdc`](../../.cursor/rules/no-keyword-matching-hacks.mdc), and the activation decisions for R14 persistent regime identity cannot improve from experience.

Adoptable suggestions (actionable steps)

  1. Change the initiation condition of each regime in [`docs/specs/cognitive-regime.md`](../specs/cognitive-regime.md) to a differentiable **interest function** $I_\omega(s) \in [0, 1]$, learned end to end; keep the existing hard-coded rules during the migration as a SHADOW baseline for comparison. (PROPOSED)

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

  2. Add the **degeneracy safeguards** from Attention Option-Critic to [`docs/specs/emergent-action-abstraction.md`](../specs/emergent-action-abstraction.md): (a) option-domination detection (an alarm fires when any regime's occupancy exceeds a threshold); (b) an upper bound on switching frequency. (PROPOSED)

    Not a runnable A/B candidate — evaluated by the path above, not ablation.

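  3. Evidence first: run a "generational comparison" (see §7) between the switching decisions of the interest function and those of the hard-coded rules, and check on held-out scenarios which is closer to human-annotated judgments of when the regime should switch. (PROPOSED)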

    Not a runnable A/B candidate — evaluated by the path above, not ablation.
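As an illustration of suggestion 1, here is a minimal sketch of an interest function gating regime selection. It assumes a linear-plus-sigmoid parameterization of $I_\omega(s)$ and combines it with a base policy over regimes in the renormalized form $\pi_I(\omega \mid s) \propto I_\omega(s)\,\pi_\Omega(\omega \mid s)$ described in B9. The regime identifiers, the feature vector, and the function names `interest`, `interest_weighted_policy`, and `pick_regime` are hypothetical and do not come from the specs.

```python
import math

# Hypothetical identifiers for the six regimes named in cognitive-regime.md.
REGIMES = [
    "casual_social", "acquaintance_building", "emotional_support",
    "guided_exploration", "problem_solving", "repair_and_deescalation",
]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def interest(weights, bias, features):
    """I_omega(s): a differentiable score in [0, 1] for one regime.

    Assumption: a linear model over state features; any differentiable
    function approximator could take its place.
    """
    score = sum(w * f for w, f in zip(weights, features)) + bias
    return sigmoid(score)


def interest_weighted_policy(base_probs, interests):
    """Reweight a base policy over regimes by interest, then renormalize."""
    weighted = [i * p for i, p in zip(interests, base_probs)]
    total = sum(weighted)
    if total == 0.0:
        # No regime shows any interest: fall back to the base policy.
        return list(base_probs)
    return [w / total for w in weighted]


def pick_regime(probs):
    """Return the regime with the highest probability."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return REGIMES[best]
```

During migration, the regime chosen by `pick_regime` could be logged next to the choice of the existing hard-coded rule without acting on it, which is one way to read the SHADOW comparison above.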

Traceability

No plugins / runs linked yet. Scaffold a suggestion to start.

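Expected benefit

- Activation of R14 persistent regime identity moves from rule-driven to **data-driven**, consistent with "emergence over hard-coding".
- Guards against the classic option-critic collapse failure mode (a few regimes dominating, plus frequent switching).
- Because regime initiation is differentiable, background-slow reflection can **feed PE signals back to adjust the interest function**, forming a complete closed loop.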
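The two collapse symptoms named above, domination by a few regimes and frequent switching, can be checked on a logged trace of active regimes. The sketch below is one possible monitor, not the mechanism of Attention Option-Critic itself. The function names and the default thresholds `max_share=0.8` and `max_switch_rate=0.5` are illustrative assumptions, and it only raises alarms rather than enforcing a cap.

```python
from collections import Counter


def occupancy_shares(regime_trace):
    """Fraction of steps spent in each regime."""
    counts = Counter(regime_trace)
    total = len(regime_trace)
    return {regime: count / total for regime, count in counts.items()}


def switch_rate(regime_trace):
    """Fraction of consecutive step pairs where the regime changed."""
    if len(regime_trace) < 2:
        return 0.0
    pairs = zip(regime_trace, regime_trace[1:])
    switches = sum(1 for prev, cur in pairs if prev != cur)
    return switches / (len(regime_trace) - 1)


def degeneracy_alarms(regime_trace, max_share=0.8, max_switch_rate=0.5):
    """Return alarm labels for option domination and frequent switching.

    The thresholds are placeholders; real values would come from the spec.
    """
    alarms = []
    if not regime_trace:
        return alarms
    for regime, share in occupancy_shares(regime_trace).items():
        if share > max_share:
            alarms.append(f"domination:{regime}")
    if switch_rate(regime_trace) > max_switch_rate:
        alarms.append("frequent_switching")
    return alarms
```

A contract test could feed synthetic traces, such as one regime held for nearly every step or two regimes alternating each step, and assert that the corresponding alarm fires.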

Cited papers

**B9. Khetarpal K, Klissarov M, Chevalier-Boisvert M, Bacon P-L, Precup D. *Options of Interest: Temporal Abstraction with Interest Functions*. arXiv:2001.00271, AAAI 2020.**

- Document location: [`research/papers/dm/options-of-interest-temporal-abstraction-interest-functions-2001.00271.pdf`](../../research/papers/dm/options-of-interest-temporal-abstraction-interest-functions-2001.00271.pdf)
- Author weight: Doina Precup (DeepMind Montreal + McGill, one of the founders of the options framework).
- Abstract (condensed):
  > Temporal abstraction refers to the ability of an agent to use behaviours of controllers which act for a limited, variable amount of time. The options framework describes such behaviours as consisting of a subset of states in which they can initiate, an internal policy and a stochastic termination condition. However, much of the subsequent work on option discovery has ignored the initiation set, because of difficulty in learning it from data. We provide a generalization of initiation sets suitable for general function approximation, by defining an interest function associated with an option. We derive a gradient-based learning algorithm for interest functions, leading to a new interest-option-critic architecture.
- Key point: generalizes the initiation set of the options framework into a differentiable interest function that is learned end to end. This maps directly onto our need to make "when should a given regime activate" a differentiable, learnable object.

**A9. Chunduru R, Precup D. *Attention Option-Critic*. arXiv:2201.02628, 2022.**

- Document location: [`research/papers/dm/attention-option-critic-2201.02628.pdf`](../../research/papers/dm/attention-option-critic-2201.02628.pdf)
- Abstract (condensed):
  > Temporal abstraction in reinforcement learning is the ability of an agent to learn and use high-level behaviors, called options. The option-critic architecture provides a gradient-based end-to-end learning method to construct options. We propose an attention-based extension to this framework, which enables the agent to learn to focus different options on different aspects of the observation space. We show that this leads to behaviorally diverse options which are also capable of state abstraction, and prevents the degeneracy problems of option domination and frequent option switching that occur in option-critic.
- Key point: states explicitly that option-critic can collapse, a failure mode we must be aware of. Attention as a state-abstraction tool, together with the degeneracy safeguards, is a necessary inductive bias.

---