character-soul-bootstrap + value-prioritization-as-regime
Evaluation modality
Spec-level: a spec-motivation / governance borrow. Evaluated by spec review + contract tests, not A/B or ablation.
- Primary owner
- —
- Phase-A verdict
- —
- Shadow profile
- —
- Source papers
- N7 Zhang/Schulman/Durmus 2025
- Specs
- docs/specs/character-soul-bootstrap.md, docs/specs/cognitive-regime.md
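Blind spot (current state)
Does [`docs/specs/character-soul-bootstrap.md`](../specs/character-soul-bootstrap.md) currently make the **value prioritization** of lifeform-* adapters an explicit part of regime state? N7 shows empirically that different LLMs have markedly different systematic value preferences (Claude leans toward ethical responsibility, Gemini toward emotional depth, OpenAI/Grok toward efficiency), and that these preferences can be temporarily manipulated through prompts. VZ should make value prioritization part of the persistent R14 regime identity, **not prompt-level character**, which a jailbreak can replace at any time.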
Adoptable suggestions (actionable items)
- 1. Add a "value prioritization spec" section to [`docs/specs/character-soul-bootstrap.md`](../specs/character-soul-bootstrap.md): every lifeform-* adapter must explicitly declare its value prioritization patterns (e.g. sympathy > efficiency / boundary_consent > task_completion / honest_refusal > pleasing). PROPOSED
Not a runnable A/B candidate — evaluated by the path above, not ablation.
- 2. State explicitly in [`docs/specs/cognitive-regime.md`](../specs/cognitive-regime.md): value prioritization is part of the persistent regime identity (R14) and cannot be temporarily overridden by a user prompt. PROPOSED
Not a runnable A/B candidate — evaluated by the path above, not ablation.
- 3. Chain with the OA-5 VZ-Spec-Stress tool: stress tests must use value prioritization as the ground truth when evaluating whether owner trade-offs match expectations. PROPOSED
Not a runnable A/B candidate — evaluated by the path above, not ablation.
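- 4. Link with DM-3 (interest function regime trigger): the interest function's learning objective must reflect value prioritization; otherwise it will learn a regime that is "behaviorally efficient but drifting in value orientation". PROPOSED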
Not a runnable A/B candidate — evaluated by the path above, not ablation.
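The four suggestions above could be pinned down by a contract test along the following lines. This is only a minimal sketch under stated assumptions, not the spec's actual interface: `ValuePrioritization`, `RegimeIdentity`, `apply_user_prompt`, and `tradeoff_matches` are hypothetical names that do not appear in either spec, and the pair encoding `(a, b)` meaning "a > b" is one possible way to express the declared patterns.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# All names below are hypothetical illustrations; neither spec defines them.


@dataclass(frozen=True)
class ValuePrioritization:
    """Value preferences declared by one lifeform-* adapter."""

    adapter: str
    # Each pair (a, b) reads "a > b": when the two values conflict, a wins.
    pairs: FrozenSet[Tuple[str, str]]

    def __post_init__(self) -> None:
        # Contract: every adapter must declare at least one pattern.
        if not self.pairs:
            raise ValueError(f"{self.adapter}: no value prioritization declared")
        # Contract: a declaration must not contradict itself.
        for high, low in self.pairs:
            if high == low or (low, high) in self.pairs:
                raise ValueError(
                    f"{self.adapter}: contradictory pair {high!r} / {low!r}"
                )

    def prefers(self, a: str, b: str) -> bool:
        """True only if the adapter explicitly declared a > b."""
        return (a, b) in self.pairs


@dataclass(frozen=True)
class RegimeIdentity:
    """Persistent regime identity (R14 in the text); frozen against mutation."""

    values: ValuePrioritization


def apply_user_prompt(identity: RegimeIdentity, prompt: str) -> RegimeIdentity:
    """Assumption: prompt-level text never rewrites the declared values,
    so the identity is returned unchanged whatever the prompt says."""
    return identity


def tradeoff_matches(
    values: ValuePrioritization, chosen: str, rejected: str
) -> bool:
    """Stress-test check: does an observed trade-off agree with the
    declared prioritization, used here as ground truth?"""
    return values.prefers(chosen, rejected)


# Example declaration using the three patterns named in suggestion 1.
declared = ValuePrioritization(
    adapter="lifeform-example",  # hypothetical adapter name
    pairs=frozenset(
        {
            ("sympathy", "efficiency"),
            ("boundary_consent", "task_completion"),
            ("honest_refusal", "pleasing"),
        }
    ),
)
identity = RegimeIdentity(values=declared)
```

With this shape, a stress-test harness would call `tradeoff_matches(declared, "honest_refusal", "pleasing")` for each observed trade-off, and a contract test would assert that `apply_user_prompt` returns the same identity for any prompt. How DM-3's learning objective would consume the declaration is left open here.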
Traceability
No plugins / runs linked yet.
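Expected benefit
- Guarantees the product stance "the product is a relationship, not intelligence" at the system-invariant level: value prioritization is structural, not decorative.
- Prevents a future jailbreak prompt from temporarily rewriting a lifeform's values to "user said override → comply".
- Gives lifeform-* adapter development a clear engineering contract for where values must be declared and in what form.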
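Cited paper
**N7. Zhang J et al. *Stress-Testing Model Specs Reveals Character Differences Among Language Models*. arXiv:2510.07686, 2025.** (also cited by OA-5)
- Key point: N7 systematically shows that **different LLMs behave in systematically different ways when facing value trade-offs**, and that these differences stem from the differing value priorities inside each model spec. We want to elevate "the value priorities of VZ + lifeform-X" to a system invariant rather than a mutable prompt-level constraint.

---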