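The Problem

From Andrej's tweet:

"The models make wrong assumptions on your behalf and just run along with them without checking. They don't manage their confusion, don't seek clarifications, don't surface inconsistencies, don't present tradeoffs, and don't push back when they should."

"They really like to overcomplicate code and APIs, bloat abstractions, and leave dead code in place... they will implement a bloated 1000-line construction when 100 lines would do."

"They still sometimes change or remove code and comments they don't sufficiently understand, even when those are unrelated to the task."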

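The Solution

Four principles, collected in a single file, that address these problems directly:

| Principle | Addresses |
| --- | --- |
| Think Before Coding | Wrong assumptions, hidden confusion, missing tradeoffs |
| Simplicity First | Overcomplication, bloated abstractions |
| Surgical Changes | Unrelated edits, touching code that should be left alone |
| Goal-Driven Execution | Tests first, verifiable success criteria |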

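The Four Principles in Detail

1. Think Before Coding

Don't assume. Don't hide confusion. Surface tradeoffs.

LLMs often pick one interpretation silently and run with it. This principle forces the reasoning into the open:

  • State assumptions explicitly: if uncertain, ask rather than guess
  • Present multiple interpretations: when something is ambiguous, don't pick silently
  • Push back when warranted: if a simpler approach exists, say so
  • Stop when confused: name what is unclear and ask for clarification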

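2. Simplicity First

Minimum code that solves the problem. Nothing speculative.

This counters the tendency toward over-engineering:

  • No features beyond what was asked
  • No abstractions for single-use code
  • No "flexibility" or "configurability" that wasn't requested
  • No error handling for scenarios that cannot happen
  • If 200 lines could be written as 50, rewrite it

The test: would a senior engineer say this is overcomplicated? If yes, simplify.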
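
To make the contrast concrete, here is a minimal sketch of the same behavior written two ways. The task ("apply a 10% discount") and every name in it (`DiscountStrategy`, `DiscountFactory`, `discounted_price`) are hypothetical, invented only for illustration.

```python
from abc import ABC, abstractmethod


# Over-engineered: a strategy hierarchy plus a factory, all for one call site.
class DiscountStrategy(ABC):
    @abstractmethod
    def apply(self, amount: float) -> float: ...


class PercentageDiscount(DiscountStrategy):
    def __init__(self, percent: float) -> None:
        self.percent = percent

    def apply(self, amount: float) -> float:
        return amount * (1 - self.percent / 100)


class DiscountFactory:
    _registry = {"percentage": PercentageDiscount}

    @classmethod
    def create(cls, kind: str, **options: float) -> DiscountStrategy:
        return cls._registry[kind](**options)


def discounted_price_overbuilt(amount: float) -> float:
    return DiscountFactory.create("percentage", percent=10).apply(amount)


# Simple: the same result, with nothing speculative.
def discounted_price(amount: float) -> float:
    """Apply the fixed 10% discount that the (hypothetical) task asked for."""
    return amount * 0.9
```

Both functions return the same value for the same input. The second is the one a reviewer can verify at a glance, and an abstraction can still be introduced later if a second discount type is actually requested.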

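3. Surgical Changes

Touch only what you must. Clean up only your own mess.

When editing existing code:

  • Don't "improve" adjacent code, comments, or formatting
  • Don't refactor things that aren't broken
  • Match the existing style, even if you would write it differently
  • If you notice unrelated dead code, mention it - don't delete it

When your changes create orphans:

  • Remove imports, variables, and functions that your changes made unused
  • Don't remove pre-existing dead code unless asked

The test: every changed line should trace directly to the user's request.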
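
One rough way to apply this test during review is to list which original lines an edit touched and ask whether each one follows from the request. The sketch below uses Python's standard `difflib` for that. The helper `changed_lines` and the sample snippets are assumptions made up for this illustration, not part of any existing tool, and the request is imagined to be "raise the default timeout from 30 to 60".

```python
import difflib


def changed_lines(before: str, after: str) -> list[int]:
    """Return 1-based line numbers of `before` that the edit replaced or deleted.

    Pure insertions are not reported, since they touch no original line.
    """
    a, b = before.splitlines(), after.splitlines()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    touched: list[int] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        if tag in ("replace", "delete"):
            touched.extend(range(i1 + 1, i2 + 1))
    return touched


# The snippets are plain strings and are never executed.
BEFORE = """\
def fetch(url, timeout=30):
    # TODO: retry logic
    return http_get(url, timeout)
"""

# Surgical: only the default value changes.
SURGICAL = """\
def fetch(url, timeout=60):
    # TODO: retry logic
    return http_get(url, timeout)
"""

# Drive-by: the same fix plus unrequested type hints, a comment rewrite, and a restyled call.
DRIVE_BY = """\
def fetch(url: str, timeout: int = 60):
    # Retry logic is not implemented yet.
    return http_get(url, timeout=timeout)
"""

print(changed_lines(BEFORE, SURGICAL))  # [1]
print(changed_lines(BEFORE, DRIVE_BY))  # [1, 2, 3]
```

In the surgical edit the single touched line maps to the request. In the drive-by edit, lines 2 and 3 have no such justification.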

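4. Goal-Driven Execution

Define success criteria. Loop until verified.

Turn imperative tasks into verifiable goals:

| Instead of... | Transform to... |
| --- | --- |
| "Add validation" | "Write tests for invalid inputs, then make them pass" |
| "Fix the bug" | "Write a test that reproduces it, then make it pass" |
| "Refactor X" | "Ensure tests pass before and after" |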

For multi-step tasks, state a brief plan:

1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
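
The plan format above can be read as a small loop: perform a step, run its check, and move on only once the check passes. The following is a minimal sketch of that idea, not a description of how any particular agent works. `Step`, `run_plan`, and the toy counter are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]  # performs the work for this step
    check: Callable[[], bool]   # returns True once the step is verified


def run_plan(steps: list[Step], max_attempts: int = 3) -> list[str]:
    """Run each step, repeating its action until its check passes."""
    log: list[str] = []
    for step in steps:
        for attempt in range(1, max_attempts + 1):
            step.action()
            if step.check():
                log.append(f"{step.name}: verified on attempt {attempt}")
                break
        else:
            # The check never passed, so stop instead of continuing unverified.
            raise RuntimeError(f"{step.name}: not verified after {max_attempts} attempts")
    return log


# Toy usage: each action bumps a counter and each check inspects it.
state = {"count": 0}


def bump() -> None:
    state["count"] += 1


plan = [
    Step("reach two", bump, lambda: state["count"] >= 2),
    Step("reach three", bump, lambda: state["count"] >= 3),
]
log = run_plan(plan)
print(log)  # ['reach two: verified on attempt 2', 'reach three: verified on attempt 1']
```

The attempt limit is a design choice in this sketch: a step whose check never passes raises an error, which corresponds to stopping and asking rather than pressing on.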

Strong success criteria let the LLM loop independently. Weak criteria ("make it work") require constant clarification.
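
A test is one concrete form of a strong success criterion. As an illustration of turning "Add validation" into "Write tests for invalid inputs, then make them pass", the sketch below uses Python's standard `unittest`. The function `parse_age` and its rules are assumptions invented for this example.

```python
import unittest


def parse_age(raw: str) -> int:
    """Hypothetical function under test: parse a non-negative age from text."""
    value = int(raw.strip())  # int() raises ValueError on non-numeric input
    if value < 0:
        raise ValueError(f"age must be non-negative, got {value}")
    return value


class ParseAgeTests(unittest.TestCase):
    # Written first: these tests define what "validation added" means.
    def test_rejects_non_numeric(self):
        with self.assertRaises(ValueError):
            parse_age("abc")

    def test_rejects_negative(self):
        with self.assertRaises(ValueError):
            parse_age("-5")

    def test_accepts_valid_input_with_whitespace(self):
        self.assertEqual(parse_age(" 42 "), 42)
```

Saved to a file, this can be run with `python -m unittest <file>`. Before the negative-value check exists, `test_rejects_negative` fails; once all three tests pass, the goal is verifiably met without further clarification.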

CLAUDE.md Template

# CLAUDE.md
 
Behavioral guidelines to reduce common LLM coding mistakes. Merge with project-specific instructions as needed.
 
**Tradeoff:** These guidelines bias toward caution over speed. For trivial tasks, use judgment.
 
## 1. Think Before Coding
 
**Don't assume. Don't hide confusion. Surface tradeoffs.**
 
Before implementing:
 
- State your assumptions explicitly. If uncertain, ask.
- If multiple interpretations exist, present them - don't pick silently.
- If a simpler approach exists, say so. Push back when warranted.
- If something is unclear, stop. Name what's confusing. Ask.
 
## 2. Simplicity First
 
**Minimum code that solves the problem. Nothing speculative.**
 
- No features beyond what was asked.
- No abstractions for single-use code.
- No "flexibility" or "configurability" that wasn't requested.
- No error handling for impossible scenarios.
- If you write 200 lines and it could be 50, rewrite it.
 
Ask yourself: "Would a senior engineer say this is overcomplicated?" If yes, simplify.
 
## 3. Surgical Changes
 
**Touch only what you must. Clean up only your own mess.**
 
When editing existing code:
 
- Don't "improve" adjacent code, comments, or formatting.
- Don't refactor things that aren't broken.
- Match existing style, even if you'd do it differently.
- If you notice unrelated dead code, mention it - don't delete it.
 
When your changes create orphans:
 
- Remove imports/variables/functions that YOUR changes made unused.
- Don't remove pre-existing dead code unless asked.
 
The test: Every changed line should trace directly to the user's request.
 
## 4. Goal-Driven Execution
 
**Define success criteria. Loop until verified.**
 
Transform tasks into verifiable goals:
 
- "Add validation" → "Write tests for invalid inputs, then make them pass"
- "Fix the bug" → "Write a test that reproduces it, then make it pass"
- "Refactor X" → "Ensure tests pass before and after"
 
For multi-step tasks, state a brief plan:
 
```
1. [Step] → verify: [check]
2. [Step] → verify: [check]
3. [Step] → verify: [check]
```
 
Strong success criteria let you loop independently. Weak criteria ("make it work") require constant clarification.
 
---
 
**These guidelines are working if:** fewer unnecessary changes in diffs, fewer rewrites due to overcomplication, and clarifying questions come before implementation rather than after mistakes.