chore: apply local workspace updates (#9911)

* chore: apply local workspace updates
* fix: resolve prep findings after rebase (#9898) (thanks @gumadeiras)
* refactor: centralize model allowlist normalization (#9898) (thanks @gumadeiras)
* fix: guard model allowlist initialization (#9911)
* docs: update changelog scope for #9911
* docs: remove model names from changelog entry (#9911)
* fix: satisfy type-aware lint in model allowlist (#9911)
2026-06-28 21:01:43 +03:00 · 2026-02-05 16:54:44 -05:00
parent 93b450349f
commit 4629054403
72 changed files with 722 additions and 251 deletions
@@ -243,7 +243,7 @@ Even with strong system prompts, **prompt injection is not solved**. System prom
 - Run sensitive tool execution in a sandbox; keep secrets out of the agent’s reachable filesystem.
 - Note: sandboxing is opt-in. If sandbox mode is off, exec runs on the gateway host even though tools.exec.host defaults to sandbox, and host exec does not require approvals unless you set host=gateway and configure exec approvals.
 - Limit high-risk tools (`exec`, `browser`, `web_fetch`, `web_search`) to trusted agents or explicit allowlists.
-- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic Opus 4.5 because it’s quite good at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).
+- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic Opus 4.6 (or the latest Opus) because it’s strong at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).

 Red flags to treat as untrusted: