chore: apply local workspace updates (#9911)

* chore: apply local workspace updates

* fix: resolve prep findings after rebase (#9898) (thanks @gumadeiras)

* refactor: centralize model allowlist normalization (#9898) (thanks @gumadeiras)

* fix: guard model allowlist initialization (#9911)

* docs: update changelog scope for #9911

* docs: remove model names from changelog entry (#9911)

* fix: satisfy type-aware lint in model allowlist (#9911)
This commit is contained in:
Gustavo Madeira Santana
2026-02-05 16:54:44 -05:00
committed by GitHub
parent 93b450349f
commit 4629054403
72 changed files with 722 additions and 251 deletions
+1 -1
View File
@@ -243,7 +243,7 @@ Even with strong system prompts, **prompt injection is not solved**. System prom
- Run sensitive tool execution in a sandbox; keep secrets out of the agents reachable filesystem.
- Note: sandboxing is opt-in. If sandbox mode is off, exec runs on the gateway host even though tools.exec.host defaults to sandbox, and host exec does not require approvals unless you set host=gateway and configure exec approvals.
- Limit high-risk tools (`exec`, `browser`, `web_fetch`, `web_search`) to trusted agents or explicit allowlists.
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic Opus 4.5 because its quite good at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).
- **Model choice matters:** older/legacy models can be less robust against prompt injection and tool misuse. Prefer modern, instruction-hardened models for any bot with tools. We recommend Anthropic Opus 4.6 (or the latest Opus) because its strong at recognizing prompt injections (see [“A step forward on safety”](https://www.anthropic.com/news/claude-opus-4-5)).
Red flags to treat as untrusted: