mirror of https://github.com/farcasclaudiu/openclaw.git (synced 2026-06-29 09:02:02 +03:00)
chore: Run pnpm format:fix.
+31 -27
@@ -3,28 +3,31 @@ summary: "How inbound audio/voice notes are downloaded, transcribed, and injecte
 read_when:
 - Changing audio transcription or media handling
 ---

 # Audio / Voice Notes — 2026-01-17

 ## What works

 - **Media understanding (audio)**: If audio understanding is enabled (or auto‑detected), OpenClaw:
-  1) Locates the first audio attachment (local path or URL) and downloads it if needed.
-  2) Enforces `maxBytes` before sending to each model entry.
-  3) Runs the first eligible model entry in order (provider or CLI).
-  4) If it fails or skips (size/timeout), it tries the next entry.
-  5) On success, it replaces `Body` with an `[Audio]` block and sets `{{Transcript}}`.
+  1. Locates the first audio attachment (local path or URL) and downloads it if needed.
+  2. Enforces `maxBytes` before sending to each model entry.
+  3. Runs the first eligible model entry in order (provider or CLI).
+  4. If it fails or skips (size/timeout), it tries the next entry.
+  5. On success, it replaces `Body` with an `[Audio]` block and sets `{{Transcript}}`.
 - **Command parsing**: When transcription succeeds, `CommandBody`/`RawBody` are set to the transcript so slash commands still work.
 - **Verbose logging**: In `--verbose`, we log when transcription runs and when it replaces the body.

 ## Auto-detection (default)

 If you **don’t configure models** and `tools.media.audio.enabled` is **not** set to `false`,
 OpenClaw auto-detects in this order and stops at the first working option:

-1) **Local CLIs** (if installed)
+1. **Local CLIs** (if installed)
    - `sherpa-onnx-offline` (requires `SHERPA_ONNX_MODEL_DIR` with encoder/decoder/joiner/tokens)
    - `whisper-cli` (from `whisper-cpp`; uses `WHISPER_CPP_MODEL` or the bundled tiny model)
    - `whisper` (Python CLI; downloads models automatically)
-2) **Gemini CLI** (`gemini`) using `read_many_files`
-3) **Provider keys** (OpenAI → Groq → Deepgram → Google)
+2. **Gemini CLI** (`gemini`) using `read_many_files`
+3. **Provider keys** (OpenAI → Groq → Deepgram → Google)

 To disable auto-detection, set `tools.media.audio.enabled: false`.
 To customize, set `tools.media.audio.models`.
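The fallback order in the "What works" list (enforce `maxBytes`, run the first eligible entry, move on when an entry fails or is skipped) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not OpenClaw's implementation: `ModelEntry`, `TranscriptionResult`, `transcribeWithFallback`, and the per-entry `run` callback are hypothetical names, placing `maxBytes` on each entry is an assumption, and the sketch is synchronous for brevity even though real transcription would be asynchronous.

```typescript
// Hypothetical shapes for illustration only; not OpenClaw's real types.
interface ModelEntry {
  name: string;
  maxBytes?: number; // assumed per-entry size cap
  run: (audio: Uint8Array) => string; // returns transcript text, throws on failure
}

interface TranscriptionResult {
  transcript: string;
  usedEntry: string;
}

// Try entries in order: skip entries whose size cap is exceeded,
// fall through to the next entry when one throws.
function transcribeWithFallback(
  audio: Uint8Array,
  entries: ModelEntry[],
): TranscriptionResult | undefined {
  for (const entry of entries) {
    if (entry.maxBytes !== undefined && audio.byteLength > entry.maxBytes) {
      continue; // skipped: audio is too large for this entry
    }
    try {
      const transcript = entry.run(audio);
      return { transcript, usedEntry: entry.name };
    } catch {
      // failed (error or timeout): try the next entry
    }
  }
  return undefined; // no entry produced a transcript
}
```

Returning `undefined` when every entry fails leaves the decision about the untouched message to the caller; how OpenClaw handles that case is not stated in the doc.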
@@ -33,6 +36,7 @@ Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI
 ## Config examples

 ### Provider + CLI fallback (OpenAI + Whisper CLI)

 ```json5
 {
   tools: {
@@ -46,16 +50,17 @@ Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI
             type: "cli",
             command: "whisper",
             args: ["--model", "base", "{{MediaPath}}"],
-            timeoutSeconds: 45
-          }
-        ]
-      }
-    }
-  }
+            timeoutSeconds: 45,
+          },
+        ],
+      },
+    },
+  },
 }
 ```

 ### Provider-only with scope gating

 ```json5
 {
   tools: {
@@ -64,34 +69,32 @@ Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI
         enabled: true,
         scope: {
           default: "allow",
-          rules: [
-            { action: "deny", match: { chatType: "group" } }
-          ]
+          rules: [{ action: "deny", match: { chatType: "group" } }],
         },
-        models: [
-          { provider: "openai", model: "gpt-4o-mini-transcribe" }
-        ]
-      }
-    }
-  }
+        models: [{ provider: "openai", model: "gpt-4o-mini-transcribe" }],
+      },
+    },
+  },
 }
 ```

 ### Provider-only (Deepgram)

 ```json5
 {
   tools: {
     media: {
       audio: {
         enabled: true,
-        models: [{ provider: "deepgram", model: "nova-3" }]
-      }
-    }
-  }
+        models: [{ provider: "deepgram", model: "nova-3" }],
+      },
+    },
+  },
 }
 ```

 ## Notes & limits

 - Provider auth follows the standard model auth order (auth profiles, env vars, `models.providers.*.apiKey`).
 - Deepgram picks up `DEEPGRAM_API_KEY` when `provider: "deepgram"` is used.
 - Deepgram setup details: [Deepgram (audio transcription)](/providers/deepgram).
@@ -104,6 +107,7 @@ Note: Binary detection is best-effort across macOS/Linux/Windows; ensure the CLI
 - CLI stdout is capped (5MB); keep CLI output concise.

 ## Gotchas

 - Scope rules use first-match wins. `chatType` is normalized to `direct`, `group`, or `room`.
 - Ensure your CLI exits 0 and prints plain text; JSON needs to be massaged via `jq -r .text`.
 - Keep timeouts reasonable (`timeoutSeconds`, default 60s) to avoid blocking the reply queue.
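The "first-match wins" behaviour of scope rules noted under Gotchas can be illustrated with a small sketch. The names `ScopeRule`, `Scope`, and `resolveScope` are hypothetical, as are two assumptions made here: that `"allow"` is a valid rule `action` (the doc only shows it as a `default`), and that a rule with no `chatType` in its `match` applies to every chat type. It shows the evaluation order only and is not OpenClaw's actual matcher.

```typescript
// Chat types as normalized per the doc: direct, group, or room.
type ChatType = "direct" | "group" | "room";
type Action = "allow" | "deny"; // "allow" as a rule action is an assumption

// Hypothetical shapes mirroring the `scope` block in the config examples.
interface ScopeRule {
  action: Action;
  match: { chatType?: ChatType };
}

interface Scope {
  default: Action;
  rules: ScopeRule[];
}

// Return the action of the first rule that matches; otherwise the default.
function resolveScope(scope: Scope, chatType: ChatType): Action {
  for (const rule of scope.rules) {
    const wanted = rule.match.chatType;
    // Assumption: an empty match object matches any chat type.
    if (wanted === undefined || wanted === chatType) {
      return rule.action;
    }
  }
  return scope.default;
}

// Same rule set as the "Provider-only with scope gating" example:
// deny group chats, allow everything else.
const scope: Scope = {
  default: "allow",
  rules: [{ action: "deny", match: { chatType: "group" } }],
};
```

With this rule set, `resolveScope(scope, "group")` yields `"deny"` and any other chat type falls through to the `"allow"` default. Because evaluation stops at the first match, a broad rule placed early shadows more specific rules after it.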