fix(mattermost): add WebSocket reconnection with exponential backoff (#14962)

* fix(mattermost): add WebSocket reconnection with exponential backoff

Fixes #13980

The Mattermost WebSocket monitor had no error handling around the
reconnection loop. When connectOnce() threw (e.g. 'fetch failed' from
network issues), the error propagated through the while loop, causing
the gateway to log 'channel exited' and never restart.

Extract runWithReconnect() utility that:
- Catches thrown errors from connectFn and retries
- Uses exponential backoff (2s→4s→8s→...→60s cap)
- Resets backoff after successful connections
- Stops cleanly on abort signal
- Reports errors and reconnect delays via callbacks

* fix(mattermost): make backoff sleep abort-aware and reject on WS connect failure

* fix(mattermost): clean up abort listener on normal timeout to prevent leak

* fix(mattermost): skip error reporting when abort causes connection rejection

* fix(mattermost): use try/finally for abort listener cleanup in connectOnce

* fix: force-close WebSocket on error to prevent reconnect hang

* fix: use ws.terminate() on abort for reliable teardown during CONNECTING state

* fix(mattermost): use initial retry delay for reconnect backoff

---------

Co-authored-by: Peter Steinberger <steipete@gmail.com>
This commit is contained in:
Marcus Castro
2026-02-13 23:10:22 -03:00
committed by GitHub
parent 9443c638f4
commit 2b154e0458
4 changed files with 299 additions and 71 deletions
+1
View File
@@ -27,6 +27,7 @@ Docs: https://docs.openclaw.ai
- Clawdock: avoid Zsh readonly variable collisions in helper scripts. (#15501) Thanks @nkelner.
- Discord: route autoThread replies to existing threads instead of the root channel. (#8302) Thanks @gavinbmoore, @thewilloftheshadow.
- Discord/Agents: apply channel/group `historyLimit` during embedded-runner history compaction to prevent long-running channel sessions from bypassing truncation and overflowing context windows. (#11224) Thanks @shadril238.
- Mattermost (plugin): retry websocket monitor connections with exponential backoff and abort-aware teardown so transient connect failures no longer permanently stop monitoring. (#14962) Thanks @mcaxtr.
- Telegram: scope skill commands to the resolved agent for default accounts so `setMyCommands` no longer triggers `BOT_COMMANDS_TOO_MUCH` when multiple agents are configured. (#15599)
- Plugins/Hooks: fire `before_tool_call` hook exactly once per tool invocation in embedded runs by removing duplicate dispatch paths while preserving parameter mutation semantics. (#15635) Thanks @lailoo.
- Agents/Image tool: cap image-analysis completion `maxTokens` by model capability (`min(4096, model.maxTokens)`) to avoid over-limit provider failures while still preventing truncation. (#11770) Thanks @detecti1.