Day 81: Saturday on Three Fronts
Saturday. The Saturday after a release is the day the team works on the things the release didn't reach: the features that needed a quiet block of time, the regression tests that needed a stable baseline, the upstream behaviour change nobody could have predicted. Today, three of those land in parallel, and the commit log reads like three separate days stacked into one.
Images Stop Crashing the WebSocket
The first thread is image attachments. They had technically worked since the chat first supported them, but with a sharp edge: any image over ~1 MB tripped the WebSocket server's maxPayload ceiling (set defensively back in February against memory-exhaustion attacks), and the server closed the connection with status code 1009 ("Message Too Big" in the WebSocket protocol). The user saw only a generic "connection lost" error in the UI. A photo from a modern phone camera is 5–10 MB, so the cliff was easy to hit by accident and impossible to diagnose without reading the server log.
The fix is a two-sided rebalance. On the server, the WS payload limit rises from 1 MB to 25 MB; on the client, the image limit rises from 5 MB to 15 MB. Both values live in shared constants in @/lib/limits (SERVER_WS_MAX_PAYLOAD_BYTES and CLIENT_MAX_IMAGE_SIZE_BYTES), so the server's WS setup and the client's pre-send check can't silently disagree. An invariant test asserts that the server limit is at least 1.5× the client cap, which guards against the next round of accidental skew.

On the client, every image runs through browser-image-compression before it is base64-encoded into the WebSocket frame: small JPEG and WebP files skip compression, PNGs are always recompressed, and everything else is converted to WebP. The compressor targets 1.9 MB, comfortably under OpenClaw 2026.4.27's hardcoded 2 MB inline-vs-offload threshold; past that threshold, the runtime replaces the image with a text marker that doesn't reliably re-inline on the agent side. A chunked data-URL helper streams the encoding rather than allocating the whole base64 string in one go. Anything that can't be compressed under the target fails closed on the client, with a dedicated "image too large" error bubble, before any frame leaves the page.
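A minimal sketch of the shared limits and the per-file compression decision described above. The two constant names come from the post; everything else is an assumption for illustration: MB taken as 1024 × 1024 bytes, the hypothetical names COMPRESSION_TARGET_BYTES, limitsAreConsistent, chooseCompressionStrategy and fitsCompressionTarget, the reading of "small" as "already under the target", and the reject branch for files above the client cap. The actual browser-image-compression call is deliberately left out.

```typescript
// Sketch of a shared limits module in the spirit of @/lib/limits.
// Constant names follow the post; MB = 1024 * 1024 is an assumption.
const MB = 1024 * 1024;

const SERVER_WS_MAX_PAYLOAD_BYTES = 25 * MB;
const CLIENT_MAX_IMAGE_SIZE_BYTES = 15 * MB;
// Hypothetical name: the 1.9 MB compressor target, kept under the 2 MB offload threshold.
const COMPRESSION_TARGET_BYTES = Math.floor(1.9 * MB);

// The invariant the post describes: server limit >= 1.5x the client cap.
function limitsAreConsistent(): boolean {
  return SERVER_WS_MAX_PAYLOAD_BYTES >= 1.5 * CLIENT_MAX_IMAGE_SIZE_BYTES;
}

type CompressionStrategy = "skip" | "recompress" | "convert-to-webp" | "reject";

// Hypothetical helper mirroring the rules in the text:
// small JPEG/WebP skip, PNG always recompressed, everything else becomes WebP.
function chooseCompressionStrategy(mimeType: string, sizeBytes: number): CompressionStrategy {
  if (sizeBytes > CLIENT_MAX_IMAGE_SIZE_BYTES) return "reject"; // assumed pre-check
  const isJpegOrWebp = mimeType === "image/jpeg" || mimeType === "image/webp";
  if (isJpegOrWebp && sizeBytes <= COMPRESSION_TARGET_BYTES) return "skip";
  if (mimeType === "image/png") return "recompress";
  return "convert-to-webp";
}

// Fail closed: anything still over the target after compression is refused client-side.
function fitsCompressionTarget(compressedBytes: number): boolean {
  return compressedBytes <= COMPRESSION_TARGET_BYTES;
}
```

One plausible reason for a 1.5× margin rather than 1×: base64 inflates a payload by roughly a third, so a frame carrying a maximum-size image needs headroom above the raw client cap.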
Two smaller fixes landed around the same code. First, when the WS closes with code 1009, the client now resets its history state cleanly (cancelling any pending ack timers and clearing the in-flight turn) before re-rendering the error bubble. Without the reset, the next send would retry against half-cleared state and confuse itself. Second, close code 1009 now maps to the user-facing message "Image too large" instead of the previous generic connection issue. The close code already carries the diagnosis; the UI just had to stop ignoring it.
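A sketch of the close handling described above, assuming a simple mutable session object. The names ChatSessionState, errorForCloseCode and handleSocketClose, and the "Connection lost" fallback text, are hypothetical; only the close code (1009, "Message Too Big") and the order of operations (reset first, then render the error) come from the post.

```typescript
// RFC 6455 close code 1009: "Message Too Big".
const WS_CLOSE_MESSAGE_TOO_BIG = 1009;

interface ChatErrorView {
  kind: "image_too_large" | "connection";
  message: string;
}

// Hypothetical client-side session state.
interface ChatSessionState {
  pendingAckTimers: Set<ReturnType<typeof setTimeout>>;
  inFlightTurnId: string | null;
  errors: ChatErrorView[];
}

// Map the close code to a user-facing error instead of ignoring it.
function errorForCloseCode(code: number): ChatErrorView {
  if (code === WS_CLOSE_MESSAGE_TOO_BIG) {
    return { kind: "image_too_large", message: "Image too large" };
  }
  return { kind: "connection", message: "Connection lost" }; // placeholder generic text
}

function handleSocketClose(state: ChatSessionState, code: number): void {
  if (code === WS_CLOSE_MESSAGE_TOO_BIG) {
    // Reset before rendering, so the next send starts from a clean slate.
    for (const timer of state.pendingAckTimers) clearTimeout(timer);
    state.pendingAckTimers.clear();
    state.inFlightTurnId = null;
  }
  state.errors.push(errorForCloseCode(code));
}
```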
The PR follows the TDD shape this codebase has settled into. A failing test asserting that close code 1009 surfaces "Image too large" goes in first, then the fix, then a regression guard asserting that no frame large enough to trigger close code 1009 is sent at all (client-side compression should have caught it first). The assertion on the error type's retryable field was also tightened from the looser !retryable form to toBeUndefined: retryable is a tri-state (true, false, or unknown), and !retryable is satisfied by both false and unknown, so it could not pin down which one the code actually produced.
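A tiny illustration of why the two assertions differ, assuming retryable is typed as boolean | undefined with undefined standing for "unknown". The helper names are hypothetical; isUnknown is the plain-TypeScript equivalent of expect(retryable).toBeUndefined().

```typescript
// Tri-state: true, false, or undefined (unknown).
type Retryable = boolean | undefined;

// The loose form: true for both false AND undefined.
function looseNotRetryable(r: Retryable): boolean {
  return !r;
}

// The strict form, equivalent to expect(r).toBeUndefined().
function isUnknown(r: Retryable): boolean {
  return r === undefined;
}

const cases: Retryable[] = [true, false, undefined];
for (const r of cases) {
  console.log(String(r), looseNotRetryable(r), isUnknown(r));
}
// true false false
// false true false
// undefined true true
```

The middle row is the trap: the loose check accepts false and undefined alike, so only the strict check tells them apart.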
kimi-k2-thinking Goes Silent
The second thread arrived from upstream rather than from the codebase. ollama-cloud/kimi-k2-thinking, the default model on the curated reasoning tier, started returning HTTP 500s with empty bodies on Wednesday, and the failures continued through the week. The Ollama Cloud status page didn't mention it, and the model is still listed as available; only the requests fail. It has the shape of a soft deprecation, with the upstream letting the model atrophy without an announcement, and from inside Pinchy the only available diagnosis is "upstream silent 500, no further signal".
The fix has two layers. The first drops kimi-k2-thinking from the curated allowlist and from the resolver: agents that resolve via tier=reasoning now fall through to deepseek-v4-pro, which is already tested and configured. Agents that pinned the model explicitly still load (their DB record remains valid), but their next request will fail upstream, and they need a manual model switch to recover. A regression guard asserts that every resolver target is still in the allowlist. Structurally, today's bug was a resolver target outside the allowlist, and the guard keeps that shape from re-emerging.
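A sketch of the allowlist, the tier resolver, and the regression guard. Only tier=reasoning, kimi-k2-thinking and deepseek-v4-pro come from the post; the "ollama-cloud/" prefix on deepseek-v4-pro, the "general" tier, its placeholder model ID, and all function names are assumptions.

```typescript
type Tier = "reasoning" | "general"; // "general" is a hypothetical second tier

const CURATED_ALLOWLIST: ReadonlySet<string> = new Set([
  "ollama-cloud/deepseek-v4-pro",
  "ollama-cloud/placeholder-general-model", // placeholder ID for illustration
]);

const TIER_RESOLVER: Record<Tier, string> = {
  reasoning: "ollama-cloud/deepseek-v4-pro", // previously ollama-cloud/kimi-k2-thinking
  general: "ollama-cloud/placeholder-general-model",
};

// Pinned models bypass the resolver, so a pinned, no-longer-curated model still loads.
function resolveModel(tier: Tier, pinnedModel?: string): string {
  return pinnedModel ?? TIER_RESOLVER[tier];
}

// Regression guard: list any resolver target that is not in the allowlist.
function resolverTargetsOutsideAllowlist(): string[] {
  return Object.values(TIER_RESOLVER).filter((id) => !CURATED_ALLOWLIST.has(id));
}
```

In a test suite, the guard reduces to asserting that resolverTargetsOutsideAllowlist() returns an empty array.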
The second layer is the chat UX for the failure. Until today, a provider 5xx showed up in the chat as a raw bubble reading HTTP 500: "Internal Server Error (ref: ...)", which was accurate but useless. The new shape is a structured "switch model" bubble with the agent name, the provider/model that failed, a deep link to the model picker in agent settings (with a #model URL fragment so the picker scrolls into view), and a collapsible Technical details section holding the raw upstream error and the support ref ID for the customer's bug report. An E2E test asserts the deep link's behaviour by clicking it and verifying that the picker is open at the right field. The deprecated pinned model still shows up in the picker even though it has been removed from the curated list, so the user can see what they had and pick its replacement.
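A sketch of the data behind the "switch model" bubble and the picker options. The settings route (/agents/:id/settings), the interface, and both function names are hypothetical; the #model fragment, the bubble's fields, and the rule that a deprecated pinned model stays visible in the picker come from the post.

```typescript
interface SwitchModelBubble {
  agentName: string;
  failedModel: string; // provider/model, e.g. "ollama-cloud/kimi-k2-thinking"
  settingsHref: string; // deep link into agent settings
  technicalDetails: { rawError: string; supportRef: string }; // collapsible section
}

function buildSwitchModelBubble(
  agent: { id: string; name: string },
  failedModel: string,
  rawError: string,
  supportRef: string,
): SwitchModelBubble {
  return {
    agentName: agent.name,
    failedModel,
    // The #model fragment lets the settings page scroll the picker into view.
    settingsHref: `/agents/${encodeURIComponent(agent.id)}/settings#model`,
    technicalDetails: { rawError, supportRef },
  };
}

// Curated options, plus the agent's pinned model if it is no longer curated,
// so the user can see what they had before choosing a replacement.
function modelPickerOptions(curated: readonly string[], pinned?: string): string[] {
  if (pinned === undefined || curated.includes(pinned)) return [...curated];
  return [pinned, ...curated];
}
```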
A matching server-side classifier also landed: every 5xx from a provider runs through a small heuristic that asks whether the model itself is unavailable or something else went wrong. The classifier is provider-agnostic (it looks at the response shape, not the provider name) and emits an agent.model_unavailable audit event when it fires, throttled to one event per agent per 5 minutes so a flaky provider can't flood the audit trail through user retries. A modelUnavailable hint is plumbed through the client-router and ChatError into the UI, so the structured bubble renders against an explicit signal rather than guessing from error strings.
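A sketch of a shape-based classifier plus the per-agent audit throttle. The post does not say what the real heuristic checks; treating "5xx with an empty body" as the model-unavailable shape is an assumption that happens to match this week's kimi-k2-thinking symptom. The class and function names are hypothetical, and the clock is injected as nowMs so the throttle is easy to test.

```typescript
interface ProviderResponseShape {
  status: number;
  body: string;
}

// Assumed heuristic: provider-agnostic, based only on status and body shape.
function isLikelyModelUnavailable(res: ProviderResponseShape): boolean {
  return res.status >= 500 && res.status <= 599 && res.body.trim() === "";
}

const AUDIT_THROTTLE_MS = 5 * 60 * 1000; // one event per agent per 5 minutes

class ModelUnavailableAuditThrottle {
  private lastEmitted = new Map<string, number>();

  // True if an agent.model_unavailable event should be emitted at nowMs.
  // Suppressed calls do not extend the window.
  shouldEmit(agentId: string, nowMs: number): boolean {
    const last = this.lastEmitted.get(agentId);
    if (last !== undefined && nowMs - last < AUDIT_THROTTLE_MS) return false;
    this.lastEmitted.set(agentId, nowMs);
    return true;
  }
}
```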
#199 Layer A Gets Its Test
The third thread is the matching test for yesterday's #199 fix. Yesterday's PR landed Layer B: keep draining OpenClaw after a browser disconnect. Today's commits land Layer A: the regression guard that handleHistory takes the cache retry path rather than re-sending a greeting frame. The two layers attack the same bug from opposite ends. Layer B makes sure the cache is correctly populated; Layer A makes sure it is correctly read when the next session starts. With either layer missing, a disconnect followed by a reload would either find nothing in the cache (B) or find the cached message but start a fresh greeting instead of resuming it (A).
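A sketch of the Layer A property: when a cached response exists, handleHistory resumes it and never emits a greeting. Only the name handleHistory comes from the post; the frame types, the in-memory responseCache (standing in for whatever Layer B actually populates), and the send callback are assumptions.

```typescript
type Frame =
  | { type: "greeting" }
  | { type: "cached_response"; messageId: string; content: string };

interface CachedResponse {
  messageId: string;
  content: string;
}

// Hypothetical cache that Layer B fills while draining OpenClaw after a disconnect.
const responseCache = new Map<string, CachedResponse>();

function handleHistory(sessionId: string, send: (frame: Frame) => void): void {
  const cached = responseCache.get(sessionId);
  if (cached) {
    // Layer A: resume the cached message in full; do not start a fresh greeting.
    send({ type: "cached_response", messageId: cached.messageId, content: cached.content });
    return;
  }
  send({ type: "greeting" });
}
```

A test for this shape would assert both the negative form (no greeting frame) and the positive form (the cached content arrives in full).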
The Layer A test is a stronger version of the guard that was already in place. The previous guard asserted that handleHistory does not emit a greeting frame when a cached response exists; the new one also asserts the positive form, that handleHistory emits the cached response in full when one exists, which is the property the new disconnect drain actually depends on. An end-to-end spec wraps both layers against a fake-ollama fixture with a deliberately slow stream, so the disconnect-resume sequence runs the same way it would in production.

Three smaller fixes came along with it. The CI version of the test now derives its sleep durations from shared constants instead of hard-coded numbers, because the CI runner is slow enough that the original hard-coded waits raced. An EPIPE on mid-stream disconnect is now handled cleanly instead of being thrown into the test runner. And a redundant greetingFrames filter in the Layer A guard was removed, because it was hiding the very case the guard was meant to catch.
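A sketch of the two test-plumbing ideas: waits derived from the fixture's own timing constants, and a predicate for treating EPIPE as "the client went away" rather than a test failure. All names and numbers here (FAKE_OLLAMA_CHUNK_DELAY_MS, FAKE_OLLAMA_CHUNK_COUNT, the 2-chunk disconnect point, the 1.5× slack, isClientGoneError) are hypothetical.

```typescript
// Hypothetical shared constants for a slow fake-ollama stream.
const FAKE_OLLAMA_CHUNK_DELAY_MS = 200;
const FAKE_OLLAMA_CHUNK_COUNT = 10;
const FAKE_OLLAMA_STREAM_MS = FAKE_OLLAMA_CHUNK_DELAY_MS * FAKE_OLLAMA_CHUNK_COUNT;

// Waits derived from the fixture instead of hard-coded numbers,
// so changing the fixture's pacing keeps the test consistent.
const timing = {
  // Disconnect mid-stream, after a couple of chunks, so the drain has work left.
  disconnectAfterMs: FAKE_OLLAMA_CHUNK_DELAY_MS * 2,
  // Wait for the full stream plus slack for a slow CI runner.
  waitForDrainMs: FAKE_OLLAMA_STREAM_MS * 1.5,
};

// EPIPE means the reading side of the pipe/socket is gone: expected after a disconnect.
function isClientGoneError(err: unknown): boolean {
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "EPIPE"
  );
}
```

A stream writer's error handler can then return quietly when isClientGoneError(err) holds and rethrow anything else.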
The Smaller Items
A handful of smaller things landed in the same window. The setup-provider's process.exit path got a regression guard: an earlier commit had accidentally turned a recoverable validation error into a hard exit, and the guard will catch a repeat. The onboarding system prompt now refers to tools by their registered, prefixed names (pinchy-files.read, not read), which matters because unprefixed names collide across plugins. fast-uri was bumped to 3.1.2 to patch CVE-2026-6321 and CVE-2026-6322, the kind of dependency bump that's small in code and important in security posture. The Ollama Cloud silent-500 pattern got a short note in the LLM providers doc, so the next time it happens with a different model, the playbook is already written. The image-attachments docs gained a guides page covering formats, limits, and vision support.
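The post doesn't show setup-provider's structure, so this is only a sketch of one common way to make an accidental hard exit testable: inject the exit function, return a result for recoverable validation errors, and have the regression guard assert that exit was never called. ValidationError, validateProviderConfig, setupProvider and SetupDeps are all hypothetical names.

```typescript
class ValidationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = "ValidationError";
    // Keeps instanceof working when compiling to older targets.
    Object.setPrototypeOf(this, ValidationError.prototype);
  }
}

type SetupResult = { ok: true } | { ok: false; error: string };

// Inject exit (process.exit in production) so tests can observe it.
interface SetupDeps {
  exit: (code: number) => void;
}

function validateProviderConfig(apiKey: string): void {
  if (apiKey.trim() === "") throw new ValidationError("API key is required");
}

function setupProvider(apiKey: string, deps: SetupDeps): SetupResult {
  try {
    validateProviderConfig(apiKey);
    return { ok: true };
  } catch (err) {
    // Recoverable: report the problem and let the caller retry.
    if (err instanceof ValidationError) return { ok: false, error: err.message };
    // Anything unexpected is still treated as fatal.
    deps.exit(1);
    return { ok: false, error: "fatal" };
  }
}
```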
One small refusal landed in the openclaw-config writer: regenerate writes that would shrink the config by more than 50% are now rejected. The catalyst was a near miss in which a partial state load was about to write a near-empty config over a working one; the safety check turns that bug into a noisy refusal rather than quiet data loss. The threshold is conservative on purpose, since the writer's healthy mode is to make small, incremental changes.
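A sketch of the shrink guard, assuming "size" means the length of the serialized config (the post doesn't say how the writer measures it). checkRegenerateWrite and writeConfigSafely are hypothetical names; the strictly-greater-than-50% rule comes from the text.

```typescript
const MAX_SHRINK_RATIO = 0.5; // refuse writes that shrink the config by more than 50%

type RegenerateCheck = { ok: true } | { ok: false; reason: string };

// Assumption: size is measured as serialized string length.
function checkRegenerateWrite(current: string, next: string): RegenerateCheck {
  if (current.length === 0) return { ok: true }; // nothing to protect yet
  const shrinkRatio = (current.length - next.length) / current.length;
  if (shrinkRatio > MAX_SHRINK_RATIO) {
    return { ok: false, reason: `regenerate would shrink config by ${Math.round(shrinkRatio * 100)}%` };
  }
  return { ok: true };
}

// Noisy refusal instead of quiet data loss: throw before anything is written.
function writeConfigSafely(current: string, next: string, write: (serialized: string) => void): void {
  const check = checkRegenerateWrite(current, next);
  if (check.ok === false) throw new Error(`Refusing config write: ${check.reason}`);
  write(next);
}
```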
Day 81
Saturdays in this codebase have a shape now. They're the day the team picks up the threads that need a long, uninterrupted block of time, and there were three of them today. Image attachments stop being a sharp edge. A model the curated list could no longer rely on gets dropped, and the chat learns to present the failure as a path forward rather than a wall. The #199 fix gets the matching half of its test, so the disconnect-resume property is asserted from both ends. None of these is a release on its own; together, they're a release tomorrow.