
Day 56: When the Other Side Disappears

A lot of the credibility of a chat interface lives in the unhappy path. The answer arrives, the answer is good, the user is happy — fine, that's table stakes. What actually decides whether someone trusts a tool enough to put it in front of colleagues is the moment something goes wrong. An agent that spins forever while the backend is silently dead is not a product. It's a trap.

Today was the unhappy path. OpenClaw — the runtime Pinchy talks to for every model call — can disappear mid-stream. The container restarts. The network flaps. The process dies and supervisord brings it back up a second later. From the user's point of view, none of that is observable. They just see a spinner that never resolves.

Three Failure Modes, Three Different Messages

The first job was separating the failure modes. "Something went wrong" is not a product; it's an excuse. The chat now distinguishes three cases.

If the WebSocket between the browser and Pinchy disconnects during an active stream, the chat shows a disconnect error with a WifiOff icon. If OpenClaw quietly processed the message anyway, the next reconnect replaces the error with the real response from history — no duplicate submit, no "try again" button to fat-finger.

If there's no activity at all for 60 seconds — no tokens, no thinking heartbeats — the chat shows a timeout with a Clock icon and actionable text. This is different from a disconnect: the connection is fine, the model just isn't producing anything. Telling the user "disconnected" when the wire is healthy would be a small lie that erodes trust later.

If all ten automatic reconnect attempts fail, the chat shows a persistent banner. At that point the environment itself is sick, and a reassuring spinner would be actively dishonest.
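The three cases above can be sketched as a small classifier. This is a minimal illustration, not Pinchy's actual code: the names ChatFailure, ChatState, classifyFailure and describeFailure are hypothetical, and so is the message wording. Only the 60-second threshold, the ten attempts and the two icon names come from the description above.

```typescript
// Hypothetical names and message strings; thresholds and icons follow the post.
const IDLE_TIMEOUT_MS = 60_000;    // no tokens and no thinking heartbeats
const MAX_RECONNECT_ATTEMPTS = 10; // automatic retries before giving up

type ChatFailure =
  | { kind: "disconnect" }           // browser-to-Pinchy socket dropped mid-stream
  | { kind: "timeout" }              // wire is healthy, but nothing is arriving
  | { kind: "reconnect-exhausted" }; // every automatic reconnect failed

interface ChatState {
  socketOpen: boolean;
  streaming: boolean;
  idleMs: number;           // time since the last frame of any kind
  failedReconnects: number;
}

// Returns the failure to show, or null while everything looks healthy.
function classifyFailure(s: ChatState): ChatFailure | null {
  if (s.failedReconnects >= MAX_RECONNECT_ATTEMPTS) {
    return { kind: "reconnect-exhausted" };
  }
  if (!s.socketOpen && s.streaming) {
    return { kind: "disconnect" };
  }
  if (s.socketOpen && s.streaming && s.idleMs >= IDLE_TIMEOUT_MS) {
    return { kind: "timeout" };
  }
  return null;
}

interface FailureView {
  icon: "WifiOff" | "Clock" | null;
  persistent: boolean; // true for the banner that stays on screen
  text: string;
}

function describeFailure(f: ChatFailure): FailureView {
  switch (f.kind) {
    case "disconnect":
      return { icon: "WifiOff", persistent: false, text: "Connection lost while the agent was responding." };
    case "timeout":
      return { icon: "Clock", persistent: false, text: "No activity for 60 seconds. Try sending the message again." };
    case "reconnect-exhausted":
      return { icon: null, persistent: true, text: "Could not reconnect after 10 attempts." };
  }
}
```

Modeling the failure as a tagged union means a timeout can never be rendered with the disconnect wording by accident, because each kind maps to exactly one view.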

Heartbeats That Stop Lying

One of the subtler bugs was a pathological case: OpenClaw's chat stream can hang before yielding any output — a container that came back up but didn't fully initialize, say. The server was dutifully sending a thinking heartbeat every fifteen seconds. Each heartbeat reset the client's stuck timer. The user stared at a spinner that, in theory, could spin forever.

The fix was to not start the heartbeat interval until the first real chunk arrives. One initial thinking frame goes out immediately for UI feedback — enough to reset the stuck timer once. If nothing follows, the timer fires naturally and the user sees a timeout instead of an eternal animation. Silence should not be indistinguishable from progress.
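A rough sketch of that gating, under some assumptions: createHeartbeatGate, the Frame shape and the Scheduler interface are all hypothetical names invented for this example, and the scheduler is injected only so the behavior can be exercised without waiting on real timers. The fifteen-second interval is the one mentioned above.

```typescript
// Hypothetical sketch; none of these names come from Pinchy's source.
type Frame = { type: "thinking" } | { type: "chunk"; text: string };

interface Scheduler {
  // Runs fn every ms milliseconds and returns a function that cancels it.
  every(ms: number, fn: () => void): () => void;
}

// Default scheduler backed by real timers.
const timerScheduler: Scheduler = {
  every(ms, fn) {
    const id = setInterval(fn, ms);
    return () => clearInterval(id);
  },
};

const HEARTBEAT_MS = 15_000;

function createHeartbeatGate(
  send: (frame: Frame) => void,
  scheduler: Scheduler = timerScheduler,
) {
  let cancelHeartbeat: (() => void) | null = null;

  return {
    // One thinking frame goes out immediately for UI feedback. No interval yet.
    start(): void {
      send({ type: "thinking" });
    },
    // The first real chunk proves the upstream is alive; only then do
    // periodic heartbeats begin.
    onChunk(text: string): void {
      send({ type: "chunk", text });
      if (cancelHeartbeat === null) {
        cancelHeartbeat = scheduler.every(HEARTBEAT_MS, () => send({ type: "thinking" }));
      }
    },
    // Call when the stream ends or fails.
    stop(): void {
      if (cancelHeartbeat !== null) {
        cancelHeartbeat();
        cancelHeartbeat = null;
      }
    },
  };
}
```

If the upstream hangs before its first chunk, the client receives exactly one thinking frame and then silence, so its own inactivity timer can fire.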

The Generator That Would Never Return

There's a related bug, one layer deeper. Pinchy's server iterates OpenClaw's chat stream with a for-await loop. When OpenClaw goes away, that generator's internal resolveChunk promise never resolves. The loop blocks forever. Heartbeats keep firing. The browser's stuck timer never triggers because the server is technically still "working."

Fix: when the OpenClaw client fires a disconnected event, Pinchy closes all active browser WebSockets. That trips the browser's existing onclose handler, which injects the disconnect error and auto-reconnects to Pinchy. Once OpenClaw recovers, the next message goes through normally.

The initial version of this fix had a subtle bug of its own: OpenClaw emits disconnected on every failed reconnect attempt, about once a second. So the first fix closed the browser WebSockets over and over, making it impossible for the user to even see the reconnect banner for more than a moment. A guard now makes the close one-shot per "down" period, and it is rearmed when the next connected event fires.
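The one-shot guard might look roughly like this. It is a sketch under assumptions: createDisconnectGuard and ClosableSocket are hypothetical names, and wiring onDisconnected and onConnected to the OpenClaw client's events is left out because the client's API is not shown here.

```typescript
// Hypothetical sketch; the real event wiring is not shown in the post.
interface ClosableSocket {
  close(): void;
}

function createDisconnectGuard(getActiveSockets: () => ClosableSocket[]) {
  // True from the first disconnected event until the next connected event.
  let upstreamDown = false;

  return {
    // Call on every disconnected event, including the repeated ones
    // emitted by failed reconnect attempts.
    onDisconnected(): void {
      if (upstreamDown) return; // this outage was already handled
      upstreamDown = true;
      for (const socket of getActiveSockets()) {
        socket.close(); // triggers the browser's onclose handler
      }
    },
    // Call on the connected event to rearm the guard for the next outage.
    onConnected(): void {
      upstreamDown = false;
    },
  };
}
```

However many disconnected events arrive during one outage, the browser sockets are closed once, which gives the reconnect banner time to stay on screen.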

Stale Handlers After Agent Switches

The last class of bug was about identity. When you switch agents mid-stream, a new WebSocket opens. The old socket's onclose and onerror handlers could still fire afterwards, and when they did, they injected a disconnect error into the new agent's chat window. From the user's point of view: "I switched to a different agent and it immediately showed me an error I never caused."

Fix: every handler captures the agent ID at connection time and bails out if it doesn't match the current one. Stale callbacks from previous connections now silently discard themselves.
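In code, the guard is just a closure over the agent ID. The sketch below assumes a simplified socket shape (SocketLike, with handlers that take no event argument) and hypothetical names createConnectionManager, openSocket and showDisconnectError. A real browser WebSocket passes event objects to these handlers.

```typescript
// Hypothetical sketch with a simplified stand-in for the browser WebSocket.
interface SocketLike {
  onclose: (() => void) | null;
  onerror: (() => void) | null;
}

function createConnectionManager(
  openSocket: (agentId: string) => SocketLike,
  showDisconnectError: (agentId: string) => void,
) {
  let currentAgentId: string | null = null;

  function connect(agentId: string): SocketLike {
    currentAgentId = agentId;
    const socket = openSocket(agentId);

    // agentId is captured here, at connection time.
    const handleFailure = (): void => {
      if (agentId !== currentAgentId) return; // stale callback from an old agent
      showDisconnectError(agentId);
    };

    socket.onclose = handleFailure;
    socket.onerror = handleFailure;
    return socket;
  }

  return { connect };
}
```

Comparing agent IDs matches the fix as described. If the same agent can be reopened while an older socket for it is still closing, a per-connection counter would be a stricter check, though that goes beyond what is described here.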

Day 56

None of this is a new feature. There's nothing to demo. The whole day was spent making failure legible — so that when the environment misbehaves, the user finds out in words, not by giving up. That's a lot less exciting than shipping a template grid. It's also the difference between a chat product that works and a chat product that almost works.

