
Day 92: Errors That Tell the Truth

There's a recurring theme in this project — a UI that stops lying. Day 57 was a dashboard that stopped claiming green when it wasn't; Day 73 was a chat that stopped claiming a turn succeeded when it had errored. Today's version is smaller and more specific: an error bubble that was technically pointing at the wrong culprit, and a day spent making errors say what's actually true.

The Honest Error (PR #400, 07:55)

Gemini 3, run through the OpenAI-compatible path on Ollama Cloud, has a habit of dropping the thought_signature field on a tool call. When that happens the upstream returns an error, and Pinchy was rendering it with copy that said the provider had "rejected the request schema or tool payload." That's misleading in a way that costs the user real time: it reads like your agent is misconfigured, so people go hunting through their tool settings and model choice for a problem that isn't there. The truth is narrower — it's a transient upstream format hiccup, and retrying usually clears it on the next attempt.

The fix is a new classifier, classifyUpstreamFormatError, that runs independently of the existing 5xx model-error classifier. When it detects a thought_signature error it shows a bubble that names the cause and tells the user to just hit Retry, and it pointedly does not offer a "Switch model" button, because the model isn't broken. I did consider simply bumping the OpenClaw pin. Upstream fixed this in OpenClaw 2026.5.18, but the changelog shows the fix covers only the native Google provider path; Pinchy reproduces the bug via the OpenAI-compat path, which is a different, still-open upstream issue. Bumping the pin alone wouldn't clear the user-visible bug, so the honest-error UI is the right layer to fix it at.
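To make the shape of that classifier concrete, here is a minimal sketch. Only the name classifyUpstreamFormatError comes from the PR; the UpstreamError and FormatErrorBubble types, the message-matching rule, and the bubble copy are assumptions for illustration, not Pinchy's actual code.

```typescript
// Hypothetical error shape; the real upstream payload may differ.
interface UpstreamError {
  status?: number;
  message: string;
}

// Hypothetical description of what the chat bubble should render.
interface FormatErrorBubble {
  kind: "thought_signature";
  text: string;
  actions: string[];
}

// Returns a bubble for a known upstream format hiccup, or null so the caller
// can fall through to other classifiers (for example the 5xx model-error one).
function classifyUpstreamFormatError(err: UpstreamError): FormatErrorBubble | null {
  // Assumption: the upstream error text names the missing field.
  if (!/thought_signature/i.test(err.message)) {
    return null;
  }
  return {
    kind: "thought_signature",
    text: "The upstream provider dropped a field on a tool call. This is usually transient; retrying tends to clear it.",
    // Deliberately no "Switch model" action: the model itself is not broken.
    actions: ["Retry"],
  };
}
```

Returning null rather than throwing keeps this check independent of the other classifiers: each one can inspect the same error and either claim it or pass.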

The other half of the PR is measurement. Every thought_signature error now emits a throttled agent.upstream_format_error audit entry, keyed per agent and model with a 5-minute TTL — the same shape as the existing model-unavailable signal. That turns "how often does this actually happen?" from a grep through gateway logs into a one-line audit query, which is the precondition for the deferred decision about whether to auto-retry these on the user's behalf. You don't get to choose to automate something until you can measure how often it fires.
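The throttling can be sketched as a small keyed TTL map. The event name, the per-agent-and-model key, and the 5-minute window come from the description above; the function name createThrottledAudit, the AuditEntry fields, and the in-memory Map are assumptions, and the real implementation may store its state elsewhere.

```typescript
const FIVE_MINUTES_MS = 5 * 60 * 1000;

// Hypothetical audit entry shape.
interface AuditEntry {
  event: string;
  agentId: string;
  model: string;
  at: number;
}

// Builds a recorder that emits at most one entry per (agent, model) per TTL window.
function createThrottledAudit(
  sink: (entry: AuditEntry) => void,
  ttlMs: number = FIVE_MINUTES_MS,
): (agentId: string, model: string, now?: number) => boolean {
  // Last emission time per key. This sketch never evicts old keys.
  const lastEmitted = new Map<string, number>();

  return (agentId: string, model: string, now: number = Date.now()): boolean => {
    const key = `${agentId}::${model}`;
    const last = lastEmitted.get(key);
    if (last !== undefined && now - last < ttlMs) {
      return false; // still inside the window: suppress the duplicate
    }
    lastEmitted.set(key, now);
    sink({ event: "agent.upstream_format_error", agentId, model, at: now });
    return true;
  };
}
```

Passing the clock in as an optional argument keeps the throttle easy to test without waiting out a real five-minute window.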

Build Once, Run Many (PR #401, 07:45)

CI had been quietly getting slower as the integration matrix grew. Seven downstream jobs (docker-smoke, the install and upgrade end-user flows, and the Odoo, web, email, and Telegram E2E suites) each rebuilt the Pinchy and OpenClaw images from scratch. Today's rework builds them once: a single build-image job builds both images per push, pushes them to GHCR as pinchy-ci:sha-<sha>, and every downstream job pulls the prebuilt image instead of rebuilding. Layer caching keeps re-runs warm, Playwright's browser cache and Next's build cache are shared between runs, and a composite setup action replaces about ten duplicated setup blocks. Fork PRs, which can't push to GHCR with a read-only token, fall back to a local build, so the speedup doesn't break contributions from outside the org. None of this is user-facing, but slow CI is a tax on every change, and it compounds.

id vs. SKU (PR #402, 07:53)

An Odoo failure mode that's low effort to fix and high frequency in practice: a user asks about a product by its SKU — "WIDGET-12" — and the model searches by id (or the reverse). The query returns nothing, the model guesses, and the wrong record gets the downstream action. Odoo's numeric primary key (id) and its human-readable internal reference (default_code) look interchangeable to an LLM that doesn't know better. This PR makes the distinction impossible to miss at every layer the model reads — the compact schema annotates both fields inline when they co-occur, four read tools spell out the SKU-vs-URL-id rule in their descriptions, and the shared Odoo query instructions gain an identifier-disambiguation section. A drift guard enumerates every odoo-* template and asserts each carries the disambiguation, so a future template can't silently ship without it. It's model-agnostic — it helps every upstream LLM — which is the kind of fix I prefer: it costs almost nothing and it stops a whole class of confident wrong answers.
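The drift guard idea is simple enough to sketch. The odoo- prefix comes from the description above; the AgentTemplate shape, the marker string, and the function name are hypothetical, and the real guard may check for different text or read the templates from disk.

```typescript
// Hypothetical template shape.
interface AgentTemplate {
  id: string;
  instructions: string;
}

// Hypothetical marker: a heading the disambiguation section is assumed to carry.
const DISAMBIGUATION_MARKER = "Identifier disambiguation";

// Returns the ids of odoo-* templates that lack the disambiguation section.
function findTemplatesMissingDisambiguation(templates: AgentTemplate[]): string[] {
  return templates
    .filter((t) => t.id.startsWith("odoo-"))
    .filter((t) => t.instructions.indexOf(DISAMBIGUATION_MARKER) === -1)
    .map((t) => t.id);
}
```

A test would then assert that this list is empty, so a new odoo-* template added without the section fails CI instead of shipping quietly.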

Ahead of the Cut

The rest of the day was release prep. An audit fix (PR #409, 11:27) keeps the success/error detail consistent with the recorded outcome when a tool result carries isError — a small consistency bug that would otherwise make the audit trail disagree with itself. And the v0.5.4 release notes and CHANGELOG pointer get finalised (PR #410, 11:47), which means tomorrow is a release day. The shape of the release is already set — Odoo operator templates, the schema split, the FK-lookup fixes, the chat-reliability edges — and finalising the notes the day before is the difference between a calm cut and a scramble.
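One way to keep the audit outcome and its detail from disagreeing is to derive both from the same flag in a single place. This is a sketch under assumed shapes: the ToolResult and AuditRecord types and the toAuditRecord name are illustrative, and only the isError field is taken from the description above.

```typescript
// Hypothetical tool result shape; only isError is named in the post.
interface ToolResult {
  isError?: boolean;
  content: string;
}

// Hypothetical audit record: the detail carries either a result or an error.
interface AuditRecord {
  outcome: "success" | "error";
  detail: { result?: string; error?: string };
}

// Derives outcome and detail from one check, so they cannot drift apart.
function toAuditRecord(result: ToolResult): AuditRecord {
  if (result.isError === true) {
    return { outcome: "error", detail: { error: result.content } };
  }
  return { outcome: "success", detail: { result: result.content } };
}
```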

Day 92

Three of today's four landings are invisible to a user reading a changelog — a faster CI, a more consistent audit detail, a disambiguation hint buried in tool descriptions. The fourth, the honest error, is the only one a user will actually see, and it's the one I care about most, because the cost of a lying error message isn't the error — it's the half hour someone spends fixing a problem that was never theirs.

