Day 87: The Silent No-Op
Friday. The kind of day where one test going red turns out to expose a structural problem that no individual symptom had been pointing at. The morning starts with a new coverage guard. The afternoon ends with five plugin manifests fixed, a bidirectional drift guard, and three rounds of CI shakeout for the integration suite that now actually exercises tool dispatch.
The Coverage Guard
The shape of the gap. Pinchy plugins register tools with OpenClaw at startup: registerTool('pinchy_web_search', ...), registerTool('odoo_schema', ...), and so on. Each tool then gets dispatched at runtime when an agent's LLM call decides to use it. Pinchy's E2E suite has had per-plugin tests for a while — pinchy-files has a contract test, pinchy-odoo has a unit test, the agent-chat spec covers docs_list. None of those, until today, asserted that the tool actually dispatched end-to-end through OpenClaw: an agent thinks to call it, OpenClaw routes the call, the plugin handles the request, the result lands in the assistant's reply, the audit log records the dispatch.
The morning's first commit is a new test: plugin-tool-coverage.test.ts. It walks the list of registered Pinchy plugins, queries the audit log for tool-dispatch events tagged with each plugin's name across the most recent E2E run, and fails red if any plugin registered tools but never had one dispatched. The test is the kind that's small in code and clarifying in effect: it pins down the property we'd been assuming about every plugin without checking. pinchy-docs passed on the first run because docs_list was already covered by agent-chat.spec.ts. The five plugins that didn't — pinchy-files, pinchy-context, pinchy-odoo, pinchy-email, pinchy-web — were structurally untested at the dispatch layer.
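The core check can be sketched in a few lines. This is a minimal illustration, not the contents of plugin-tool-coverage.test.ts: the RegisteredPlugin and AuditEvent shapes and the "tool_dispatch" event type are assumptions, since the post does not show the real registry or audit-log schema.

```typescript
// Hypothetical shapes; the real registry and audit-log schemas are not shown in the post.
interface RegisteredPlugin {
  name: string;
  tools: string[];
}

interface AuditEvent {
  type: string; // "tool_dispatch" is an assumed event type
  plugin: string;
  tool: string;
}

// Returns the names of plugins that registered at least one tool
// but have no dispatch event among the given audit events.
function findUndispatchedPlugins(
  plugins: RegisteredPlugin[],
  events: AuditEvent[],
): string[] {
  const dispatched = new Set(
    events.filter((e) => e.type === "tool_dispatch").map((e) => e.plugin),
  );
  return plugins
    .filter((p) => p.tools.length > 0 && !dispatched.has(p.name))
    .map((p) => p.name);
}
```

A test built on this would fail whenever the returned list is non-empty and print the list, so the red run names the uncovered plugins directly.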
The fake-Ollama Refactor
Closing the gap required a per-plugin dispatch probe. A probe is a small E2E spec that drives an agent against a fake-Ollama fixture, programs the fake LLM to call exactly one plugin tool, and then asserts that the audit log records the dispatch. Each plugin needs its own probe because the probes share no behaviour beyond "the tool was called": Odoo's probe needs a connected Odoo instance, email's probe needs the Gmail mock, and web's probe needs the Brave mock.
The fake-Ollama fixture had been living inside the e2e/integration/ directory and was only consumable by the integration suite. The refactor moves it to e2e/shared/fake-ollama/ so the per-plugin suites can import it directly. The same commit extended the server with deterministic tool-call triggers, one per plugin: E2E_FILES_LS_TOOL → pinchy_ls, E2E_CONTEXT_SAVE_USER_TOOL → pinchy_save_user_context, E2E_ODOO_SCHEMA_TOOL → odoo_schema, E2E_EMAIL_LIST_TOOL → email_list, E2E_WEB_SEARCH_TOOL → pinchy_web_search. Setting one of these env vars on the fake-Ollama process tells it to emit a tool call for that tool on the next assistant turn; the probe doesn't have to model an LLM's reasoning to test dispatch.
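The trigger table is simple enough to sketch. The env var names and tool names below are the ones listed above. Everything else is an assumption made for illustration: the function name, the ToolCall shape, passing the environment in as a plain object, and treating any non-empty value as "set".

```typescript
// Env var -> tool name, as listed in the post.
const TOOL_TRIGGERS: Record<string, string> = {
  E2E_FILES_LS_TOOL: "pinchy_ls",
  E2E_CONTEXT_SAVE_USER_TOOL: "pinchy_save_user_context",
  E2E_ODOO_SCHEMA_TOOL: "odoo_schema",
  E2E_EMAIL_LIST_TOOL: "email_list",
  E2E_WEB_SEARCH_TOOL: "pinchy_web_search",
};

// Hypothetical shape for the tool call the fake server would emit.
interface ToolCall {
  name: string;
  arguments: Record<string, unknown>;
}

// Returns the tool call to emit on the next assistant turn,
// or null when no trigger is set (the server then replies with plain text).
function pickTriggeredToolCall(
  env: Record<string, string | undefined>,
): ToolCall | null {
  for (const [envVar, tool] of Object.entries(TOOL_TRIGGERS)) {
    if (env[envVar]) {
      return { name: tool, arguments: {} };
    }
  }
  return null;
}
```

Taking the environment as a parameter rather than reading it globally keeps the lookup testable without spawning a server process.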
With the fixture shared, five probes landed in the same morning. pinchy-files and pinchy-context went into the integration suite directly (because they share its Odoo-free baseline). pinchy-odoo got its probe in the Odoo E2E suite. pinchy-email got one in the email E2E suite (sitting alongside Saturday's Gmail mock from Day 76). pinchy-web got one in the web E2E suite. All five turned red on the first run, in the same shape but for slightly different reasons each.
The Silent No-Op
The shape that the five red probes had in common. The agent's LLM call did emit the tool call. OpenClaw received it. OpenClaw didn't dispatch it; instead it returned an unknown tool error to the LLM, which the LLM ignored before producing a generic text reply. The audit log recorded the LLM call but no tool dispatch.
The root cause is OpenClaw 5.3's stricter dispatch contract. Until 5.3, registerTool() would register a tool by name and OpenClaw's dispatcher would route to it. In 5.3, registerTool() requires that the tool's name also be declared in the plugin manifest's contracts.tools field — a structural pre-declaration that lets OpenClaw know which tools the plugin claims to provide before the plugin code runs. Without that field, registerTool() silently no-ops and the tool is never exposed to the dispatcher. The team's dev OpenClaw had been running 4.x for the longest time and the manifests had simply never carried the field; the production OpenClaw (since Monday's bump to 5.3 in v0.5.3) was silently dropping every registration.
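A toy model makes the failure mode concrete. This is not OpenClaw's code or API: ToyRegistry, PluginManifest, and the id field are hypothetical names, and the class only mimics the behaviour described above, where a name missing from contracts.tools is dropped without any error.

```typescript
// Hypothetical manifest shape; only contracts.tools comes from the post.
interface PluginManifest {
  id: string;
  contracts?: { tools?: string[] };
}

type ToolHandler = (args: Record<string, unknown>) => unknown;

class ToyRegistry {
  private readonly tools = new Map<string, ToolHandler>();
  private readonly declared: Set<string>;

  constructor(manifest: PluginManifest) {
    this.declared = new Set(manifest.contracts?.tools ?? []);
  }

  // Undeclared names are dropped without an error: the silent no-op.
  registerTool(name: string, handler: ToolHandler): void {
    if (!this.declared.has(name)) return;
    this.tools.set(name, handler);
  }

  // A dropped registration only surfaces here, as an unknown tool.
  dispatch(name: string, args: Record<string, unknown>): unknown {
    const handler = this.tools.get(name);
    if (!handler) throw new Error(`unknown tool: ${name}`);
    return handler(args);
  }
}
```

In this model the plugin's startup succeeds either way, which is why the gap stays invisible until something actually tries to dispatch.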
Nineteen tools across five plugins had been registering successfully on the team's machines and silently failing in production. Eight in pinchy-odoo (odoo_schema, odoo_read, odoo_count, odoo_aggregate, odoo_create, odoo_write, odoo_delete, odoo_attach_file). Five in pinchy-email. Two each in pinchy-context, pinchy-files, and pinchy-web. The customers who had upgraded to v0.5.3 since Monday and tried to use an Odoo write tool on their templates had been getting unknown tool errors that read as the LLM hallucinating a tool name. Nobody had filed a bug yet because the symptom looks like an LLM problem, not a Pinchy problem.
The fix is the matching field in five plugin manifests. pinchy-odoo's openclaw.plugin.json picks up contracts.tools: ["odoo_schema", "odoo_read", ...], listing the eight tools the plugin provides; the other four pick up their own lists. The pinchy-odoo field is kept in sync with the registry refactor that's still in flight (the odoo_schema rename is pending, but for today it stays under the old name to keep the v0.5.3 upgrade path working).
The Bidirectional Drift Guard
Fixing the five manifests would close today's gap, but the same bug could regress on the next plugin that lands. The drift guard that landed alongside is bidirectional. For each plugin, the test loads the manifest's contracts.tools list and the plugin's compiled registerTool() calls, asserts both lists contain the same names, and fails red on any divergence. A new tool added to the code without being declared in the manifest is now a red CI run. A name removed from the manifest without being removed from the code is also red. The shape of the bug — manifest and code disagree about which tools exist — is now structurally caught.
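The comparison itself is a two-way set difference. The sketch below is an illustration under assumptions: the post does not say how the guard extracts names from the compiled plugin, so the regex over source text is a stand-in, and both function names are hypothetical.

```typescript
interface ToolDrift {
  undeclared: string[]; // registered in code, missing from the manifest
  unregistered: string[]; // declared in the manifest, missing from code
}

// Assumed extraction strategy: scan source text for registerTool('name', ...).
function extractRegisteredTools(source: string): string[] {
  const pattern = /registerTool\(\s*['"]([^'"]+)['"]/g;
  const names: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1];
    if (name !== undefined) names.push(name);
  }
  return names;
}

// Compares both directions so that either side drifting is reported.
function findToolDrift(declared: string[], registered: string[]): ToolDrift {
  const declaredSet = new Set(declared);
  const registeredSet = new Set(registered);
  return {
    undeclared: registered.filter((n) => !declaredSet.has(n)).sort(),
    unregistered: declared.filter((n) => !registeredSet.has(n)).sort(),
  };
}
```

Reporting the two directions separately tells the author of a red run which file to edit: the manifest for undeclared names, the plugin code for unregistered ones.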
AGENTS.md picked up a tool dispatch coverage recipe section in the same session. New plugins that register a tool now have a one-page recipe: declare it in contracts.tools, add a dispatch probe to the plugin's E2E suite, and register a deterministic trigger in fake-Ollama. The recipe is the answer to "how do we make sure the next plugin doesn't have today's bug?"
Three Rounds of CI
Each round of CI surfaced a different cluster of failure modes the dispatch probes had been masking. Round one was a structural mismatch between the fake-Ollama fixture and the openai-completions path that Pinchy actually emits: the fake server only spoke /api/chat, so the external probes (odoo, email, web) 404'd against the /v1/chat/completions route OpenClaw was using. The fix adds the /v1/chat/completions and /v1/models endpoints with OpenAI-style SSE streaming and tool_calls support. Two adjacent fixes in the same round: a lastRoundHasToolResult helper that only inspects messages after the most recent user turn (the previous hasToolResult was over-broad, so stale tool messages from earlier tests in the same Smithers session were poisoning the pinchy-context probe), and a template switch for the pinchy-files probe from contract-analyzer (which carries capabilities: ["vision", "long-context", "tools"] and was 400'ing against the single non-vision model fake-Ollama exposes) to custom with explicit allowedTools wiring.
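The narrowed helper from round one might look roughly like this. It is a sketch of the idea described above, not the fixture's actual code: the ChatMessage shape and the role names are assumptions based on the usual chat-message format.

```typescript
// Assumed message shape, following the common chat-completions role names.
interface ChatMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown;
}

// True only if a tool result appears after the most recent user turn.
// Tool messages left over from earlier turns in the same session are ignored.
function lastRoundHasToolResult(messages: ChatMessage[]): boolean {
  const lastUser = messages.map((m) => m.role).lastIndexOf("user");
  return messages.slice(lastUser + 1).some((m) => m.role === "tool");
}
```

The over-broad version would be the same check without the slice, which is exactly what lets a stale tool message from an earlier test count as a result for the current one.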
Round two surfaced three more shapes. The fake-Ollama's messageContent helper only handled string content; OpenAI/pi-ai send content as a parts array ({type:"text", text:"..."}), so the trigger string was never found and the server returned its default text response — the audit hook then never fired and the probe timed out. The fix flattens both forms. Separately, the pinchy-files probe stopped creating a new agent and reused Smithers instead: the integration suite bind-mounts /tmp/pinchy-integration-openclaw into OpenClaw as root while Pinchy's webServer runs on the host as a non-root user, so creating a new agent triggered an mkdir agents/<UUID>/agent that EACCES'd on the Pinchy side. Reusing Smithers's existing workspace sidesteps the directory-ownership trap entirely. The web probe got the same singleton-aware treatment — POST /api/integrations returns 409 on a second web-search connection, so the probe now reuses an existing one if any earlier test in the same spec left one behind, and only deletes on afterAll if it created its own.
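The content-flattening fix from round two can be sketched as follows. The ContentPart type and the choice to concatenate text parts with no separator are assumptions; the post only says that the helper now handles both the string form and the parts-array form.

```typescript
// Assumed part shape: text parts carry a text field, other part types are skipped.
type ContentPart = { type: string; text?: string };
type MessageContent = string | ContentPart[] | null | undefined;

// Flattens either content form into one string the trigger lookup can search.
function messageContent(content: MessageContent): string {
  if (typeof content === "string") return content;
  if (!Array.isArray(content)) return "";
  return content
    .filter((part) => part.type === "text" && typeof part.text === "string")
    .map((part) => part.text as string)
    .join("");
}
```

With only the string branch in place, a parts-array message falls through to the empty string, the trigger is never found, and the server answers with its default text, which matches the timeout described above.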
Round three was the rate-limit round. The Odoo dispatch probe was failing with an "unknown agent id" error 30 seconds into the test. regenerateOpenClawConfig() calls pushConfigInBackground() as fire-and-forget, the agent's create-then-PATCH flow queues two config.apply RPCs within milliseconds of each other, OpenClaw's config.apply rate limit (~3 per 45-second window) rejected the second one, and the rejected call fell through to the inotify-debounced file-watcher reload. By the time the debounce settled, the probe's 5-second consecutive-connected=true check had already returned and the chat had fired against an OpenClaw that didn't yet know about the new agent. The fix bumps waitForOpenClawStable to 30 seconds of consecutive connected=true with a 90-second overall deadline, sized against the worst-case rate-limit + inotify-debounce window. The pinchy-files probe in the integration suite got a test.skip in the same commit, because the same docker bind-mount ownership trap from round two still affected it under the agent-recreate path, and reusing Smithers wasn't workable for the files probe specifically.
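The "consecutive connected=true, with an overall deadline" rule is easy to get subtly wrong, so here is a small model of it. This is not the body of waitForOpenClawStable: StabilityTracker is a hypothetical name, and the clock is passed in as a number so the logic can be exercised without real waiting. The 30-second and 90-second defaults are the values from the post.

```typescript
type StabilityState = "waiting" | "stable" | "timed_out";

class StabilityTracker {
  private connectedSince: number | null = null;
  private readonly startMs: number;
  private readonly stableForMs: number;
  private readonly deadlineMs: number;

  constructor(startMs: number, stableForMs = 30000, deadlineMs = 90000) {
    this.startMs = startMs;
    this.stableForMs = stableForMs;
    this.deadlineMs = deadlineMs;
  }

  // Feed one poll result. Any connected=false sample resets the streak.
  observe(connected: boolean, nowMs: number): StabilityState {
    if (!connected) {
      this.connectedSince = null;
    } else if (this.connectedSince === null) {
      this.connectedSince = nowMs;
    }
    if (
      this.connectedSince !== null &&
      nowMs - this.connectedSince >= this.stableForMs
    ) {
      return "stable";
    }
    if (nowMs - this.startMs >= this.deadlineMs) return "timed_out";
    return "waiting";
  }
}
```

A polling loop would call observe() with each status check and stop on the first result that is not "waiting". The reset on a dropped connection is the point of the design: a reload that lands mid-wait restarts the streak instead of slipping through.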
A second pinchy-odoo fix landed during the same window: strip quote-wrapped keys from create/write values. An LLM emitting a tool call sometimes wraps its JSON keys in an extra layer of quotes ('"name"': "Foo" instead of "name": "Foo"). Most JSON parsers reject this, but the LLM's own response shape contains the wrapped form often enough that the agent's downstream code has to be lenient. The fix walks the values recursively (a follow-up commit landed in the evening to fix the recursion stopping at the top level) and strips the quote wrapping before passing them to Odoo's RPC.
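A recursive version of the key cleanup could look like the sketch below. It assumes the values have already been parsed into a plain object whose keys carry literal double quotes. The function names and the Json type are hypothetical, and the field names in the test data are made up for illustration.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Removes one pair of wrapping double quotes from a key, if present.
function unwrapKey(key: string): string {
  const wrapped = key.length >= 2 && key.startsWith('"') && key.endsWith('"');
  return wrapped ? key.slice(1, -1) : key;
}

// Walks arrays and nested objects so keys at every depth are cleaned,
// not just those at the top level.
function stripQuotedKeys(value: Json): Json {
  if (Array.isArray(value)) return value.map((item) => stripQuotedKeys(item));
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[unwrapKey(key)] = stripQuotedKeys(child);
    }
    return out;
  }
  return value;
}
```

Only keys are touched. String values that happen to start and end with quotes pass through unchanged, since those may be legitimate data.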
Day 87
The shape of today is the one that catches the most surface area. A coverage test that's small in code surfaces a class of bug (silent registration failures across five plugins, nineteen tools in total) that no individual symptom would have led to. The OpenClaw 5.3 contract change had landed in production on Monday, and the team's dev environment, still on 4.x, had never exercised it. The customers who'd hit it had been blaming their LLM, not their plugin manifests. The bidirectional drift guard means the next time OpenClaw tightens its dispatch contract, the failure mode is a red CI run rather than a quiet production regression. The next stretch, v0.5.4, will probably land with this pattern in place: every plugin tool is asserted to dispatch end-to-end, every manifest is asserted to agree with the code, and the answer to "can the agent use this tool?" is structural rather than empirical.