
Day 91: The Default Gets Opinionated

Yesterday was one cluster landing five times — every PR in the Odoo plugin. Today is the opposite shape: fifteen PRs merge, and they don't share a folder. What they share, if anything, is a question I'd been circling for a couple of weeks — what does the agent run on, and what is it allowed to touch? The morning answers the first half (model selection), the evening answers the second (what the agent can write and where its memory goes), and a couple of guardrails in between keep both halves honest.

The Default Gets Opinionated (PR #378, 13:32)

Until today, a freshly created Pinchy agent defaulted to the fast tier — Haiku, GPT-mini, Flash. That's the right default for a chatbot and the wrong one for an agent. An agent that reads an Odoo schema, plans a multi-step write, and reasons about which ledger account a vendor bill belongs to is doing work the fast tier wasn't built for, and the failure mode isn't an error — it's a confident wrong answer. So the default moves to balanced: Sonnet, GPT-5.5, Gemini-Pro. It's an opinionated change, and the opinion is that agent workloads are not chat workloads, so the out-of-the-box experience should be the tier that actually does the job.

The same PR fixed a bug that had been quietly handing OpenAI users the wrong model entirely. The model picker chooses the newest model that matches a tier by parsing the date out of the model id — and extractModelDate only understood Anthropic's YYYYMMDD format, not OpenAI's YYYY-MM-DD. Combined with an overly broad match pattern, that meant OpenAI provider setups could land on gpt-4o-mini when they should have gotten the current balanced model. The fix is five layers of hardening — a date parser that understands both formats, a reject pattern that filters preview/beta/thinking/nano variants, a deterministic tiebreaker, generation-anchored balanced patterns, and a curated per-provider fallback — plus a drift-guard test suite so the next new model doesn't silently re-open the hole. Smithers, the personal agent, gets pinned to balanced explicitly via a model hint, so a future change to the global default can't quietly downgrade him.
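To make the date-parsing and tiebreak idea concrete, here is a minimal sketch. It is not the actual Pinchy code: apart from the name extractModelDate, every identifier and every model id below is made up for illustration, and the real reject pattern and tier patterns may look quite different.

```typescript
// Illustrative sketch only. Apart from the name extractModelDate, every
// identifier and model id here is hypothetical.

// Variants that should never be picked automatically.
const REJECT = /preview|beta|thinking|nano/i;

// Parse a release date out of a model id, accepting both YYYYMMDD and
// YYYY-MM-DD. Returns a sortable number such as 20250301, or null.
function extractModelDate(id: string): number | null {
  const m = id.match(/(?:^|\D)(20\d{2})-?(\d{2})-?(\d{2})(?!\d)/);
  if (m === null) return null;
  const year = Number(m[1]);
  const month = Number(m[2]);
  const day = Number(m[3]);
  if (month < 1 || month > 12 || day < 1 || day > 31) return null;
  return year * 10000 + month * 100 + day;
}

// Pick the newest non-rejected id that matches the tier pattern.
// The tier pattern must not carry the "g" flag, since test() is reused.
function pickNewest(modelIds: string[], tier: RegExp): string | null {
  const candidates = modelIds.filter((id) => tier.test(id) && !REJECT.test(id));
  if (candidates.length === 0) return null;
  candidates.sort((a, b) => {
    const da = extractModelDate(a) ?? 0; // undated ids sort last
    const db = extractModelDate(b) ?? 0;
    if (da !== db) return db - da; // newest first
    return a < b ? -1 : a > b ? 1 : 0; // deterministic tiebreak on the id
  });
  return candidates[0];
}

// Made-up ids mixing both date formats.
const sampleIds = [
  "acme-pro-20240601",
  "acme-pro-2025-03-01",
  "acme-pro-mini-2025-06-01",
  "acme-pro-preview-2025-09-01",
];

const anchoredPick = pickNewest(sampleIds, /^acme-pro-\d/);
const broadPick = pickNewest(sampleIds, /^acme-pro-/);
```

With these sample ids, the anchored pattern resolves to acme-pro-2025-03-01, while the broad pattern lands on acme-pro-mini-2025-06-01. That second result is the same shape as the bug described above: a match pattern loose enough to let a smaller model win on date alone.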

Crucially, none of this migrates existing agents. Pinchy only picks a model at create time; an agent you made last week keeps the model it was made with. The new default is for new agents, in keeping with the auto-default philosophy that's run through the project since Day 1 — choose well once, then don't surprise people.

Blocking the Broken-but-Shiny Model (PR #389, 15:00)

The flip side of picking a good default is refusing a bad one even when it looks attractive on paper. gemini-3-flash-preview on Ollama Cloud advertises a 1M-token context window and reasoning + vision + tools, which is exactly what an invoice-processing flow wants. It's also broken: it drops the thought_signature on tool calls and either errors or silently hangs. A new blocklist rule forbids any *-preview model when tools are required, and the Ollama-Cloud reasoning+vision default swaps to qwen3.5:397b, which has roughly a quarter of the context window (262K vs 1M) but actually works end to end. The shiny model stays selectable for anyone who explicitly overrides; it just stops being the thing Pinchy steers people into. A drift-guard test fails if any tier × task × forbidden-capability combination ever resolves back to a blocked model, so the next addition to the blocklist is covered without anyone remembering to write a test.
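A rough sketch of how a blocklist rule and its drift guard could fit together. The rule shape, the function names, and the defaults-table key are all hypothetical; only the two model ids and the "no *-preview when tools are required" rule come from the PR described above.

```typescript
// Illustrative sketch only: rule shape, function names, and the
// defaults-table keys are hypothetical.

interface BlockRule {
  pattern: RegExp;            // which model ids the rule covers
  whenToolsRequired: boolean; // true: the rule applies only if tools are needed
}

const BLOCKLIST: BlockRule[] = [
  { pattern: /-preview$/, whenToolsRequired: true },
];

// A model is blocked when any rule matches its id and the rule's
// capability condition holds for the current task.
function isBlocked(modelId: string, needsTools: boolean): boolean {
  return BLOCKLIST.some(
    (rule) => rule.pattern.test(modelId) && (needsTools || !rule.whenToolsRequired),
  );
}

// Drift guard: list every key in a defaults table whose model is blocked.
// An empty result means no default resolves to a blocked model.
function findBlockedDefaults(defaults: Record<string, string>, needsTools: boolean): string[] {
  return Object.keys(defaults).filter((key) => isBlocked(defaults[key], needsTools));
}

const currentDefaults = { "ollama-cloud/reasoning+vision": "qwen3.5:397b" };
const driftedDefaults = { "ollama-cloud/reasoning+vision": "gemini-3-flash-preview" };

const currentViolations = findBlockedDefaults(currentDefaults, true);
const driftedViolations = findBlockedDefaults(driftedDefaults, true);
```

Because the guard iterates over the whole table instead of naming individual models, adding a rule to BLOCKLIST is enough to have every existing default rechecked against it.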

A Place to Write (PR #384, 21:27)

For months the agent's filesystem story was read-only: it could see files a user uploaded, but it couldn't produce one. pinchy_write closes the loop. The agent can now write into its workspace. A write fails by default if the target file already exists, and replacing it takes an explicit overwrite=true. The read side (pinchy_ls, pinchy_read) flips from a user-toggleable permission to always-on, so any text file you drop in chat (Markdown, CSV, JSON, source) is immediately readable, alongside the PDF and image support that was already there. Writing stays a "powerful" opt-in capability, because producing files is a different trust question from reading them.
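The fail-on-exists behavior maps neatly onto Node's file open flags. The sketch below is a hypothetical helper, not the real pinchy_write: the function name, the result shape, and the error wording are assumptions, and it deliberately leaves out the check a real tool needs to keep the file name inside the workspace.

```typescript
import { mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

interface WriteOutcome {
  ok: boolean;
  mode: "create" | "overwrite";
  error?: string;
}

// Hypothetical helper, not the real pinchy_write. A real tool would also
// have to keep `fileName` inside the workspace; that check is omitted here.
function writeWorkspaceFile(
  workspace: string,
  fileName: string,
  content: string,
  overwrite = false,
): WriteOutcome {
  const mode = overwrite ? "overwrite" : "create";
  try {
    // Flag "wx" fails with EEXIST if the file is already there; "w" replaces it.
    writeFileSync(join(workspace, fileName), content, {
      encoding: "utf8",
      flag: overwrite ? "w" : "wx",
    });
    return { ok: true, mode };
  } catch (err) {
    const code = (err as { code?: string }).code;
    const error =
      code === "EEXIST" ? "file exists; pass overwrite=true to replace it" : String(err);
    return { ok: false, mode, error };
  }
}

// Usage against a throwaway temp directory standing in for the workspace.
const workspaceDir = mkdtempSync(join(tmpdir(), "workspace-"));
const firstWrite = writeWorkspaceFile(workspaceDir, "report.md", "v1");
const secondWrite = writeWorkspaceFile(workspaceDir, "report.md", "v2");
const thirdWrite = writeWorkspaceFile(workspaceDir, "report.md", "v3", true);
const finalText = readFileSync(join(workspaceDir, "report.md"), "utf8");
```

The first write succeeds, the second is refused because the file exists, and the third replaces the contents because it opts in explicitly. Letting the filesystem enforce the exclusive create avoids a separate exists-then-write check, which would leave a window for a race.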

The part I'm most pleased with is the audit contract. pinchy_write's raw file content never reaches the audit log — the plugin returns a details override carrying a content hash and the write mode instead of the payload. That's now a general plugin contract: any plugin can replace the raw params in its audit entry with a redacted detail object, while the system fields (tool name, success, error) stay un-overridable. It means "the agent wrote a file" is auditable without the audit trail becoming a copy of everything the agent ever wrote.
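One way to picture that contract, as a sketch under stated assumptions: the entry shape, the auditDetails field name, and both function names are invented here, and the post does not say which hash is used, so SHA-256 is a stand-in.

```typescript
import { createHash } from "node:crypto";

// Illustrative sketch only: entry shape, field names, and function names are
// hypothetical. SHA-256 is an assumption; the post only says "content hash".

interface AuditEntry {
  tool: string;
  success: boolean;
  error: string | null;
  details: Record<string, unknown>;
}

interface ToolResult {
  success: boolean;
  error?: string;
  auditDetails?: Record<string, unknown>; // optional redacted replacement for raw params
}

function buildAuditEntry(
  tool: string,
  params: Record<string, unknown>,
  result: ToolResult,
): AuditEntry {
  return {
    // System fields are set by the host, so a plugin cannot override them.
    tool,
    success: result.success,
    error: result.error ?? null,
    // Plugin-supplied details replace the raw params when present.
    details: result.auditDetails ?? params,
  };
}

// What a write tool might return: a hash and the write mode, never the payload.
function redactedWriteResult(content: string, overwrite: boolean): ToolResult {
  return {
    success: true,
    auditDetails: {
      contentSha256: createHash("sha256").update(content, "utf8").digest("hex"),
      mode: overwrite ? "overwrite" : "create",
    },
  };
}

const writeParams = { path: "notes.md", content: "quarterly figures" };
const auditEntry = buildAuditEntry(
  "pinchy_write",
  writeParams,
  redactedWriteResult(writeParams.content, false),
);
```

In this sketch the override lives in its own nested details object, so even a plugin that returns a key named tool or success only changes its own details and never the top-level system fields.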

Memory You Can Audit (PR #405, 20:28)

OpenClaw gives each agent a durable per-agent memory file that shapes its behavior in every future session — both the explicit "remember this" turns and the silent pre-compaction flush that happens without anyone asking. Until today those writes were invisible to Pinchy. A new chokidar-based watcher emits an agent.memory_changed audit event whenever a memory file changes, carrying the relative filename, added/removed line counts, and byte size — never the contents. The motivating question is a CISO's: the agent suddenly believes X — when and how did that get into its memory? Now there's a traceable answer. The watcher is deliberately non-critical — if it fails to boot, it logs and the host keeps running — and it's written so it can be deleted entirely the day OpenClaw ships its own memory-changed event upstream.
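The event payload is the interesting part, so here is a sketch of how one could be derived from the old and new contents without ever copying them. It leaves out the chokidar wiring entirely. The field names, the helper names, and the MEMORY.md filename are assumptions, and the line counts use a simple multiset comparison where a real implementation might use a proper diff.

```typescript
// Illustrative sketch only: the event field names and helpers are
// hypothetical, and the line counts are a simple multiset comparison
// rather than a true diff.

interface MemoryChangedEvent {
  type: "agent.memory_changed";
  file: string;         // relative filename
  addedLines: number;
  removedLines: number;
  bytes: number;        // size of the new contents in UTF-8 bytes
}

// Count how many times each distinct line occurs.
function countLines(text: string): Map<string, number> {
  const counts = new Map<string, number>();
  if (text === "") return counts;
  text.split("\n").forEach((line) => {
    counts.set(line, (counts.get(line) ?? 0) + 1);
  });
  return counts;
}

// Build the audit payload from before/after contents; contents are not included.
function describeMemoryChange(file: string, before: string, after: string): MemoryChangedEvent {
  const oldCounts = countLines(before);
  const newCounts = countLines(after);
  let addedLines = 0;
  let removedLines = 0;
  newCounts.forEach((n, line) => {
    addedLines += Math.max(0, n - (oldCounts.get(line) ?? 0));
  });
  oldCounts.forEach((n, line) => {
    removedLines += Math.max(0, n - (newCounts.get(line) ?? 0));
  });
  const bytes = new TextEncoder().encode(after).length;
  return { type: "agent.memory_changed", file, addedLines, removedLines, bytes };
}

// Non-critical startup: a failing watcher is logged, never fatal to the host.
function startNonCritical(start: () => void, log: (message: string) => void): boolean {
  try {
    start();
    return true;
  } catch (err) {
    log(`memory watcher disabled: ${String(err)}`);
    return false;
  }
}

const memoryEvent = describeMemoryChange(
  "MEMORY.md",
  "likes tea\n",
  "likes tea\nprefers invoices in EUR\n",
);
```

For that sample change the payload reports one added line, no removed lines, and a 34-byte file, which is enough to answer "when did this change?" without the audit log itself becoming a second copy of the agent's memory.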

Two Guardrails and a Citation

Three smaller PRs round out the day. pinchy-docs now hands agents a public HTTPS URL alongside the on-disk path (PR #393, 20:38), so Smithers can cite https://docs.heypinchy.com/guides/connect-email/ instead of an unreachable .mdx file path; air-gapped forks can opt out by clearing the setting. The Ollama provider now rejects non-allowlist URLs at save time (PR #391, 16:00): paste a bare Docker service name like http://ollama:11434 and you get a 422 with a one-line hint pointing at the right setup option. The rejection happens before the network probe, instead of surfacing later as a confusing runtime failure. And the model-resolver work is rounded out by a documentation pass plus a CI change that runs the integration suite against the production Pinchy image rather than a dev build, so the gap between "passes on my machine" and "passes in the image users actually pull" keeps shrinking.
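For the Ollama URL check, a save-time validator could look roughly like the sketch below. Only the "422 with a hint, before any network probe" behavior comes from the PR; the allowlist contents, the hint wording, and all names are assumptions made for the example.

```typescript
// Illustrative sketch only: the allowlist contents, the hint wording, and all
// names are hypothetical.

type SaveCheck = { status: 200 } | { status: 422; hint: string };

// Validate a provider URL purely from its text, without touching the network.
function checkProviderUrl(raw: string, allowedHosts: string[]): SaveCheck {
  let parsed: URL;
  try {
    parsed = new URL(raw);
  } catch {
    return { status: 422, hint: "Not a valid URL." };
  }
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    return { status: 422, hint: "Only http and https URLs are accepted." };
  }
  if (allowedHosts.indexOf(parsed.hostname) === -1) {
    return {
      status: 422,
      hint: `Host "${parsed.hostname}" is not on the allowlist; see the provider setup options.`,
    };
  }
  return { status: 200 }; // safe to continue to the network probe
}

// Hypothetical allowlist for the example.
const exampleAllowedHosts = ["localhost", "host.docker.internal"];
const rejectedCheck = checkProviderUrl("http://ollama:11434", exampleAllowedHosts);
const acceptedCheck = checkProviderUrl("http://localhost:11434", exampleAllowedHosts);
```

Here the bare service name is rejected with a 422 and the localhost URL passes. Because the check only parses the string, it can run on every save without depending on whether the target happens to be reachable at that moment.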

Day 91

A fifteen-PR day with no single headline is a different kind of progress than a clean feature landing, and it's the kind I've learned to value. None of these is a marquee bullet for a release page. Together they move the floor: the default agent is smarter, can't be steered into a broken model, can write as well as read, and leaves an audit trail when its memory changes. The pieces that ship on a day like this are the ones you only notice later — when the thing that would have been a support ticket simply never happens.

