Day 48: Qwen by Default
Some days you ship features. Some days you ship a one-line fix that makes everything else work better. Today was the second kind.
The Default Model Problem
When you set up Pinchy with a local Ollama provider, it auto-discovers your installed models and picks a default. The original heuristic was simple: pick the largest model that supports tool calling. Bigger is better, right?
Turns out, no. Bigger is not better when it comes to tool calling reliability. Some popular large models are trained primarily for chat — they technically support tools, but they call them inconsistently, miss arguments, or hallucinate function signatures. Smaller models specifically trained for tool use (Qwen is the standout right now) often outperform much larger general-purpose models for agent workflows.
The fix: prefer Qwen models in the default selection logic, fall back to the largest tool-capable model only if no Qwen model is installed. One small change, dramatically more reliable agent behavior out of the box.
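For illustration, here is a minimal sketch of that selection order. It is not Pinchy's actual code: the LocalModel shape, the pickDefaultModel name, the name-prefix check, and the choice of the largest model when several Qwen models are installed are all assumptions made for the example.

```typescript
// Hypothetical model shape; Pinchy's real type is not shown in this post.
interface LocalModel {
  name: string;           // model tag as reported by Ollama, e.g. "qwen2.5:7b"
  sizeBytes: number;      // size on disk, used here as the meaning of "largest"
  supportsTools: boolean; // whether the model advertises tool calling
}

// Prefer tool-capable Qwen models; otherwise fall back to the largest
// tool-capable model. Returns undefined when nothing supports tools.
function pickDefaultModel(models: LocalModel[]): LocalModel | undefined {
  const toolCapable = models.filter((m) => m.supportsTools);

  // Assumption: a Qwen model can be recognized by its name prefix.
  const qwen = toolCapable.filter((m) =>
    m.name.toLowerCase().startsWith("qwen"),
  );

  // Assumption: among several candidates, the largest one wins.
  const pool = qwen.length > 0 ? qwen : toolCapable;
  return [...pool].sort((a, b) => b.sizeBytes - a.sizeBytes)[0];
}
```

With a 70B general-purpose model and a 7B Qwen model installed, both tool-capable, this sketch returns the Qwen model. Remove the Qwen model and it falls back to the 70B one.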
Cache Skip for Local Ollama
Another small but important fix: skip the model list cache entirely for local Ollama installations. With cloud providers, caching the model list makes sense — those lists change rarely. With local Ollama, you're constantly pulling new models and trying them out. Cached lists were causing "model not found" errors when users installed a new model and Pinchy was still showing the old list.
Now: every model list lookup fetches the live list from your local Ollama instance. That costs an extra round trip each time, but it only goes to localhost, and the list is always accurate.
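A rough sketch of the idea, under stated assumptions: the Provider type, the "ollama-local" kind, the listModels name, and the five-minute TTL are hypothetical, and the fetcher is passed in so the example stays self-contained. Pinchy's real cache may look quite different.

```typescript
// Hypothetical provider description; not Pinchy's actual types.
type ProviderKind = "ollama-local" | "cloud";
interface Provider {
  id: string;
  kind: ProviderKind;
}
type FetchModels = (provider: Provider) => Promise<string[]>;

const CACHE_TTL_MS = 5 * 60 * 1000; // assumed TTL for cloud providers

interface CacheEntry {
  models: Promise<string[]>;
  fetchedAt: number;
}
const modelListCache = new Map<string, CacheEntry>();

function listModels(
  provider: Provider,
  fetchModels: FetchModels,
  now: number = Date.now(),
): Promise<string[]> {
  // Local Ollama: never read or write the cache, always ask the live instance.
  if (provider.kind === "ollama-local") return fetchModels(provider);

  // Cloud providers: reuse a recent result if there is one.
  const hit = modelListCache.get(provider.id);
  if (hit && now - hit.fetchedAt < CACHE_TTL_MS) return hit.models;

  const models = fetchModels(provider);
  modelListCache.set(provider.id, { models, fetchedAt: now });
  // Do not keep a failed fetch around as if it were a valid list.
  models.catch(() => {
    if (modelListCache.get(provider.id)?.models === models) {
      modelListCache.delete(provider.id);
    }
  });
  return models;
}
```

The point of the early return is that a freshly pulled model shows up on the very next lookup, because nothing stale can sit between Pinchy and the local instance.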
The WebSocket Mock Fix
Found a strange failure: the WebSocket mock I'd built for E2E tests was intercepting Next.js HMR (hot module replacement) sockets, breaking dev mode. It took a while to debug because the symptom was "the dev server suddenly stopped reloading," with no clear connection to the test infrastructure.
The fix: pass through any WebSocket connection that's not specifically targeting Pinchy's API. Clean separation between test mocks and dev tooling. The kind of bug that's frustrating to find and satisfying to fix.
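The pass-through rule can be sketched as a small predicate plus a factory that chooses between the mock and the real socket. This is an illustration only: the "/api/" prefix, the localhost base URL, and the names shouldMock and makeSocketFactory are assumptions, and the actual mock may hook in at a different layer. The one detail taken from Next.js itself is that its dev server's HMR socket lives under /_next/, which the prefix check leaves alone.

```typescript
// Assumption: Pinchy's WebSocket API lives under this path prefix.
const MOCKED_PATH_PREFIX = "/api/";

// Decide by URL path whether a connection belongs to the mocked API.
// Anything else (for example /_next/webpack-hmr) is left to the real socket.
function shouldMock(rawUrl: string, base = "http://localhost:3000"): boolean {
  const url = new URL(rawUrl, base);
  return url.pathname.startsWith(MOCKED_PATH_PREFIX);
}

// Minimal stand-ins so the sketch does not depend on a browser or test runner.
interface SocketLike {
  readonly url: string;
}
type SocketFactory = (url: string) => SocketLike;

// Wrap the real and mock factories so only API traffic is intercepted.
function makeSocketFactory(
  real: SocketFactory,
  mock: SocketFactory,
): SocketFactory {
  return (url) => (shouldMock(url) ? mock(url) : real(url));
}
```

Writing the rule as an allowlist (mock only what targets the API) rather than a blocklist (skip HMR) means any other dev-tooling socket added later passes through without another fix.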
Day 48
A one-line fix that makes Ollama work better. A cache flag that prevents user confusion. A WebSocket pass-through that unblocks dev mode. Nothing flashy, all important. The week before a release is always like this — small fixes accumulating into something solid.