Day 75: The Bonjour Watchdog and Other Ghosts
Sunday. The OpenClaw runtime bump that didn't happen on Wednesday lands today, in two careful hops, with the auth-profiles writer it was waiting for. Along the way, a handful of small fixes that wouldn't have been findable without running the new version against real container infrastructure. The most charming of them involves a printer-discovery protocol that is, somehow, still able to take a containerised Linux process down.
The Bonjour Watchdog
The bug. OpenClaw, by inheritance from one of its libraries, runs an mDNS announcer: a small thread that broadcasts the node's presence on the local network so other instances can discover each other. This is useful on a laptop and benign on a typical Linux host. In a Docker container with no multicast permissions and no /etc/avahi socket, the announcer can't open its broadcast socket, the announcer's watchdog notices the announcer hasn't heartbeated, and the watchdog — which is configured to be aggressive on the assumption that a wedged announcer is worse than a missing one — sends SIGTERM to the parent process. Which is OpenClaw. Which, from Pinchy's perspective, looks like the gateway randomly dying every 30-60 seconds for no good reason.
The fix is one line: disable the mDNS announcer in container builds. The same commit also disables the OpenClaw update-checker (which would try to phone home from inside our container with an outbound HTTP request that customer firewalls are likely to block, generating a worrying log line for no benefit), the controlUi embedded admin panel (which opens a port we don't expose), and the canvasHost rendering server (only meaningful for desktop deployments). All four are correct in the desktop deployment OpenClaw was originally written for; all four are nuisances in the container deployment Pinchy is.
The matching fix on the Pinchy-config side is to preserve these disablements through config writes. Pinchy regenerates openclaw.json on every config change, and the previous code paths were either dropping the disable flags (so OpenClaw would pick the defaults back up on reload) or stripping the OpenClaw-managed enrichments under discovery, update, and canvasHost on the next merge. The fix preserves them through every write, so the disablements stick across config regeneration the way the auth-profiles do.
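The preserve-through-writes idea can be sketched roughly as follows. This is an illustration in Python, not Pinchy's actual code: the function name, the shallow-merge strategy, and the key layout inside each section are assumptions; only the section names discovery, update, and canvasHost come from the description above.

```python
import copy

# Sections whose OpenClaw-managed content should survive a regenerate.
# The section names come from the post; everything inside them is assumed.
PRESERVED_SECTIONS = ("discovery", "update", "canvasHost")


def merge_preserving(existing: dict, regenerated: dict) -> dict:
    """Return the regenerated config with preserved sections carried over.

    Keys the regenerator sets explicitly win. Keys that only exist in the
    file already on disk (disable flags, managed enrichments) are kept.
    Neither input is mutated.
    """
    result = copy.deepcopy(regenerated)
    for section in PRESERVED_SECTIONS:
        old = existing.get(section)
        if not isinstance(old, dict):
            continue  # nothing on disk to preserve for this section
        new = result.get(section)
        if section in result and not isinstance(new, dict):
            continue  # regenerator set a non-object value; leave it alone
        merged = copy.deepcopy(old)
        if isinstance(new, dict):
            merged.update(new)  # shallow merge: regenerated keys override
        result[section] = merged
    return result
```

A shallow merge is the simplest choice that keeps both sides' top-level keys; a real implementation might need a deep merge if the managed fields are nested under keys the regenerator also writes.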
Per-Agent auth-profiles.json
The reason the OpenClaw bump was held up on Wednesday: 4.27 changed the way credentials are read by individual agents. Instead of a global auth-profiles block in openclaw.json, every agent now gets its own auth-profiles.json in its agent directory, scoped to the model provider that agent uses. The motivation on the OpenClaw side is good — credentials shouldn't be more available than they need to be — but it requires Pinchy to actually write these files, with the right scoping, on every config regeneration.
The auth-profiles writer landed today across several commits. The writer itself first, with a unit test against a fake agent definition. Then the integration into the config regenerator, so that an agent created in the UI gets its auth-profiles.json written before the config push triggers an OpenClaw reload. Then the scoping fix: each auth-profiles.json contains only the provider the agent uses, not the full list of providers configured in the workspace, so an agent on Anthropic doesn't get a copy of the OpenAI key just because the workspace also has OpenAI configured. Then the env redaction fix, to make sure secret values don't leak into log lines while the writer is being debugged.
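A minimal sketch of the scoping and redaction rules, in Python for illustration. The shape of the file (a "profiles" object keyed by provider), the agent's "provider" field, and both function names are hypothetical; the real auth-profiles.json schema is defined by OpenClaw and is not shown here.

```python
def build_auth_profiles(agent: dict, workspace_providers: dict) -> dict:
    """Build the per-agent payload containing only the agent's own provider.

    `workspace_providers` maps provider name -> credentials for every
    provider configured in the workspace (assumed shape).
    """
    provider = agent["provider"]
    if provider not in workspace_providers:
        raise KeyError(f"provider {provider!r} is not configured in the workspace")
    # Deliberately copy one entry, never the whole workspace mapping.
    return {"profiles": {provider: dict(workspace_providers[provider])}}


def redact(auth_profiles: dict) -> dict:
    """Return a copy that is safe to log: field names kept, values masked."""
    return {
        "profiles": {
            name: {field: "***" for field in creds}
            for name, creds in auth_profiles["profiles"].items()
        }
    }
```

With a workspace that has both Anthropic and OpenAI configured, an agent on Anthropic gets a payload with a single entry, and logging `redact(payload)` shows which fields exist without showing their values.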
And the file-permissions choreography from earlier in the week extends to the new files: auth-profiles.json is set to mode 0600 and root ownership in the same fast-tick loop, with a separate fix to keep the agents/ directory writable by Pinchy (uid 999) while the per-file permissions stay tighter. Atomic writes go through a tmp file with a documented cleanup step, with a test that fails if the tmp dir leaks.
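The tmp-file-then-rename pattern with a cleanup step might look like the sketch below. It is a generic Python illustration under stated assumptions: the function name is hypothetical, the tmp file is placed next to the target so the rename stays on one filesystem, and the ownership change to root is left out because it needs privileges (it would be an os.chown call before the rename).

```python
import json
import os
import tempfile


def write_json_atomic(path: str, payload: dict, mode: int = 0o600) -> None:
    """Write JSON so readers see either the old file or the new one.

    The tmp file is removed if anything fails before the rename, so a
    failed write leaves no stray files behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-", suffix=".json")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the bytes reach the disk
        os.chmod(tmp_path, mode)       # tighten permissions before it is visible
        os.replace(tmp_path, path)     # atomic rename on POSIX filesystems
    except BaseException:
        # Cleanup step: never leave the tmp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A leak test in the spirit of the one described above would call this with a payload that fails to serialize and then assert that the directory contains only the original file.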
Two Bumps to Get to 4.27
The bump itself happened in two hops, after one false start. The false start was a tentative 4.14 → 4.15 bump for a Cliff-1 CI repro that we needed for an unrelated investigation, immediately reverted because 4.15 hadn't been validated for production. Then came a clean 4.12 → 4.14 intermediate bump (catching up to a version we'd already validated separately), with the 4.14 → 4.27 push following once the auth-profiles writer was wired up. The upgrade guide picked up the matching note about the runtime bump under v0.5.0's upgrade notes. This is the kind of thing where the failure mode is an admin upgrading without seeing the note, hitting a config that doesn't write its auth-profiles, and watching every agent fail to authenticate.
Several small follow-ups in the same session. The usage-poller's first poll now defers 60 seconds rather than firing immediately on boot, because OC 4.27's startup scan is heavier than 4.14's and the immediate poll was racing the scan to access the same file. Telegram channel fields on the OC side gained a few new attributes that we now have to preserve across writes, since dropping them would silently break channel pairing. The integration that resolves Ollama URLs now uses the Docker gateway IP rather than host.docker.internal — the latter resolves correctly on Docker Desktop but not on a Linux host running plain Docker, which is, of course, the deployment shape that matters.
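One way to obtain a gateway IP on a plain Linux Docker host is to read the container's default route. The sketch below, in Python for illustration, parses text in the format of /proc/net/route and rewrites a host.docker.internal URL. How Pinchy actually finds the gateway IP is not stated above, so both function names and the route-table approach are assumptions; the hex decoding assumes a little-endian host.

```python
import socket
import struct
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def default_gateway_from_route_table(route_table: str) -> Optional[str]:
    """Extract the default gateway from /proc/net/route-style text."""
    for line in route_table.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        destination, gateway, flags = fields[1], fields[2], int(fields[3], 16)
        # Destination 00000000 is the default route; 0x2 is RTF_GATEWAY.
        if destination == "00000000" and flags & 0x2:
            # The kernel prints the address as little-endian hex on x86/ARM.
            return socket.inet_ntoa(struct.pack("<L", int(gateway, 16)))
    return None


def resolve_ollama_url(url: str, gateway_ip: Optional[str]) -> str:
    """Swap host.docker.internal for the gateway IP; leave other hosts alone."""
    parts = urlsplit(url)
    if gateway_ip is None or parts.hostname != "host.docker.internal":
        return url
    netloc = gateway_ip if parts.port is None else f"{gateway_ip}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

In a container, the first function would be fed the contents of /proc/net/route; keeping the parser separate from the file read makes it testable with a fixed string.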
Idempotency, Contracted
One e2e test of the day is worth flagging. The test, "cold-start cascade + regenerate idempotency contract", creates an agent during cold-start, waits for the gateway to converge, regenerates the same config five times in a row, and asserts that the resulting openclaw.json on disk is byte-for-byte identical to the first one, modulo a known set of meta fields that are allowed to drift (lastTouchedAt and the surrounding meta block). The contract is what yesterday's #193 cascade-fix needed: a regenerate that writes the same config it already had should not trigger any reload, ever, including on the very first regenerate after cold-start.
Two small follow-ups on that test. The first attempt ignored only lastTouchedAt in the diff, which left other meta drift visible and caused flakes. The second pass strips the entire meta block before comparing, since the meta is OpenClaw-managed and Pinchy's config regenerator should be free to leave it alone entirely. Two diffs went away as a result and the test is now stable across 100 consecutive runs.
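The strip-then-compare step could be sketched like this, in Python for illustration. It assumes the meta block is a top-level "meta" key in openclaw.json, which the text above does not confirm, and both function names are hypothetical. Note that it compares re-serialized JSON, so it is sensitive to key order and values but not to whitespace, which is slightly looser than a raw byte comparison.

```python
import json
from typing import Optional, Sequence


def strip_meta(config_text: str) -> str:
    """Re-serialize a config with the whole meta block removed."""
    config = json.loads(config_text)
    config.pop("meta", None)  # assumed location of the OpenClaw-managed block
    # No sort_keys: a change in key order still counts as drift.
    return json.dumps(config, separators=(",", ":"))


def first_drift(snapshots: Sequence[str]) -> Optional[int]:
    """Return the index of the first snapshot that differs from the first one.

    Returns None when every regenerate produced the same config, which is
    the outcome the idempotency contract requires.
    """
    baseline = strip_meta(snapshots[0])
    for index, snapshot in enumerate(snapshots[1:], start=1):
        if strip_meta(snapshot) != baseline:
            return index
    return None
```

Returning the index rather than a boolean makes a failing run say which regenerate introduced the drift.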
A handful of supplemental fixes in the openclaw-config writer that fall out of the same investigation: config.apply now supplements its payload with auto-configured fields rather than relying on OpenClaw to reconstruct them; the writer falls back to file meta when OC's in-memory config lacks meta (the boot window where in-memory state hasn't caught up); readExistingConfig retries after 300ms when the file is empty, because the reader was occasionally racing the writer in the very early boot path; a dynamic import in pushConfigInBackground got promoted to a static import to make the call site easier to trace.
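The empty-file retry in readExistingConfig can be illustrated with the sketch below. It is a Python rendering with a snake_case name, not the project's code: the single-retry policy and the 300ms delay come from the description above, while the injectable reader and sleep function are assumptions added to make the behaviour testable without real timing.

```python
import json
import time
from typing import Callable, Optional


def read_existing_config(
    read_text: Callable[[], str],
    retry_delay_s: float = 0.3,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Read the config, retrying once if the file is momentarily empty.

    An empty read is treated as "the writer is mid-write", not as
    "there is no config", so the reader waits and tries again.
    """
    text = read_text()
    if not text.strip():
        sleep(retry_delay_s)
        text = read_text()
    if not text.strip():
        return None  # still empty after the retry: report no config
    return json.loads(text)
```

In production the reader would be something like `lambda: open(path).read()`; in a test, a fake reader that returns an empty string first and valid JSON second reproduces the early-boot race deterministically.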
And one seed-data fix: when Pinchy runs its first-time setup, the Smithers helper agent now gets the workspace's default_provider model rather than a hardcoded fallback that may not exist in the user's curated list. The hardcoded fallback was working for everyone only because everyone's list happened to contain it; it would have stopped working for the first user whose curated list didn't.
The CLAUDE.md Re-Read
Two docs commits at the end of the session, the kind that are easy to skip and easy to regret. CLAUDE.md — the file at the root of the repo that tells the AI assistant working in the codebase what the codebase actually is — had drifted on a few specifics: the audit-log scope description was missing a couple of categories, the plugin list mentioned an old name for one of the integrations, and the project-structure tree had stale comments next to several directories. The updates aren't large but they're the kind of thing where the cost of stale documentation compounds — every interaction with the assistant gets slightly wrong context, and the wrongness eventually surfaces as a wrong fix landed on the wrong file. The audit-exempt escape hatch — the small mechanism that lets internal callers skip audit emission for genuinely internal calls — got an explicit note explaining when it should and shouldn't be used.
Day 75
The week landed on the version of v0.5.0 that's beginning to look ready: secrets out of the config, cascade-free agent creation, audit export with integrity hashes, the security floor raised, and now the OpenClaw runtime current with the per-agent auth-profiles model. The Bonjour-watchdog fix is the kind of thing that's small in the diff and large in the headline: a customer running Pinchy in a hardened container would have hit it within hours, and the trail to the root cause goes through a daemon nobody thinks about until they have to. The kind of weirdness that surfaces specifically in environments the team didn't set up themselves — and the kind of fix that has to land before the release that goes into them.