Day 65: Secrets Out of the Config
Thursday. The biggest day on main since v0.4.0. Three pieces of work land together: the secrets migration, the chat delivery-status and retry flow, and a safer integration-delete flow with a usage preflight. I'll take them in that order, because the first one is the one that changes the shape of the product.
openclaw.json Was a Credential Store
The honest state of things before today: openclaw.json — the config file OpenClaw reads on startup — contained plaintext secrets. The gateway auth token. The Brave API key. The Ollama Cloud API key. The Telegram bot token for any channel integration. The Odoo connection's API key. An admin who could read the config file could read every secret the running instance held. Not great.
Nobody had to do anything wrong for it to end up this way. It was the default OpenClaw shape, carried forward from when secrets meant one thing and the config meant another, and nothing had yet forced the two apart. Today it got forced.
SecretRef
The new arrangement looks like this. The container gets a second tmpfs volume, openclaw-secrets, mounted alongside the existing openclaw-config volume. Tmpfs means the file exists only in RAM: nothing lands on the host disk, and a restart clears it until Pinchy rebuilds it on the next config write. Inside that volume lives secrets.json, mode 0600, umask-safe, written atomically.
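A minimal sketch of what "mode 0600, written atomically" can look like, in Python for brevity. This is not Pinchy's writer; the function name write_secrets_atomically is hypothetical, and the temp-file-then-rename approach is an assumption about how the atomic write is done.

```python
import json
import os
import tempfile


def write_secrets_atomically(path: str, secrets: dict) -> None:
    """Write `secrets` as JSON to `path` so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable only by its owner (0600),
    # so the contents are never exposed through a looser default mode.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".secrets-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(secrets, handle)
            handle.flush()
            os.fsync(handle.fileno())
        # Renaming within one filesystem is atomic: a reader sees either the
        # old file or the new one, never a half-written mix.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

The temporary file is created in the same directory as the target so that the rename stays within one filesystem.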
Everywhere openclaw.json used to contain a secret, it now contains a SecretRef marker — an opaque key like secret:gateway.auth.token that points at the corresponding entry in secrets.json. OpenClaw reads the config, walks the tree, resolves each SecretRef at startup against the secrets file. If the secrets file is missing a key the config references, the plugin doesn't start — failing closed rather than falling back to an empty string that silently breaks every request.
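The resolve-and-fail-closed behaviour can be sketched roughly as follows. This is an illustration in Python, not OpenClaw's resolver: it assumes a SecretRef is a plain string with a secret: prefix, as in the example above, and the names resolve_secret_refs and MissingSecretError are hypothetical.

```python
from typing import Any

SECRET_PREFIX = "secret:"  # assumed marker format, taken from the example in the text


class MissingSecretError(KeyError):
    """Raised when the config references a key the secrets file does not hold."""


def resolve_secret_refs(node: Any, secrets: dict[str, str]) -> Any:
    """Return a copy of `node` with every SecretRef replaced by its secret value."""
    if isinstance(node, dict):
        return {key: resolve_secret_refs(value, secrets) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_secret_refs(item, secrets) for item in node]
    if isinstance(node, str) and node.startswith(SECRET_PREFIX):
        key = node[len(SECRET_PREFIX):]
        if key not in secrets:
            # Fail closed: never substitute an empty string for a missing secret.
            raise MissingSecretError(key)
        return secrets[key]
    return node
```

Raising on the first missing key is the point of the design: a startup failure is visible immediately, whereas an empty token would only surface later as a stream of failed requests.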
Every secret migrated in one push: gateway.auth.token, ollama-cloud.apiKey, the pinchy-*.gatewayToken used by each Pinchy plugin, pinchy-odoo's per-connection apiKey, pinchy-web's braveApiKey, the Telegram botToken across full-regenerate and targeted-update paths, and every env.* that passed through the config. Fifteen-ish commits, each narrow enough to read.
The Scanner, Because Trust Is Not a Migration Strategy
Getting the initial migration right isn't the hard part. The hard part is making sure nobody accidentally adds a plaintext secret to openclaw.json six months from now — in a new plugin, in a test fixture, in a debug branch that sneaks into a release. So the config writer now has a plaintext-secret scanner sitting in front of it: a set of heuristics (known secret field names, API-key-shaped strings, entries that look like JWTs or raw tokens) that runs on every write. Anything that triggers the scanner blocks the write and raises a loud error. Defence-in-depth, not defence-in-intention.
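To make the idea concrete, here is a rough Python sketch of a scanner of this kind. The field-name suffixes, regular expressions, and function names are all illustrative assumptions, not the heuristics Pinchy ships; a real scanner would need tuning against false positives.

```python
import re
from typing import Any, Iterator

SECRET_REF_PREFIX = "secret:"  # assumed SecretRef marker format
# Hypothetical heuristics: field names that usually hold secrets...
SECRET_NAME_SUFFIXES = ("apikey", "token", "password", "secret")
# ...and value shapes that look like JWTs or long raw tokens.
JWT_LIKE = re.compile(r"[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}\.[A-Za-z0-9_-]{8,}")
TOKEN_LIKE = re.compile(r"[A-Za-z0-9_-]{32,}")


class PlaintextSecretError(ValueError):
    """Raised when a config write would persist a plaintext secret."""


def find_plaintext_secrets(node: Any, path: str = "") -> Iterator[str]:
    """Yield the config path of every value that looks like a plaintext secret."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}" if path else str(key)
            if (
                isinstance(value, str)
                and value
                and not value.startswith(SECRET_REF_PREFIX)
                and key.lower().endswith(SECRET_NAME_SUFFIXES)
            ):
                yield child_path  # secret-named field holding a raw string
            else:
                yield from find_plaintext_secrets(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_plaintext_secrets(item, f"{path}[{index}]")
    elif isinstance(node, str) and not node.startswith(SECRET_REF_PREFIX):
        if JWT_LIKE.fullmatch(node) or TOKEN_LIKE.fullmatch(node):
            yield path  # secret-shaped value under an innocent-looking name


def guard_config_write(config: Any) -> None:
    """Block the write loudly if anything in `config` looks like a plaintext secret."""
    offenders = list(find_plaintext_secrets(config))
    if offenders:
        raise PlaintextSecretError("plaintext secret at: " + ", ".join(offenders))
```

Calling guard_config_write immediately before the file is written means a new plugin or test fixture cannot bypass the check by accident; it would have to bypass the writer itself.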
A one-off plaintext scan of the audit log also ran as part of the migration, so any past writes that landed plaintext secrets in audit rows are now identified and cleaned up. The migration also deletes openclaw.json.bak if it exists, because a backup file containing the old plaintext values sitting on disk next to the shiny new config would defeat most of the point.
Delivery Status and Retry
In parallel, and small by comparison, the chat UI grew proper message lifecycle states. A user message now starts as sending and ends up either sent or failed, instead of sitting in the undifferentiated "it's in the box" state the old UI showed. The transitions are driven by a reducer keyed on clientMessageId, which is threaded from the client through the bridge to OpenClaw and back. The server emits a userMessagePersisted acknowledgement after it writes the session; the client flips the row to sent. If that ack doesn't arrive within ten seconds, the row flips to failed.
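The lifecycle can be sketched as a pure reducer. The version below is in Python for brevity and is not the UI's actual reducer: the event shapes, the tick event used to model the ten-second timeout, and the handling of a late acknowledgement are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict

ACK_TIMEOUT_SECONDS = 10.0  # the ten-second window described in the text


@dataclass(frozen=True)
class MessageRow:
    client_message_id: str
    status: str  # "sending", "sent", or "failed"
    sent_at: float  # timestamp of the latest send attempt, in seconds


def delivery_reducer(rows: Dict[str, MessageRow], event: dict) -> Dict[str, MessageRow]:
    """Return the next state; the input mapping is never mutated."""
    kind = event["type"]
    if kind == "submit":
        row = MessageRow(event["clientMessageId"], "sending", event["now"])
        return {**rows, row.client_message_id: row}
    if kind == "userMessagePersisted":
        row = rows.get(event["clientMessageId"])
        if row is None:
            return rows
        # Assumption: a late ack still wins, since the server did persist the message.
        return {**rows, row.client_message_id: replace(row, status="sent")}
    if kind == "tick":
        updated = dict(rows)
        for key, row in rows.items():
            timed_out = event["now"] - row.sent_at >= ACK_TIMEOUT_SECONDS
            if row.status == "sending" and timed_out:
                updated[key] = replace(row, status="failed")
        return updated
    if kind == "retry":
        row = rows.get(event["clientMessageId"])
        if row is not None and row.status == "failed":
            retried = replace(row, status="sending", sent_at=event["now"])
            return {**rows, row.client_message_id: retried}
    return rows
```

Keying every event on clientMessageId is what lets the acknowledgement find the right row even though the server assigns its own identifiers later.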
Failed messages get a Retry button. Partial-stream failures (assistant replies cut off mid-stream because the model backend died) get a different retry path that continues the turn rather than restarting it. A synthetic "orphan" error bubble appears when an assistant row looks stuck mid-stream on history reload, with the same retry treatment. The audit log gets a chat.retry_triggered event for both flavours, so the trail shows when a user manually recovered from a transient failure.
Integration Delete That Asks Before It Cascades
The third piece: deleting an integration is now a two-step interaction with a real preflight. The old Delete button issued a DELETE and hoped. If the integration was used by live agents, the cascade quietly unhooked their tool permissions, and the admin found out later by noticing an agent had lost its ability to do its job.
The new flow: the dialog opens, calls a usage endpoint, and reports "X is used by N agents". If N is zero, one click finishes it. If N is non-zero, the admin sees exactly which agents will be affected and the delete requires an explicit Detach and delete confirmation — which runs detachment and deletion in a single transaction, both operations or neither. The integrations list now shows the per-row usage count up front, so the preflight rarely surprises anyone.
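The decision the dialog makes after the usage call is small enough to sketch. The Python below is illustrative only; DeletePreflight and confirm_label are hypothetical names, and the shape of the usage endpoint's response is assumed to be a list of agent names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DeletePreflight:
    """Result of the usage lookup for one integration (assumed response shape)."""

    integration_name: str
    affected_agents: List[str]

    @property
    def requires_detach_confirmation(self) -> bool:
        # Any live usage means the admin must confirm the detach explicitly.
        return len(self.affected_agents) > 0

    def summary(self) -> str:
        count = len(self.affected_agents)
        noun = "agent" if count == 1 else "agents"
        return f"{self.integration_name} is used by {count} {noun}"


def confirm_label(preflight: DeletePreflight) -> str:
    """Pick the button label: one click when unused, explicit detach otherwise."""
    return "Detach and delete" if preflight.requires_detach_confirmation else "Delete"
```

For example, a preflight for an integration used by two agents yields the summary "Odoo is used by 2 agents" and the Detach and delete label, while an unused integration gets the plain Delete label.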
The FK on the permissions table got tightened too. Previously the connection-id FK could cascade silently: ON DELETE CASCADE was doing more work than the UI implied. Now the FK restricts the delete instead of cascading, so the permission rows have to be detached explicitly by the delete endpoint, which is the same thing the UI is asking the admin to confirm. The database and the UI tell the same story.
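Here is a small runnable sketch of the restricted FK together with the all-or-nothing detach and delete. It uses SQLite only because it ships with Python; the project's actual database is not stated here, and the table names, column names, and ON DELETE RESTRICT clause are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical schema: permissions reference integrations with a restricting FK.
SCHEMA = """
CREATE TABLE integrations (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE agent_permissions (
    agent_name TEXT NOT NULL,
    integration_id INTEGER NOT NULL
        REFERENCES integrations(id) ON DELETE RESTRICT
);
"""


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
    conn.executescript(SCHEMA)
    return conn


def detach_and_delete(conn: sqlite3.Connection, integration_id: int) -> None:
    """Detach permissions and delete the integration: both operations or neither."""
    # `with conn` commits if the block succeeds and rolls back on any exception.
    with conn:
        conn.execute(
            "DELETE FROM agent_permissions WHERE integration_id = ?",
            (integration_id,),
        )
        conn.execute("DELETE FROM integrations WHERE id = ?", (integration_id,))
```

With this schema a bare DELETE on an integration that still has permission rows fails with an integrity error, so the only way to remove it is the explicit path the admin confirmed.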
Day 65
The secrets migration is the kind of change that belongs in a release only as a footnote — "secrets are now stored in a tmpfs volume rather than the main config file" — and not in marketing copy at all. The admins who needed it didn't know they needed it. The admins who did know needed it to have already happened. Today it happened. v0.4.5's upgrade notes are drafted; the release itself is a day or two away.