Self-Hosted
Your servers. Your rules. No data leaves your network. Deploy a full AI agent platform on your own infrastructure.
The Problem
Every prompt, every document, every conversation — sent to servers you don't control, in jurisdictions you didn't choose.
Your agents depend on someone else's API. Pricing changes? Service outage? You're at their mercy.
GDPR, the CLOUD Act, industry regulations: cloud AI makes compliance a moving target. Self-hosting keeps the data, and the compliance scope, on infrastructure you control.
The Solution
Pinchy deploys on your infrastructure — bare metal, VM, or cloud instances you control. The data stays where you put it.
On-prem, private cloud, or your own VPS. You decide where Pinchy runs.
Conversations, documents, agent memory — everything stays on your network. Period.
Use any LLM: OpenAI, Anthropic, or local models via Ollama and llama.cpp. With local models you can go fully air-gapped: no internet connection needed, no data leaves the server. For finance, healthcare, and the public sector, this is often a requirement.
Update when you want. Test before you deploy. No forced changes, no surprises.
How It Works
Pinchy ships as versioned Docker images on GitHub Container Registry. No build step, no custom Dockerfile — pull the tag you want and run.
$ # Download the v0.4.0 compose file
$ curl -fsSL https://raw.githubusercontent.com/heypinchy/pinchy/v0.4.0/docker-compose.yml -o docker-compose.yml
$ docker compose up -d
✓ pinchy started (http://localhost:7777)
✓ openclaw started
✓ db started
🦞 Open http://localhost:7777 — setup wizard creates your admin
Connect your LLM provider, invite your team, pick agent templates. No license key needed for open-source features. Enterprise-gated features (Groups, RBAC, Per-User Breakdown) unlock with a trial key — get one in seconds or book a call.
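For scripted deployments, the startup step above can be followed by a readiness check before anyone opens the setup wizard. The following is a minimal sketch, not part of Pinchy: wait_for_http is a hypothetical helper name, and the only value taken from this page is the http://localhost:7777 address. It polls the URL with curl until it answers or the retry budget runs out.

```shell
#!/bin/sh
# Sketch: poll a URL until it responds successfully or retries are exhausted.
# wait_for_http is a hypothetical helper, not a Pinchy command.
wait_for_http() {
  local url="$1" attempts="${2:-30}" delay="${3:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f: treat HTTP errors as failure; -o /dev/null: discard the body
    if curl -fsS -o /dev/null --max-time 5 "$url" 2>/dev/null; then
      echo "ready after ${i} attempt(s): ${url}"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after ${attempts} attempts: ${url}" >&2
  return 1
}
```

A typical call after `docker compose up -d` would be `wait_for_http http://localhost:7777 30 2`, which waits up to about a minute before giving up.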
What v0.4.0 Changed
The v0.4.0 release focused on the operator experience — the signals, defaults, and boundaries that make self-hosting Pinchy in a regulated environment less of a guessing game.
Versioned Docker images on ghcr.io/heypinchy/pinchy. No source build, no compile step. Pin a tag, pull, run. Update when you're ready.
Run Pinchy without TLS and every page shows a banner until it's fixed. An insecure deployment is hard to ship to production by accident, because it is never silent.
Local Ollama or Ollama Cloud, configured per agent. Sensitive agents stay on-prem; others can use frontier models.
Token spend, provider cost, cache hits: split by agent, user, and system vs plugin. You see where the money goes before the invoice arrives.
Every tool call, every knowledge-base hit, every approval is logged with the user and agent identity. Same view across web UI, Telegram, and plugin actions.
Pair Pinchy with local Ollama and nothing crosses the Atlantic: no US provider in the loop, no CLOUD Act exposure. Compliance by architecture.
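Before pointing a sensitive agent at local Ollama, it can help to confirm that the model is actually present on the local instance, so nothing falls back to a remote provider. This is a hedged sketch that talks to Ollama directly, not to Pinchy: ollama_has_model is a hypothetical helper name, and it assumes Ollama is listening on its default address, http://localhost:11434, on the machine where the script runs.

```shell
#!/bin/sh
# Assumption: Ollama runs on its default address; override OLLAMA_URL if not.
OLLAMA_URL="${OLLAMA_URL:-http://localhost:11434}"

# ollama_has_model is a hypothetical helper, not a Pinchy or Ollama command.
# It succeeds only if the named model is listed by the local Ollama instance.
ollama_has_model() {
  local model="$1"
  # GET /api/tags returns the locally available models as JSON.
  # Whitespace is stripped so a fixed-string match works on any formatting.
  curl -fsS --max-time 5 "${OLLAMA_URL}/api/tags" \
    | tr -d ' \n' \
    | grep -qF "\"name\":\"${model}\""
}
```

For example, `ollama_has_model "llama3.1:8b" || echo "model missing locally"` flags a missing model before any agent is configured to use it. The model name here is only an illustration.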
Full Control
All agent conversations, documents, and memory are stored in your database. Back up, migrate, or delete: your call.
Connect any LLM provider or run local models. No vendor lock-in. Switch models without changing agents.
Scoped per-agent permissions, user groups, and per-plugin boundaries. You define who can reach which agent, and what that agent is allowed to do.
Pull new versions when you're ready. Test in staging first. Roll back if needed. No forced updates.
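The update flow described here (pull when ready, roll back if needed) could be scripted roughly as follows. This is a sketch under stated assumptions, not an official upgrade procedure: compose_url, upgrade_pinchy, and rollback_pinchy are hypothetical helper names, and the script assumes that every release publishes its compose file at the same path as the v0.4.0 URL shown on this page. It does not cover database changes between versions, so take a backup first.

```shell
#!/bin/sh
# Assumption: each release tag hosts docker-compose.yml at the same path
# as the v0.4.0 file referenced on this page.
compose_url() {
  printf 'https://raw.githubusercontent.com/heypinchy/pinchy/%s/docker-compose.yml\n' "$1"
}

# Switch the current directory's deployment to the given release tag.
upgrade_pinchy() {
  local version="$1"
  # Keep the current file so the previous release can be restored.
  cp docker-compose.yml docker-compose.yml.bak || return 1
  # Download to a separate file so a failed download leaves the old one intact.
  curl -fsSL "$(compose_url "$version")" -o docker-compose.yml.new || return 1
  mv docker-compose.yml.new docker-compose.yml || return 1
  docker compose pull && docker compose up -d
}

# Restore the compose file saved by upgrade_pinchy and restart from it.
rollback_pinchy() {
  cp docker-compose.yml.bak docker-compose.yml || return 1
  docker compose up -d
}
```

Running `upgrade_pinchy v0.4.0` in a staging directory first, and only then in production, matches the "test before you deploy" approach. `rollback_pinchy` returns to whatever compose file was active before the last upgrade.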
Book a call — let's talk about your AI agent needs and how Pinchy can help.
Book a Call →
Or email us: info@heypinchy.com