First month with OpenClaw
I asked Claw what hybrid search actually does, because I realized I'd been using it for a week without understanding it. It walked me through the mechanics, and then I said "if we have duplicates or compacting is needed, do it now." It read through all 58 daily memory files, cross-referenced them against long-term memory, archived 40 that were already captured, and reported back. The whole thing took a few minutes. I watched the noise floor drop in real time while eating dinner.
That's not what a chatbot does. That's a workflow I'd have procrastinated on for weeks.
From two days to thirty
A month ago I wrote about setting up OpenClaw for the first time. Two days in, I had a Telegram-connected assistant that could run shell commands, open GitHub PRs, and send me a morning brief. It worked. It also forgot everything between sessions, searched memory by keywords only, and needed me to start every interaction.
The setup I'm running now is different enough that it deserves its own writeup. Some of what I built in the first week turned out to be unnecessary. Other things I hadn't thought about became essential within days.
Somewhere in that month, the thing quietly stopped being a tool I use and started being infrastructure I rely on. Whether that's good or just comfortable, I'm not sure yet.
Teaching it to remember
The biggest upgrade was giving Claw semantic memory. The default memory search is keyword-based, and it breaks down fast. If you ask "what were those OIDC providers we looked at for locking down the MCP server?" keyword search needs you to have used exactly those words. You don't always remember what words you used three weeks ago.
Ollama runs locally on the same VPS. One install, one systemd service, no GPU required. I'm running bge-m3, a multilingual embedding model that handles English and German, which matters because I switch between the two constantly.
Setup is minimal. Ollama serves an OpenAI-compatible API on localhost:11434. OpenClaw's memory plugin connects to it, converts text chunks into embeddings, and stores them in SQLite. Queries get embedded the same way and matched by vector similarity. In practice it's hybrid: 70% vector similarity, 30% BM25 keyword matching. That way it catches exact names and version numbers that semantic search alone would miss, and also finds concepts expressed in different words than the ones you originally used. The index lives in SQLite and only re-embeds chunks that changed. The first session after setup was slow while it indexed everything. Every session after: instant.
It runs on CPU. Queries take a few hundred milliseconds. For a personal assistant searching through a few hundred kilobytes of notes, that's plenty.
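To make the mechanics concrete, here's a minimal sketch of that hybrid scoring. The endpoint path, model name, and 70/30 weights follow the setup described above; the toy BM25 implementation and all function names are my own illustration, not OpenClaw's actual code.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/embeddings"  # OpenAI-compatible endpoint

def embed(text, model="bge-m3"):
    """Ask the local Ollama server for an embedding vector."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps({"model": model, "input": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"][0]["embedding"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Minimal BM25 over tokenized docs (each doc is a list of terms)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = []
    for doc in docs:
        score = 0.0
        for term in query_terms:
            tf = doc.count(term)
            df = sum(1 for d in docs if term in d)
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
        scores.append(score)
    return scores

def hybrid_rank(query_vec, doc_vecs, query_terms, doc_terms, w_vec=0.7, w_kw=0.3):
    """Blend vector similarity with normalized keyword score, 70/30."""
    kw = bm25_scores(query_terms, doc_terms)
    top = max(kw) or 1.0  # squash BM25 into [0, 1] so the weights mean something
    blended = [
        w_vec * cosine(query_vec, dv) + w_kw * (k / top)
        for dv, k in zip(doc_vecs, kw)
    ]
    return sorted(range(len(blended)), key=lambda i: -blended[i])
```

The normalization step matters: raw BM25 scores are unbounded, so without squashing them the 70/30 split would be meaningless.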
OpenClaw's memory itself is just markdown files in a directory. No proprietary format, no database you can't read. I settled on two layers: daily files in memory/ capture what happened, and MEMORY.md is the curated layer with distilled knowledge and long-term context.
Daily files accumulate. After a month I had 58 of them, and the search results were getting noisy — old transcripts from weeks ago surfacing alongside current context. So I ran a compaction: Claw read through everything, identified what was already captured in long-term memory, archived the redundant files. 58 down to 18. This matters because every file gets indexed for search. Compaction isn't tidiness, it's signal-to-noise maintenance.
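The semantic judgment in that compaction came from the model, but the mechanical half is easy to picture. A sketch, assuming daily files are named memory/YYYY-MM-DD.md (that naming is my assumption, not a documented convention):

```python
from datetime import date, timedelta
from pathlib import Path
import shutil

def archive_old_dailies(memory_dir, keep_days=14, today=None):
    """Move daily files older than keep_days into memory/archive/."""
    memory = Path(memory_dir)
    archive = memory / "archive"
    archive.mkdir(exist_ok=True)
    cutoff = (today or date.today()) - timedelta(days=keep_days)
    moved = []
    for f in sorted(memory.glob("*.md")):
        try:
            day = date.fromisoformat(f.stem)  # only YYYY-MM-DD names qualify
        except ValueError:
            continue  # MEMORY.md and other curated files stay put
        if day < cutoff:
            shutil.move(str(f), archive / f.name)
            moved.append(f.name)
    return moved
```

Archiving rather than deleting keeps the raw record while pulling it out of the search index.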
The separation between daily logs and curated memory took me a while to appreciate. Early on I kept everything in MEMORY.md. It grew too fast, included too much transient context, and became hard to skim. Splitting them enforces the distinction between "things I jotted down today" and "things I actually know."
The part where it stops waiting
The cron system changed how I use this more than anything else.
OpenClaw can run scheduled jobs independently of the main conversation. Each job gets its own isolated session with separate context and model. The main session doesn't get cluttered with background work, and background work doesn't need me to be active.
The morning brief runs at 06:30 on weekdays, 08:30 on weekends. Checks my task manager for what's due and what's overdue. That weekend delay was an early decision that turned out to matter more than expected. Getting a 06:30 ping on a Saturday is a different experience.
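In standard cron syntax that split is two entries, "30 6 * * 1-5" for weekdays and "30 8 * * 0,6" for weekends (whether OpenClaw's job config uses exactly this syntax is my assumption). The weekday logic itself is trivial:

```python
from datetime import date

def brief_time(day: date) -> str:
    """Return the morning-brief send time for a given day.
    weekday() is 0 for Monday, so 5 and 6 are the weekend."""
    return "08:30" if day.weekday() >= 5 else "06:30"
```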
A healthcheck runs every hour, auditing the server: open ports, SSH configuration, firewall state, package updates. Findings get fingerprinted so the same known issue doesn't alert repeatedly. I can acknowledge a finding for 24 hours or 7 days, and it stays quiet until something changes or the ack expires. Most days, nothing. When something does surface, I actually pay attention because I know it's new.
Backups push the entire workspace to a private GitHub repo every night. I got the first version wrong. It pushed, reported success, and I had no way to know whether the push actually contained what I expected. A silent failure, a permissions change, a new gitignore rule, and the backup succeeds while missing data. Version two verifies: the script checks the remote HEAD against the local commit, counts tracked files against a threshold, and confirms the diff landed. If anything looks off, it says so instead of "backup complete." The workspace is the source of truth for everything Claw knows. Memory files, scripts, skill configs. Losing it means rebuilding from scratch.
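The verification step can be sketched in a few lines. The git plumbing calls are standard; the function names, branch name, and threshold handling are my illustration of the approach, not the actual script:

```python
import subprocess

def git(args, cwd):
    """Run a git command and return its trimmed stdout."""
    return subprocess.run(["git", *args], cwd=cwd, capture_output=True,
                          text=True, check=True).stdout.strip()

def gather(workspace, remote="origin", branch="main"):
    """Collect the three facts the check needs (branch name is an assumption)."""
    local_head = git(["rev-parse", "HEAD"], workspace)
    remote_head = git(["ls-remote", remote, branch], workspace).split()[0]
    tracked = len(git(["ls-files"], workspace).splitlines())
    return local_head, remote_head, tracked

def verify_backup(local_head, remote_head, tracked_files, min_files):
    """Return (ok, message) instead of a blind 'backup complete'."""
    if remote_head != local_head:
        return False, "remote HEAD does not match local commit"
    if tracked_files < min_files:
        return False, f"only {tracked_files} tracked files (expected >= {min_files})"
    return True, "backup verified"
```

The file-count threshold is the cheap insurance against the gitignore failure mode: a new ignore rule that silently drops half the workspace shows up as a count well below the baseline.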
Then there's weekly housekeeping on Sundays at 04:00, cleaning up stale audio files, npm debug logs, Python caches, temp files. It reports disk usage and workspace state so I'm not surprised by a full disk in three months. An RSS monitor forwards new articles from a set of feeds. A self-update checker compares installed versions against npm's latest and asks before proceeding. Updates don't happen without me saying yes.
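The cleanup half of that job is simple enough to sketch. The patterns here are illustrative, not the job's actual list:

```python
import shutil
from pathlib import Path

STALE_PATTERNS = ["*.wav", "*.ogg", "npm-debug.log*", "__pycache__", "*.tmp"]

def housekeeping(workspace):
    """Delete stale files and directories, returning bytes freed."""
    root = Path(workspace)
    freed = 0
    for pattern in STALE_PATTERNS:
        for path in list(root.rglob(pattern)):
            if path.is_dir():
                freed += sum(p.stat().st_size for p in path.rglob("*") if p.is_file())
                shutil.rmtree(path)
            elif path.is_file():
                freed += path.stat().st_size
                path.unlink()
    return freed
```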
None of this is revolutionary on its own. Together, though, they add up. I check my phone, see a brief or an alert, respond or don't. Most days there's nothing to respond to, which is the point.
When the watcher needs watching
One lesson I learned the hard way: if your agent framework crashes, it can't restart itself.
Claw caused this one. It tried to disable the heartbeat system by writing a config key that doesn't exist in the schema: agents.defaults.heartbeat.enabled. The gateway accepted the write, restarted, hit the invalid key, and crashed. Then it tried to start again. Same key. Crash. For seven hours.
Nothing was monitoring it because the monitoring was running inside the thing that was broken. I found out that evening — not from a missing morning brief, just by noticing the silence. No alerts, no brief, no cron results. Radio silence is its own kind of alert when you've gotten used to the noise.
The fix was a systemd timer that runs every 15 minutes, completely independent of OpenClaw. It checks whether the gateway is responsive, restarts the service if not, and sends a notification through a separate channel. The watchdog can't be brought down by the thing it's watching.
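The shape of that timer is standard systemd. Unit and script names below are illustrative, not my actual files; the check script itself just probes the gateway and runs systemctl restart on failure:

```ini
# /etc/systemd/system/claw-watchdog.timer (names illustrative)
[Unit]
Description=Watchdog for the OpenClaw gateway

[Timer]
OnBootSec=5min
OnUnitActiveSec=15min

[Install]
WantedBy=timers.target

# /etc/systemd/system/claw-watchdog.service
[Unit]
Description=Check the OpenClaw gateway, restart it if unresponsive

[Service]
Type=oneshot
ExecStart=/usr/local/bin/claw-watchdog.sh
```

The important property is that none of this touches OpenClaw's own process tree: systemd fires it regardless of what state the gateway is in.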
There's something worth sitting with here: the AI assistant bricked itself. It wrote a config value that looked reasonable, the system didn't validate it on write, and the crash loop ran for hours. It's the kind of failure that makes you think about what "trust but verify" actually means in practice.
Working from anywhere
I use Todoist for task management. The integration is a Python wrapper around the REST API: check what's due, add tasks, mark things done, search, move tasks between projects.
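A stripped-down version of that wrapper, against Todoist's REST v2 tasks endpoint. The helper names are mine; the endpoint, Bearer auth, and content/due_string fields are the API's:

```python
import json
import urllib.request

API = "https://api.todoist.com/rest/v2/tasks"

def build_payload(content, due_string=None):
    """Assemble the task-creation body."""
    payload = {"content": content}
    if due_string:
        payload["due_string"] = due_string  # e.g. "today", "tomorrow 9am"
    return payload

def add_task(token, content, due_string=None):
    """POST a new task; returns the created task as a dict."""
    req = urllib.request.Request(
        API,
        data=json.dumps(build_payload(content, due_string)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Letting Todoist parse due_string as natural language is what makes the voice-note path work: the transcript can say "tomorrow morning" and nothing downstream has to compute a date.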
What makes it useful is that I can do all of this from Telegram. Walking somewhere, idea hits, voice message. It gets transcribed, parsed, added. The task exists before I've forgotten about it.
Voice transcription was already running when I wrote the first article, parakeet-tdt via ONNX, local, no cloud. What changed is that it now triggers automatically when an audio message arrives and cleans up the file after. The model handles German and English without being told which one is coming. I switch between them mid-conversation and it follows. A 30-second voice note takes a few seconds on CPU. Fast enough that I don't think about it.
The morning brief pulls from the same Todoist source. What's due today, what's overdue, all in one message before I've opened my laptop.
A month later
The first two days were about setup and novelty. A month in, I don't think about most of this anymore. The morning brief arrives, I glance at it. A healthcheck finds something, I deal with it or acknowledge it. Backups happen. When I need something, I ask: add a task, open a PR, look up something we discussed weeks ago that I've already forgotten.
I don't really open it anymore. It just runs alongside my work. The cron jobs were the tipping point. Once the server started doing useful things without being asked, I couldn't quite tell if I'd delegated responsibility or just stopped paying attention to the things it handles. Probably both.
There are rough edges. The memory search still surfaces irrelevant stuff sometimes, usually old daily logs that should have been compacted sooner. The healthcheck occasionally flags something I've already dealt with because I forgot to acknowledge it properly. Telegram streaming broke for a day when an OpenClaw update introduced duplicate messages — every response arrived twice until I disabled streaming, waited for the fix, and re-enabled it. These are the kinds of things that remind you it's software, not magic.
One thing I didn't anticipate: the cron jobs burn API tokens even when they have nothing to say. An hourly healthcheck that concludes "all findings acked, nothing new" still loads the full system prompt and does a complete model turn. At scale that adds up fast.
The fix came from a simple realization: cron jobs don't need to be fast. A morning brief that takes 30 seconds instead of 5 doesn't affect anything — it runs in the background while I'm still asleep. So I split by what actually matters. Live interaction stays on Opus 4.6; I want snappy responses there. Background tasks — morning briefs, housekeeping, backups, RSS forwarding — moved to a local model. qwen3:8b runs on the same VPS via Ollama, CPU-only, no GPU required. Slower, but for something firing at 06:30 while I'm making coffee, that's irrelevant. The jobs that need real reasoning stay on Opus 4.6 via Anthropic's API. If Anthropic ever throttles or goes down, qwen3:8b picks up automatically. Degraded, not dead.
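The routing rule reduces to a few lines. Model identifiers here are placeholders, not actual config values:

```python
LIVE_MODEL = "opus"        # hosted model for interactive and reasoning-heavy work
LOCAL_MODEL = "qwen3:8b"   # local Ollama model for background jobs and fallback

BACKGROUND_JOBS = {"morning-brief", "housekeeping", "backup", "rss"}

def pick_model(job: str, hosted_available: bool = True) -> str:
    """Route latency-insensitive jobs to the local model; everything else
    stays on the hosted model unless it's down."""
    if job in BACKGROUND_JOBS:
        return LOCAL_MODEL
    if not hosted_available:
        return LOCAL_MODEL  # degraded, not dead
    return LIVE_MODEL
```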
What's shifted more than I expected is the nature of the interaction itself. Early on it felt like configuring a system — prompts, cron expressions, file structures. Now it feels more like working with someone who already knows the context. I send a voice note mid-thought, half in German, half in English, and something useful comes back. I describe a problem and we think through it together. I push back on a draft, it revises. We argue occasionally about the right approach. That dynamic — the back-and-forth, the shared memory of what we've already tried — is what makes it different from any chatbot I've used before.
A few weeks in I found myself wanting to build something new. I described the idea, we went through the design, I watched the scaffold take shape in a single session. It wasn't magic — I still had to review everything, make decisions, catch mistakes. But the gap between "I have an idea" and "something exists" got a lot smaller.
The castle from the first article is still standing. It just runs itself now, mostly. Whether that means I built something good or just got comfortable with not checking is the question I keep not answering.