How we discovered our AI agent couldn't search 88% of its own knowledge — and fixed it in 24 hours.
Our AI agent (Sene) had semantic memory search enabled — but it was silently broken for 24 days, and once fixed, it indexed only 11.5% of our knowledge base. Brad's one question ("Don't we store things more in playbooks & SOPs?") exposed the gap. Fixed with one config line.
Ran a 15-option comparison of memory enhancement solutions, scored on sovereignty, cost, footgun risk, and capability. Used a bitcoiner-first framework: local > cloud, self-hosted > third-party.
Key finding: "The best memory upgrade is the one you already own." OpenClaw's built-in hybrid search + temporal decay was the winner. No external dependencies needed. Read the full comparison →
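"Hybrid search + temporal decay" can be sketched in a few lines. This is a minimal illustration of the idea, not OpenClaw's actual scoring: the blend weight alpha and the 30-day half-life are assumptions chosen for the example.

```python
def hybrid_score(semantic_sim: float, keyword_sim: float,
                 age_days: float, alpha: float = 0.7,
                 half_life_days: float = 30.0) -> float:
    """Blend semantic and keyword relevance, then decay by age.

    alpha and half_life_days are illustrative; the plugin's real
    weights are internal to OpenClaw.
    """
    blended = alpha * semantic_sim + (1 - alpha) * keyword_sim
    decay = 0.5 ** (age_days / half_life_days)  # halves every 30 days
    return blended * decay

# A fresh, strong match outranks the same match from three months ago.
fresh = hybrid_score(0.9, 0.4, age_days=1)
stale = hybrid_score(0.9, 0.4, age_days=90)
assert fresh > stale
```

The decay term is what keeps yesterday's decision above a stale note that happens to use the same words.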
research sovereignty
Root-caused why memory_search had never worked. On Feb 10, Brad asked to enable it. The embedding backend (OpenAI) was configured — but memory-core was never added to plugins.allow or plugins.slots.memory.
The tool appeared in the system prompt. The agent referenced it in instructions. But it was never actually loaded. 24 days of silent failure.
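The missing piece, sketched as a config fragment. The key names (plugins.allow, plugins.slots.memory) and the memory-core value come from the incident above; the surrounding JSON structure is an assumption about the file layout.

```json
{
  "plugins": {
    "allow": ["memory-core"],
    "slots": {
      "memory": "memory-core"
    }
  }
}
```

Without both entries, the tool can appear in the system prompt while the plugin that implements it is never loaded.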
Additionally, the Ollama baseUrl was configured with a trailing /api; the client appended its own /api/... endpoint path on top of it, doubling the segment and producing 404s.
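The path-doubling failure is easy to reproduce with naive URL joining. The /api/embeddings endpoint and default port 11434 are Ollama's; the join logic below is a hypothetical reconstruction of the client behavior.

```python
def build_url(base_url: str, endpoint: str) -> str:
    # Hypothetical client behavior: the client appends its own
    # "/api/..." endpoint to whatever baseUrl is configured.
    return base_url.rstrip("/") + endpoint

# Misconfigured: baseUrl already includes /api, so the path doubles.
assert build_url("http://localhost:11434/api", "/api/embeddings") \
    == "http://localhost:11434/api/api/embeddings"  # -> 404

# Correct: baseUrl is just the host.
assert build_url("http://localhost:11434", "/api/embeddings") \
    == "http://localhost:11434/api/embeddings"
```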
regression #56 config-without-verification
Three config patches were applied:
config fix
Switched from OpenAI embeddings to local Ollama (nomic-embed-text) — sovereign, no API cost, no data leaving the machine. Then enabled the full feature set:
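Once vectors come from a local model, semantic search reduces to nearest-neighbor lookup over embeddings. A minimal sketch — the toy 3-dimensional vectors stand in for real model output (nomic-embed-text produces much larger vectors), and the paths are illustrative.

```python
import math

def cosine_sim(a: list[float], b: list[float]) -> float:
    """Similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank(query_vec: list[float], docs: list[tuple[str, list[float]]]):
    """docs: (path, vector) pairs. Return them sorted by relevance."""
    return sorted(docs, key=lambda d: cosine_sim(query_vec, d[1]),
                  reverse=True)

# Toy vectors standing in for real embeddings.
docs = [("memory/2026-02-10.md", [0.1, 0.9, 0.0]),
        ("ops/playbooks/decisions.md", [0.9, 0.1, 0.0])]
top = rank([1.0, 0.0, 0.0], docs)
assert top[0][0] == "ops/playbooks/decisions.md"
```

Because all of this runs locally, neither the query nor the indexed files ever leave the machine.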
sovereign local-first
First successful memory_search call returned real results with file paths, line numbers, and relevance scores. Brad's test: "Prove you can use that tool."
Logged as a milestone in the personality changelog. 30-day measurement window started.
verified
Brad asked for a tool-call analysis of the last 24 hours:
7 searches out of 376 tool calls. Zero follow-up reads. We had spent a full day building a semantic memory system, and it was being used in 1.86% of calls.
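The audit arithmetic, as a sketch. The log format is hypothetical; only the 7-of-376 counts come from the audit itself.

```python
def usage_share(calls: list[str], tool: str) -> float:
    """Fraction of tool calls that hit a given tool."""
    return calls.count(tool) / len(calls)

# Reproducing the audit's numbers: 7 memory_search calls out of 376.
calls = ["memory_search"] * 7 + ["read"] * 369
share = usage_share(calls, "memory_search")
assert round(share * 100, 2) == 1.86
```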
underuse self-audit
This was the moment. We checked:
Our playbooks, SOPs, decision ledger, regressions file, project registries, and research docs — the densest, most decision-rich content we have — were completely invisible to search.
88.5% blind spot Brad caught it
Plus two boot file edits:
AGENTS.md Rule #5: Collapsed "query memory first, then playbooks" into just "memory_search first" — because the playbooks are now IN the search index.
MEMORY.md: Updated to reflect full-workspace scope.
config boot files
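The "one config line" was an extraPaths addition. A sketch of what it might look like — the key name extraPaths appears later in the post, but the nesting and the glob values are assumptions, inferred from the directories that started showing up in results.

```json
{
  "memory": {
    "extraPaths": ["ops/**", "docs/**"]
  }
}
```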
First test search for "decision architecture playbook" returned results from ops/playbooks/decisions/, ops/decisions/decision-ledger.md, and docs/drafts/ — all previously invisible.
verified milestone
1. Config ≠ working. Memory search was "configured" for 24 days but never verified. The plugin wasn't even loaded. Lesson: don't trust, verify.
2. The tool description lies. The system prompt says memory_search covers "MEMORY.md + memory/*.md." After adding extraPaths, it actually covers the full workspace — but the auto-generated description doesn't update. We had to override it in our own boot files.
3. The human caught what the AI missed. Sene ran a tool usage audit and identified underuse (7/376). But it took Brad asking "don't we store things in playbooks?" to expose that the search was indexing the wrong files entirely. The AI was looking at usage frequency; the human was looking at coverage.
4. Sovereign > convenient. We chose local Ollama over OpenAI for embeddings. Slightly slower, zero API cost, no data leaving the machine. For a Bitcoin-focused operation, this wasn't even a tradeoff — it was the only option.
That's it. From 11.5% coverage to 100%. From 64 files to 396. From blind to seeing.
Sometimes the most impactful changes are the ones where you realize you'd been looking in the wrong place the whole time.
We fixed what the agent can search. We haven't fixed how often it searches.
The tool usage audit tells the story: 7 memory searches out of 376 tool calls. Even with full workspace indexing, if the agent defaults to read (opening files directly from known paths) or relies on the compaction summary instead of searching, the semantic memory system sits idle.
This is the harder problem. The indexing fix was mechanical — one config line. The usage fix is behavioral. It requires the agent to develop a new default: search before reading, even when you think you know the answer.
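One way to frame that behavioral default is as a wrapper that forces a search before any direct read. This is entirely hypothetical — a sketch of the desired habit, not an existing OpenClaw hook; the function names and log shape are invented for the example.

```python
def read_with_search_first(path, query, memory_search, read_file, log):
    """Search before reading, even when the path is 'known'.

    memory_search and read_file are injected so the policy is
    testable; log records whether the search habit is exercised.
    """
    hits = memory_search(query)
    log.append(("memory_search", query, len(hits)))
    # Prefer a search hit over the assumed path when one exists.
    target = hits[0] if hits else path
    log.append(("read", target))
    return read_file(target)

# Stub demo: the search surfaces a better source than the assumed path.
log = []
content = read_with_search_first(
    "MEMORY.md", "decision architecture playbook",
    memory_search=lambda q: ["ops/playbooks/decisions/architecture.md"],
    read_file=lambda p: f"<contents of {p}>",
    log=log)
assert log[0][0] == "memory_search"
assert "ops/playbooks" in content
```

Counting the "memory_search" entries in a log like this over the 14-day window is exactly the measurement described below.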
Three specific failure modes remain open:
This is logged as a behavioral experiment with a 14-day test window. The measurement: does memory_search usage increase from 1.86% of tool calls to something meaningful? Does the "I didn't know" regression class drop to near-zero?
We'll know by March 21.