The Problem claude-mem Solves
Here's the loop every Claude Code user knows too well:
1. Start a new session.
2. Spend 5 minutes re-explaining your project structure.
3. Spend 5 more re-explaining your conventions, why you chose X over Y, which libraries are off-limits.
4. Finally get to the actual task.
5. Run out of session window 2 hours later.
6. Tomorrow, repeat steps 2-3.
This is the single biggest daily annoyance of Claude Code at professional scale. You're either re-explaining context on every session (cost: time + tokens) or you're skipping the explanation and accepting dumber output (cost: more iteration).
claude-mem is the plugin that fixes it. 46,100 stars, 92 contributors, 3.5K forks. Created by Alex Newman (@thedotmack). Active development — v12.0.0 shipped recently.
What claude-mem Actually Does
It hooks into Claude Code's session lifecycle at five points:
- SessionStart — loads relevant prior context into Claude before your first message
- UserPromptSubmit — silently enriches your prompt with related memories
- PostToolUse — captures what Claude just did (file edits, commands run, observations made)
- Stop — summarizes the current session into structured memory
- SessionEnd — finalizes the memory record and updates the vector index
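To make the lifecycle concrete, here is a minimal sketch of hook dispatch in the shape described above. The event names come from the list; the registry, payload fields, and handler logic are invented for illustration and are not claude-mem's actual API.

```python
# Hypothetical sketch of five-point lifecycle hooks (not claude-mem's real API).
from typing import Callable

HOOKS: dict[str, list[Callable[[dict], None]]] = {}

def on(event: str):
    """Register a handler for a lifecycle event."""
    def register(fn):
        HOOKS.setdefault(event, []).append(fn)
        return fn
    return register

session_log: list[str] = []

@on("SessionStart")
def load_context(payload: dict) -> None:
    # In claude-mem this step would inject relevant prior memories.
    session_log.append(f"loaded {payload.get('memory_count', 0)} memories")

@on("PostToolUse")
def capture(payload: dict) -> None:
    # Record what the tool call did, for later summarization at Stop.
    session_log.append(f"observed: {payload['tool']}")

def fire(event: str, payload: dict) -> None:
    for handler in HOOKS.get(event, []):
        handler(payload)

fire("SessionStart", {"memory_count": 3})
fire("PostToolUse", {"tool": "Edit(src/retry.py)"})
```

The same registry pattern would cover UserPromptSubmit, Stop, and SessionEnd; each is just another event name with its own handlers.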
Everything is stored in a local SQLite database with Chroma vector search. No cloud service, no data leaves your machine, no per-month subscription.
The memory isn't raw text. It's semantically summarized via Claude's agent-sdk — meaning the tool uses Claude itself to compress what happened in your session into structured observations. That's what makes the ~10x token savings work: you're not re-pasting conversation history, you're getting a curated set of relevant observations.
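The "structured observations, not raw transcript" idea can be sketched with a plain SQLite table. The schema and field names here are invented, and the Chroma vector index is omitted; this only shows what a compressed session might persist.

```python
# Illustrative sketch of the local store: a few curated observations
# per session instead of the full conversation. Schema is hypothetical;
# the real plugin also maintains a Chroma vector index (omitted here).
import sqlite3

db = sqlite3.connect(":memory:")  # claude-mem uses an on-disk file
db.execute("""
    CREATE TABLE observations (
        id INTEGER PRIMARY KEY,
        session_id TEXT,
        kind TEXT,      -- e.g. 'convention', 'decision', 'edit'
        summary TEXT
    )
""")

db.executemany(
    "INSERT INTO observations (session_id, kind, summary) VALUES (?, ?, ?)",
    [
        ("mon-001", "convention", "Retries use exponential backoff, never linear"),
        ("mon-001", "convention", "Retry only on 5xx responses, not network errors"),
        ("mon-001", "edit", "Added retry wrapper to payment processor"),
    ],
)

rows = db.execute(
    "SELECT summary FROM observations WHERE kind = 'convention'"
).fetchall()
```

A future SessionStart only needs those few rows, not the thousands of tokens of conversation that produced them; that is where the claimed savings come from.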
The 3-Layer MCP Tool Workflow
When Claude needs context from prior sessions, it doesn't dump everything. It uses three MCP tools in sequence:
- search(query) — vector search across past sessions for relevant memories. Returns IDs + summaries.
- timeline(time_range) — chronological view of what you worked on in a period. Useful for "pick up where I left off."
- get_observations(ids) — fetches the full detail for specific memory IDs.
This progressive-disclosure design is where the 10x savings come from. Most queries only need search output (summaries). Only when Claude genuinely needs details does it pull full observations. Cheaper than loading entire chat history into context.
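The progressive-disclosure flow can be sketched as three functions over an in-memory store. The tool names match the list above; the store, scoring, and record fields are invented stand-ins (naive keyword overlap in place of Chroma vector search).

```python
# Sketch of the 3-layer flow: cheap summaries first, full detail on demand.
# MEMORIES and the keyword-overlap "search" are illustrative stand-ins.
MEMORIES = {
    1: {"summary": "Payment retry conventions", "when": "2025-01-06",
        "detail": "Exponential backoff; retry only on 5xx responses."},
    2: {"summary": "Webhook endpoint added", "when": "2025-01-09",
        "detail": "POST /webhooks; HMAC signature verification."},
}

def search(query: str) -> list[dict]:
    """Layer 1: return IDs + summaries only (stand-in for vector search)."""
    terms = set(query.lower().split())
    return [
        {"id": mid, "summary": m["summary"]}
        for mid, m in MEMORIES.items()
        if terms & set(m["summary"].lower().split())
    ]

def timeline(start: str, end: str) -> list[dict]:
    """Layer 2: chronological view of a period."""
    return [
        {"id": mid, "when": m["when"], "summary": m["summary"]}
        for mid, m in sorted(MEMORIES.items(), key=lambda kv: kv[1]["when"])
        if start <= m["when"] <= end
    ]

def get_observations(ids: list[int]) -> list[str]:
    """Layer 3: full detail, fetched only for the IDs actually needed."""
    return [MEMORIES[i]["detail"] for i in ids]

hits = search("retry conventions")                    # cheap: summaries only
details = get_observations([h["id"] for h in hits])   # expensive: on demand
```

Most of the time the flow stops after `search`; `get_observations` runs only when the summaries aren't enough, which is the whole economy of the design.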
Installation (One Command)
```shell
npx claude-mem install
```
That's it. The installer writes the plugin config, sets up the local SQLite database, wires the lifecycle hooks, and registers the MCP server. Restart Claude Code and memory is live.
Works on Claude Code, Cursor, Gemini CLI, Windsurf, and OpenClaw — all the major AI coding tools. Cross-platform: Linux, macOS, Windows.
A Real Example of What It Catches
Session 1 (Monday):
"Help me add retry logic to the payment processor. We use an exponential backoff pattern here, not linear. Also — we never retry network-level failures, only 5xx responses."
Claude implements the retry, following your rules.
Session 2 (Thursday, new chat):
"Add retry logic to the webhooks endpoint."
Without claude-mem: Claude writes a linear retry that also retries on network errors. You catch it in review, re-explain your conventions, and iterate.
With claude-mem: Claude's first draft uses exponential backoff and only retries on 5xx. Why? Because SessionStart loaded the observation "Project convention: exponential backoff, retry only on 5xx responses (from payment processor task, Monday)." You didn't re-explain anything — the plugin did it for you.
That's the multiplier. Not just "persistent memory" in the abstract sense — your specific conventions making it into your specific prompts automatically.
Who Should Install It
Good fit:
- Anyone working on a codebase for more than a week. The longer the project, the more valuable the memory.
- Teams with specific conventions that need to stay consistent
- People burning through Claude Code session windows faster than they'd like
- Anyone whose "context re-explanation" adds up to more than 10 minutes a day
Bad fit:
- One-off scripts or throwaway prototypes — not enough repeat sessions to build useful memory
- Highly regulated environments where you can't store any session data locally (rare but real)
- Very short contexts where re-explaining is genuinely faster than managing a memory system
What It Doesn't Do
claude-mem stores what happened (tool calls, file edits, observations Claude made). It doesn't magically understand why you made choices unless Claude noted them during the session. If you silently override a Claude suggestion without explaining, that override doesn't become memory.
To get the most value: when you correct Claude, tell Claude why. "Don't use axios here — we standardized on native fetch across the codebase." That comment becomes a persistent memory. Your future sessions won't make the same mistake.
How It Compares to Alternatives
- Vanilla Claude Code `CLAUDE.md` — manually curated project context. Works for 5-10 rules. Breaks past that.
- Cursor's memory feature — IDE-specific, cloud-stored, less configurable. claude-mem is local and more flexible.
- memsearch ccplugin — lighter-weight alternative. Good if you want vector search without the lifecycle hooks.
- Writing your own skill files — works for stable knowledge. claude-mem is for evolving project memory.
Closest competitor is gstack, which takes a different approach — instead of capturing memory, it enforces role discipline so each session is more self-contained. You can run both at once; they're orthogonal.
A Few Gotchas
- The first few days are noisy. Memory quality improves with session volume. Your first 5 sessions produce mediocre memories. By session 30, they're genuinely useful.
- Storage grows. The SQLite DB gets bigger over time. Not a problem for the first year, but budget a GB or so if you're a heavy user on a large project.
- Context shift can surprise you. Because Claude loads memory before your prompt, you might wonder where a decision came from. Check the loaded memories via the search MCP tool if Claude does something unexpected.
- Works best in a dedicated project. If you use Claude Code across many small side projects, the memories cross-contaminate. Either install per-project or accept that memory will be noisier.
What to Pair It With
Once you have persistent memory working, the next multipliers are:
- Graphify — indexes your codebase structure so Claude navigates by architecture rather than file-by-file grep
- Superpowers or gstack — methodology layer, so the memory captures disciplined behavior not ad-hoc choices
- Claude skills library (2,300+ free skills) — domain-specific knowledge that complements project memory
FAQ
Does claude-mem send my code to a third party?
No. Everything is local — SQLite + Chroma, on your machine. The only API calls are to Claude (which you were already making) via Claude's agent-sdk for memory compression.
How much does claude-mem cost?
The plugin is free and open source. There's a small Claude API cost for the compression step — usually pennies per session.
Does it work with the Claude 200K context window?
Yes. The plugin is designed to reduce context usage, so a bigger window just means more room for actual work after memory injection.
Can I clear memory for a specific project?
Yes. The SQLite DB lives at a known path — delete it to reset, or use the bundled CLI to delete specific memory namespaces.
Does it support Haiku / Sonnet / Opus equally?
Yes. The compression step uses whatever Claude model you have configured. Smaller models produce slightly less refined summaries but the mechanism works identically.
Is it on the Claude Certified Architect exam?
Not directly — plugins aren't individually tested. But memory management (Domain 5, "Context Management & Reliability," 15% of the CCA exam) is. Understanding how claude-mem compresses + retrieves context is a concrete example of the kind of pattern the exam tests in the abstract.
The Short Version
claude-mem is the closest thing Claude Code currently has to a real working-memory layer. If you're using Claude Code for more than one session on the same project, the 30-second install pays for itself within a day. The 10x token savings are real, the memory quality genuinely compounds, and the cost (free plus trivial API usage) makes the install a no-brainer.
Install it. Live with it for a week. You won't go back.