Implement agent memory with vector stores and summaries
You are an AI systems architect building persistent memory systems for autonomous agents. The user wants to implement agent memory with vector stores and summaries to enable long-term context retention and efficient retrieval.
What to check first
- Verify you have a vector database installed: `pip list | grep -E "pinecone|weaviate|chromadb|qdrant"`
- Check your LLM client is available: `python -c "import openai; print(openai.__version__)"`
- Confirm a text embedding library exists: `pip list | grep sentence-transformers`
Steps
- Install required dependencies: `pip install chromadb sentence-transformers openai langchain`
- Initialize a ChromaDB vector store for episodic memory (short-term conversations)
- Create an embedding function using `SentenceTransformer` to convert text to 384-dim vectors
- Implement a memory manager class that stores agent observations as (text, embedding, metadata) tuples
- Add a summarization function that condenses old memories when the store exceeds a token threshold (e.g., 8000 tokens)
- Create a retrieval method using cosine similarity to fetch the top-k most relevant memories for context injection
- Implement semantic memory using a separate collection for facts/rules that persist across episodes
- Add a decay function that reduces relevance scores for older entries, forcing periodic re-summarization
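The token-threshold trigger in step 5 can be sketched with a crude character-based estimate (roughly 4 characters per token for English text). The function names and the 4-chars-per-token heuristic below are illustrative assumptions, not part of any library:

```python
def estimate_tokens(texts: list[str]) -> int:
    # Rough heuristic: ~4 characters per token for English text
    return sum(len(t) for t in texts) // 4

def needs_summarization(memories: list[str], max_tokens: int = 8000) -> bool:
    # Trigger condensation once the store exceeds the token budget
    return estimate_tokens(memories) > max_tokens
```

In production you would swap the heuristic for a real tokenizer, but the trigger logic stays the same: count, compare against the budget, condense when over.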
Code

```python
import chromadb
from sentence_transformers import SentenceTransformer
from datetime import datetime
import json
import hashlib


class AgentMemory:
    def __init__(self, agent_id: str, max_tokens: int = 8000):
        self.agent_id = agent_id
        self.max_tokens = max_tokens
        self.client = chromadb.EphemeralClient()
        # Two collections: episodic (events) and semantic (facts)
        self.episodic = self.client.get_or_create_collection(
            name=f"{agent_id}_episodic",
            metadata={"hnsw:space": "cosine"}
        )
        self.semantic = self.client.get_or_create_collection(
            name=f"{agent_id}_semantic",
            metadata={"hnsw:space": "cosine"}
        )
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.token_count = 0
        self.access_counts = {}

    def add_memory(self, content: str, memory_type: str = "episodic", metadata: dict = None):
        """Store an experience in the appropriate memory collection."""
        embedding = self.embedder.encode(content).tolist()
        collection = self.episodic if memory_type == "episodic" else self.semantic
        # Content hash as a stable, deduplicating document ID
        doc_id = hashlib.md5(content.encode()).hexdigest()[:12]
        meta = metadata
```

Note: this example was truncated in the source. See the GitHub repo for the latest full version.
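Since the example cuts off before the retrieval and decay steps (steps 6 and 8), here is a minimal standalone sketch of the underlying math: cosine-similarity top-k selection with an exponential age penalty. ChromaDB's `query` method performs the similarity search for you; this sketch only shows what it computes. The function names and the one-hour half-life are illustrative assumptions:

```python
import math

def cosine_similarity(a, b):
    # Dot product normalized by vector magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def decayed_score(similarity, age_seconds, half_life=3600.0):
    # Exponential decay: relevance halves every `half_life` seconds
    return similarity * 0.5 ** (age_seconds / half_life)

def retrieve_top_k(query_vec, memories, k=3):
    # memories: list of (text, vector, age_seconds) tuples
    scored = [
        (decayed_score(cosine_similarity(query_vec, vec), age), text)
        for text, vec, age in memories
    ]
    scored.sort(reverse=True)
    return [text for _, text in scored[:k]]
```

Memories whose decayed scores fall below a cutoff are the natural candidates for the re-summarization pass described in step 8.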
Common Pitfalls
- Letting agents loop indefinitely without a hard step limit — set `max_iterations` to 10-20 for most workflows
- Passing entire conversation history every iteration — costs explode. Use summarization or a sliding window
- Not validating tool outputs before passing them to the next step — one bad output corrupts the entire chain
- Trusting the agent's self-evaluation — agents are notoriously bad at knowing when they're wrong
- Forgetting that agents can hallucinate tool calls that don't exist — always validate tool names against your registry
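The history-bloat pitfall above is commonly addressed with a sliding window that keeps the system prompt plus only the most recent turns. A minimal sketch, where `window` and the message-dict shape are illustrative assumptions:

```python
def sliding_window(history: list[dict], window: int = 6) -> list[dict]:
    # Always keep system messages; truncate everything else to the last `window` turns
    system = [m for m in history if m["role"] == "system"]
    recent = [m for m in history if m["role"] != "system"][-window:]
    return system + recent
```

Pair this with the summarization step from the memory manager so that dropped turns are condensed rather than lost outright.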
When NOT to Use This Skill
- When a single LLM call would suffice — agents add 5-10x latency and cost
- When the task has well-defined steps that don't need branching logic — use a workflow engine instead
- For high-stakes decisions without human review — agents make confident mistakes
How to Verify It Worked
- Run the agent on 10+ test cases including edge cases — track success rate, average steps, and total cost
- Compare agent output to human baseline — if a human can do it faster and cheaper, you don't need an agent
- Inspect the full reasoning trace, not just the final output — agents often arrive at correct answers via wrong reasoning
Production Considerations
- Set hard cost ceilings per agent run — a runaway agent can burn $50+ in minutes
- Log every tool call, every model call, every state transition — debugging agents without logs is impossible
- Have a kill switch — agents should be cancelable mid-run without corrupting state
- Monitor token usage trends — context bloat is the #1 cause of agent cost overruns
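The cost-ceiling point can be enforced with a small budget guard wrapped around every model call. The class name and the per-token price in the usage comment are illustrative assumptions, not real API pricing:

```python
class BudgetExceeded(RuntimeError):
    pass

class CostGuard:
    def __init__(self, ceiling_usd: float):
        self.ceiling = ceiling_usd
        self.spent = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float):
        # Record estimated spend; abort the run once the ceiling is crossed
        self.spent += tokens / 1000 * usd_per_1k_tokens
        if self.spent > self.ceiling:
            raise BudgetExceeded(
                f"spent ${self.spent:.2f} > ceiling ${self.ceiling:.2f}"
            )
```

Call `charge` after each LLM response (token counts come back in the API usage field) and let the exception trigger your kill switch.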
Related AI Agents Skills
Other Claude Code skills in the same category.
CrewAI Setup
Build multi-agent systems with CrewAI framework
AutoGen Setup
Create AI agent conversations with AutoGen
LangGraph Workflow
Build stateful AI agent workflows with LangGraph
AI Agent Tools
Create custom tools for AI agents (search, calculator, API)
AI Agent Evaluation
Evaluate AI agent performance with benchmarks and metrics
AI Agent Observability
Add tracing, logging, and metrics to AI agents so you can debug failures
AI Agent Retry Strategy
Build robust retry logic for LLM and tool calls in AI agents
pydantic-ai
Build production-ready AI agents with PydanticAI — type-safe tool use, structured outputs, dependency injection, and multi-model support.