
AI Agent Memory


Implement agent memory with vector stores and summaries

Works with OpenClaude

You are an AI systems architect building persistent memory systems for autonomous agents. The user wants to implement agent memory with vector stores and summaries to enable long-term context retention and efficient retrieval.

What to check first

  • Verify you have a vector database installed: pip list | grep -E "pinecone|weaviate|chromadb|qdrant"
  • Check your LLM client is available: python -c "import openai; print(openai.__version__)"
  • Confirm text embedding library exists: pip list | grep sentence-transformers

Steps

  1. Install required dependencies: pip install chromadb sentence-transformers openai langchain
  2. Initialize a ChromaDB vector store for episodic memory (short-term conversations)
  3. Create an embedding function using SentenceTransformers to convert text to 384-dim vectors
  4. Implement a memory manager class that stores agent observations as (text, embedding, metadata) tuples
  5. Add a summarization function that condenses old memories when the store exceeds a token threshold (e.g., 8000 tokens)
  6. Create a retrieval method using cosine similarity to fetch top-k relevant memories for context injection
  7. Implement semantic memory using a separate collection for facts/rules that persist across episodes
  8. Add a decay function that reduces memory relevance scores for older entries, forcing periodic re-summarization
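The decay in step 8 can be sketched as exponential half-life weighting. The function name and the 24-hour default are illustrative assumptions, not part of the skill itself:

```python
import time

def decayed_score(similarity: float, created_at: float,
                  half_life_hours: float = 24.0) -> float:
    """Weight a similarity score down as the memory ages (step 8).

    A memory exactly one half-life old counts half as much; entries
    whose decayed score drops below a threshold become candidates
    for re-summarization (step 5).
    """
    age_hours = (time.time() - created_at) / 3600.0
    return similarity * 0.5 ** (age_hours / half_life_hours)
```

At retrieval time you would rank by `decayed_score(cosine_sim, entry_timestamp)` instead of raw cosine similarity, so stale memories gradually fall out of the top-k.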

Code

import chromadb
from sentence_transformers import SentenceTransformer
from datetime import datetime
import json
import hashlib

class AgentMemory:
    def __init__(self, agent_id: str, max_tokens: int = 8000):
        self.agent_id = agent_id
        self.max_tokens = max_tokens
        self.client = chromadb.EphemeralClient()  # in-memory; use chromadb.PersistentClient(path=...) to persist
        
        # Two collections: episodic (events) and semantic (facts)
        self.episodic = self.client.get_or_create_collection(
            name=f"{agent_id}_episodic",
            metadata={"hnsw:space": "cosine"}
        )
        self.semantic = self.client.get_or_create_collection(
            name=f"{agent_id}_semantic",
            metadata={"hnsw:space": "cosine"}
        )
        
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        self.token_count = 0
        self.access_counts = {}
    
    def add_memory(self, content: str, memory_type: str = "episodic", metadata: dict = None):
        """Store experience in appropriate memory collection."""
        embedding = self.embedder.encode(content).tolist()
        collection = self.episodic if memory_type == "episodic" else self.semantic
        
        doc_id = hashlib.md5(content.encode()).hexdigest()[:12]
        meta = dict(metadata or {})  # guard: metadata may be None
        meta["timestamp"] = datetime.now().isoformat()
        # Minimal completion (assumed): write the memory into the collection.
        collection.upsert(
            ids=[doc_id],
            embeddings=[embedding],
            documents=[content],
            metadatas=[meta],
        )
        self.token_count += len(content.split())  # rough word-count proxy for tokens

Note: this example was truncated in the source. See the GitHub repo for the latest full version.
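The retrieval method from step 6 is missing from the truncated source. A dependency-free sketch of top-k cosine retrieval is below; in the class above you would instead call ChromaDB's built-in `collection.query`, which handles this internally. All names here are hypothetical:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec: list[float],
          memories: list[tuple[str, list[float]]],
          k: int = 5) -> list[str]:
    """Return the k memory texts most similar to the query vector."""
    scored = sorted(memories, key=lambda m: cosine(query_vec, m[1]), reverse=True)
    return [text for text, _ in scored[:k]]
```

The retrieved texts are then injected into the agent's prompt as context, typically under a "relevant memories" header.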

Common Pitfalls

  • Letting agents loop indefinitely without a hard step limit — set max_iterations to 10-20 for most workflows
  • Passing entire conversation history every iteration — costs explode. Use summarization or sliding window
  • Not validating tool outputs before passing them to the next step — one bad output corrupts the entire chain
  • Trusting the agent's self-evaluation — agents are notoriously bad at knowing when they're wrong
  • Forgetting that agents can hallucinate tool calls that don't exist — always validate tool names against your registry
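The last pitfall, hallucinated tool calls, is cheap to guard against. A minimal sketch, with an illustrative registry (these tool names are placeholders, not real tools):

```python
# Hypothetical tool registry; replace with your actual tool names.
TOOL_REGISTRY = {"search_web", "read_file", "run_sql"}

def validate_tool_call(name: str, registry: set[str] = TOOL_REGISTRY) -> bool:
    """Reject a tool call if the agent invented the tool name."""
    return name in registry
```

Run this check before dispatching every tool call; on failure, return an error message to the agent rather than silently dropping the call, so it can self-correct.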

When NOT to Use This Skill

  • When a single LLM call would suffice — agents add 5-10x latency and cost
  • When the task has well-defined steps that don't need branching logic — use a workflow engine instead
  • For high-stakes decisions without human review — agents make confident mistakes

How to Verify It Worked

  • Run the agent on 10+ test cases including edge cases — track success rate, average steps, and total cost
  • Compare agent output to human baseline — if a human can do it faster and cheaper, you don't need an agent
  • Inspect the full reasoning trace, not just the final output — agents often arrive at correct answers via wrong reasoning
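The metrics above (success rate, average steps, total cost) are simple to accumulate in a small tracker. This is a hypothetical helper, not part of any library:

```python
from dataclasses import dataclass, field

@dataclass
class EvalStats:
    """Accumulate per-run results across a test suite of agent runs."""
    successes: int = 0
    total: int = 0
    steps: list[int] = field(default_factory=list)
    cost_usd: float = 0.0

    def record(self, ok: bool, n_steps: int, run_cost: float) -> None:
        self.total += 1
        self.successes += int(ok)
        self.steps.append(n_steps)
        self.cost_usd += run_cost

    @property
    def success_rate(self) -> float:
        return self.successes / self.total if self.total else 0.0

    @property
    def avg_steps(self) -> float:
        return sum(self.steps) / len(self.steps) if self.steps else 0.0
```

Call `stats.record(...)` once per test case, then compare `success_rate` and `cost_usd` against your human baseline.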

Production Considerations

  • Set hard cost ceilings per agent run — a runaway agent can burn $50+ in minutes
  • Log every tool call, every model call, every state transition — debugging agents without logs is impossible
  • Have a kill switch — agents should be cancelable mid-run without corrupting state
  • Monitor token usage trends — context bloat is the #1 cause of agent cost overruns
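A hard cost ceiling can be enforced with a small guard object that every model call charges against. A minimal sketch with an assumed $5 default budget:

```python
class CostCeiling:
    """Hard per-run budget; hypothetical helper, not part of any library."""

    def __init__(self, max_usd: float = 5.0):
        self.max_usd = max_usd
        self.spent = 0.0

    def charge(self, usd: float) -> None:
        """Record spend; raise to kill the run once the ceiling is crossed."""
        self.spent += usd
        if self.spent > self.max_usd:
            raise RuntimeError(
                f"cost ceiling exceeded: ${self.spent:.2f} > ${self.max_usd:.2f}"
            )
```

Raising an exception doubles as the kill switch: catch it at the top of the agent loop, persist state, and exit cleanly.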

Quick Info

Category: AI Agents
Difficulty: advanced
Version: 1.0.0
Author: Claude Skills Hub
Tags: ai-agents, memory, vector-stores

Install command:

curl -o ~/.claude/skills/ai-agent-memory.md https://clskills.in/skills/ai-agents/ai-agent-memory.md

Related AI Agents Skills

Other Claude Code skills in the same category — free to download.

Want an AI Agents skill personalized to YOUR project?

This is a generic skill that works for everyone. Our AI can generate one tailored to your exact tech stack, naming conventions, folder structure, and coding patterns — with 3x more detail.