Your Agent Remembers Everything
Snippbot doesn't just process — it learns. A 5-tier cognitive architecture filters noise at input, manages active context, stores episodes with semantic search, builds a knowledge graph of entities and relations, and closes the loop with recall feedback that improves memory quality over time. No cloud. No data leaves your machine. Your agent's memory is yours alone.
Cognitive architecture
Semantic embeddings
Recall principles
Hybrid search fusion
Memory that works like yours
When you learn something new, your brain doesn't store raw text — it extracts meaning, connects to what you already know, and forgets what's irrelevant. Snippbot's memory system mirrors this. Every conversation is automatically captured, entities are extracted into a knowledge graph, and recall is gated by relevance. The result: your agent gets smarter with every interaction, surfacing exactly the right context at the right time — without cluttering the conversation.
The 5-Tier Cognitive Architecture
From noise filtering to closed-loop learning — each tier serves a distinct role in how your agent remembers.
Sensory Buffer
The first line of defense against memory noise. Before any conversation exchange becomes a permanent episode, the sensory buffer applies minimum-length filtering (trivial "ok"/"sure" exchanges are skipped), near-duplicate detection via Jaccard word-set similarity, and content normalization (whitespace cleanup, unicode normalization, truncation at 10K chars). Importance and valence are calculated from the content itself rather than assigned a flat default.
Working Memory
What your agent is "thinking about" right now. The context window manager enforces token budgets (up to 1M tokens for supported models) and compacts older messages via LLM summarization when limits are reached. A session entity tracker monitors which entities are "hot" in the current conversation — topics mentioned repeatedly get priority in recall, making the agent more contextually aware as conversations deepen.
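Budget-driven compaction can be sketched like this (the chars-per-token heuristic and the pluggable `summarize` callable are assumptions; Snippbot uses LLM summarization for the actual compaction):

```python
TOKEN_BUDGET = 200  # tiny budget for illustration; real budgets reach 1M tokens

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough chars-per-token heuristic

def compact(messages: list[str], summarize) -> list[str]:
    """Fold the oldest message into a summary until the budget is met."""
    while len(messages) > 1 and sum(map(approx_tokens, messages)) > TOKEN_BUDGET:
        oldest = messages.pop(0)
        messages[0] = summarize(oldest) + " " + messages[0]
    return messages
```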
Episodic Memory
Long-term storage for everything your agent has experienced. Every conversation is captured as an episode — stored in SQLite with full-text search via FTS5/BM25 and semantic similarity via 384-dimensional embeddings in an HNSW index. Episodes are scored for importance (0.0–1.0) and valence (negative to positive sentiment), enabling nuanced retrieval that prioritizes breakthroughs, failures, and decisions.
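The FTS5 side of this storage can be illustrated with a deliberately simplified schema (the column layout is an assumption, not Snippbot's actual table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# importance (0.0–1.0) and valence ride along unindexed; FTS5 indexes content.
conn.execute(
    "CREATE VIRTUAL TABLE episodes USING fts5(content, importance UNINDEXED, valence UNINDEXED)"
)
conn.execute(
    "INSERT INTO episodes VALUES (?, ?, ?)",
    ("Tracked down the asyncio deadlock in the retry decorator", 0.8, -0.3),
)
# bm25() returns a rank for each matching row (lower is better).
rows = conn.execute(
    "SELECT content, bm25(episodes) FROM episodes "
    "WHERE episodes MATCH 'asyncio' ORDER BY bm25(episodes)"
).fetchall()
```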
Knowledge Graph
A structured entity-relation graph that captures what your agent knows about the world. As conversations are consolidated, the system extracts entities (languages, frameworks, tools, concepts, people) and relations (uses, knows, depends_on, prefers) with confidence scoring. The graph enables multi-hop traversal — "Python depends_on pip, which is used_by Django, which the user prefers" — powering the Active Association recall principle.
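Multi-hop traversal over confidence-scored triples can be sketched as a breadth-first walk (the triples below are illustrative, echoing the example above):

```python
from collections import deque

# Illustrative (subject, relation, object, confidence) triples.
triples = [
    ("Python", "depends_on", "pip", 0.9),
    ("Django", "uses", "Python", 0.95),
    ("user", "prefers", "Django", 0.8),
]

def neighbors(entity):
    for s, rel, o, conf in triples:
        if s == entity:
            yield o, rel, conf
        elif o == entity:
            yield s, rel, conf  # edges are traversable in both directions

def related_entities(start, max_hops=2):
    """Breadth-first multi-hop traversal from a seed entity."""
    seen, frontier, found = {start}, deque([(start, 0)]), []
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt, rel, conf in neighbors(entity):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nxt, rel, conf, depth + 1))
                frontier.append((nxt, depth + 1))
    return found
```

Starting from Django, one hop reaches Python and the user's preference; a second hop reaches pip — exactly the chain of links Active Association exploits.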
Meta-Cognitive Layer
The system that closes the learning loop. After every chat response, the meta-cognitive layer detects whether recalled memories actually influenced the LLM's output using key-phrase overlap analysis. Memories that are consistently useful get their importance boosted (+0.05); memories that are injected but ignored get decayed (-0.02). An Ebbinghaus forgetting curve applies time-based exponential decay weighted by importance, automatically archiving stale memories. The query analyzer adapts search weights per query type — keyword-heavy for error codes, semantic-heavy for conceptual questions.
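The boost and decay constants below come from the description above; the importance-weighted half-life is an illustrative assumption about how the forgetting curve is parameterized:

```python
import math

USEFUL_BOOST = 0.05   # memories that influenced the response
IGNORED_DECAY = 0.02  # memories injected but ignored

def update_importance(importance: float, was_used: bool) -> float:
    """Closed-loop feedback: reward useful memories, decay ignored ones."""
    delta = USEFUL_BOOST if was_used else -IGNORED_DECAY
    return min(1.0, max(0.0, importance + delta))

BASE_HALF_LIFE_DAYS = 30.0  # assumed base; importance stretches it

def retention(importance: float, age_days: float) -> float:
    """Ebbinghaus-style exponential decay; important memories fade slower."""
    half_life = BASE_HALF_LIFE_DAYS * (0.5 + importance)
    return math.exp(-math.log(2) * age_days / half_life)
```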
Principled Recall: 3 Rules for Relevant Memory
Inspired by how expert learners build knowledge — relevance first, hierarchy second, connections third.
Establish Relevance
Before injecting any memory into context, results must pass a minimum relevance threshold (configurable, default 0.25). Below that? Nothing is injected. Silence is better than noise. Every result that passes includes a "why" annotation — the matched keyword or semantic highlight — so the agent knows why this memory surfaced.
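A sketch of the gate, assuming results arrive as dicts carrying a fused score and an optional match highlight (the dict shape is hypothetical):

```python
RELEVANCE_THRESHOLD = 0.25  # configurable default mentioned above

def gate(results: list[dict]) -> list[dict]:
    """Drop sub-threshold results; annotate survivors with why they matched."""
    passed = []
    for r in results:
        if r["score"] < RELEVANCE_THRESHOLD:
            continue  # silence is better than noise
        why = r.get("highlight") or "semantic similarity"
        passed.append({**r, "why": why})
    return passed
```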
Semantic Tree
Recalled memory is organized hierarchically — not dumped as a flat list. Trunk: high-importance entities you work with (Python, FastAPI). Branches: related entities from the knowledge graph. Leaves: specific episode content. Understand the fundamentals before the details — your agent anchors new information to what it already knows.
Active Association
Beyond keyword and semantic search, the knowledge graph discovers related episodes via entity links. Ask about Django? The graph knows you use Django, that Django depends on Python, and that you discussed Python async patterns last week. Those connections surface automatically — not because the words matched, but because the concepts are linked.
Hybrid Search: Two Strategies, One Answer
Keyword precision meets semantic understanding — fused by Reciprocal Rank Fusion for the best of both.
Keyword Search (FTS5/BM25)
SQLite FTS5 with BM25 ranking — the same relevance algorithm used by full-text engines like Lucene and Elasticsearch. Excels at exact matches, error codes, version numbers, and technical identifiers. Includes a 7-day recency boost that gives recent memories up to 10% more weight.
- Phrase matching, boolean operators, prefix wildcards
- Highlighted snippets with match context
- Best for: acronyms, error codes, snake_case identifiers
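The recency boost might look like this (a linear decay over the window is an assumption; only the 7-day span and 10% ceiling are stated above):

```python
RECENCY_WINDOW_DAYS = 7.0
MAX_BOOST = 0.10  # up to 10% extra weight for brand-new memories

def recency_boosted(bm25_score: float, age_days: float) -> float:
    """Scale a BM25 score by a boost that fades to zero over the window."""
    if age_days >= RECENCY_WINDOW_DAYS:
        return bm25_score
    boost = MAX_BOOST * (1 - age_days / RECENCY_WINDOW_DAYS)
    return bm25_score * (1 + boost)
```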
Vector Search (HNSW)
Dense 384-dimensional embeddings via all-MiniLM-L6-v2, indexed in an HNSW graph with cosine similarity. Understands meaning, not just words — "how do I handle errors?" finds memories about exception handling, fault tolerance, and retry patterns even without keyword overlap.
- Cosine similarity with M=16 connections, EF=200
- Semantic understanding across paraphrases
- Best for: "how-to" questions, conceptual queries, similarity
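Cosine similarity is the metric the HNSW index ranks by; in pure Python over toy vectors (real queries use the 384-dimensional embeddings, with HNSW making the nearest-neighbor lookup approximate but fast):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```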
Reciprocal Rank Fusion (RRF)
Both search strategies run in parallel and their results are merged using Reciprocal Rank Fusion — a proven technique from information retrieval that combines rankings without requiring score normalization. The formula is simple: score = 1/(k + rank) where k=60. A query analyzer auto-detects the optimal blend — keyword-heavy queries (like error codes) shift to 70% keyword / 30% vector, while conceptual queries ("how do I...") shift to 85% vector / 15% keyword. The default balanced mode uses 70% vector / 30% keyword.
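In code, the fusion step might look like this (the weights follow the balanced 70% vector / 30% keyword default described above; the weighted variant of plain RRF is an assumption):

```python
def rrf_fuse(keyword_ranked, vector_ranked, k=60, w_keyword=0.3, w_vector=0.7):
    """Weighted Reciprocal Rank Fusion: sum of weight / (k + rank) per doc."""
    scores = {}
    for rank, doc_id in enumerate(keyword_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_keyword / (k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_vector / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Note that a document ranked highly by both strategies accumulates two contributions, which is what lets RRF reward agreement without normalizing BM25 and cosine scores onto a common scale.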
Your Memory, Your Machine
Enterprise-grade data governance with zero cloud dependency.
100% Local Storage
All episodic memory, knowledge graphs, embeddings, and vector indices are stored in SQLite databases on your machine. Nothing is synced, uploaded, or transmitted. Your agent's memory exists only on your hardware.
Configurable Retention
Set retention policies — forever, 1 year, 6 months, 3 months, or 1 month — and episode limits (100–100K). Auto-summarization compresses older episodes. Export your memory as JSON or clear everything with a single action.
Privacy Controls
Automatic secret redaction strips API keys and passwords from stored memory. Name anonymization, regex exclusion patterns, and local-only enforcement ensure sensitive data stays under your control.
Secret Store Encryption
API keys, OAuth tokens, and credentials are encrypted with AES-256-GCM and PBKDF2 key derivation (600K iterations). The OS keychain holds the master key — never a plaintext file. Episodic memory content is stored locally in SQLite with configurable secret redaction to strip sensitive values before storage.
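The key-derivation half of this is expressible with Python's standard library (the iteration count is from the description above; SHA-256 as the PRF is an assumption, and actual AES-256-GCM encryption would use the derived key with a crypto library):

```python
import hashlib

ITERATIONS = 600_000  # PBKDF2 iteration count stated above

def derive_key(master_secret: bytes, salt: bytes) -> bytes:
    """Derive a 256-bit key for AES-256-GCM from the keychain-held master secret."""
    # dklen=32 yields 32 bytes = 256 bits; SHA-256 as the hash is an assumption.
    return hashlib.pbkdf2_hmac("sha256", master_secret, salt, ITERATIONS, dklen=32)
```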
Recall Scope Control
Control what the agent recalls: global (all memories), project-scoped (only current project), session-only, or none. Combined with the relevance threshold, you decide exactly how much context your agent carries.
Full Audit Visibility
Browse your agent's memory through the UI — timeline view, search interface, knowledge graph visualization, and cluster explorer. Every episode is inspectable. Every entity linkage is visible. No black boxes.
Build agents that learn
Every conversation makes your agent smarter. Every entity strengthens the knowledge graph. Every recall gets more relevant. Install Snippbot and the memory system is active from the first message — zero configuration required.