Your Agent Remembers Everything
Snippbot doesn't just process — it learns. A 5-tier cognitive architecture filters noise at input, manages active context, stores episodes with semantic search, builds a knowledge graph of entities and relations, and closes the loop with recall feedback that improves memory quality over time. No cloud. No data leaves your machine. Your agent's memory is yours alone.
Cognitive architecture
Semantic embeddings
Recall principles
Hybrid search fusion
Memory that works like yours
When you learn something new, your brain doesn't store raw text — it extracts meaning, connects to what you already know, and forgets what's irrelevant. Snippbot's memory system mirrors this. Every conversation is automatically captured, entities are extracted into a knowledge graph, and recall is gated by relevance. The result: your agent gets smarter with every interaction, surfacing exactly the right context at the right time — without cluttering the conversation.
The 5-Tier Cognitive Architecture
From noise filtering to closed-loop learning — each tier serves a distinct role in how your agent remembers.
Sensory Buffer
The first line of defense against memory noise. Before any conversation exchange becomes a permanent episode, the sensory buffer applies minimum-length filtering (trivial "ok"/"sure" exchanges are skipped), near-duplicate detection via Jaccard word-set similarity, and content normalization (whitespace cleanup, unicode normalization, truncation at 10K chars). Importance and valence are calculated from the content itself rather than assigned a flat default.
Working Memory
What your agent is "thinking about" right now. The context window manager enforces token budgets (up to 1M tokens for supported models) and compacts older messages via LLM summarization when limits are reached. A session entity tracker monitors which entities are "hot" in the current conversation — topics mentioned repeatedly get priority in recall, making the agent more contextually aware as conversations deepen.
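Budget-driven compaction can be sketched like this (the chars-per-token heuristic and the pluggable `summarize` callable are assumptions; Snippbot uses LLM summarization for the actual compaction):

```python
TOKEN_BUDGET = 200  # tiny budget for illustration; real budgets reach 1M tokens

def approx_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough chars-per-token heuristic

def compact(messages: list[str], summarize) -> list[str]:
    """Fold the oldest message into a summary until the budget is met."""
    while len(messages) > 1 and sum(map(approx_tokens, messages)) > TOKEN_BUDGET:
        oldest = messages.pop(0)
        messages[0] = summarize(oldest) + " " + messages[0]
    return messages
```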
Episodic Memory
Long-term storage for everything your agent has experienced. Every conversation is captured as an episode — stored in SQLite with full-text search via FTS5/BM25 and semantic similarity via 384-dimensional embeddings in an HNSW index. Episodes are scored for importance (0.0–1.0) and valence (negative to positive sentiment), enabling nuanced retrieval that prioritizes breakthroughs, failures, and decisions.
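The FTS5 side of this storage can be illustrated with a deliberately simplified schema (the column layout is an assumption, not Snippbot's actual table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# importance (0.0–1.0) and valence ride along unindexed; FTS5 indexes content.
conn.execute(
    "CREATE VIRTUAL TABLE episodes USING fts5(content, importance UNINDEXED, valence UNINDEXED)"
)
conn.execute(
    "INSERT INTO episodes VALUES (?, ?, ?)",
    ("Tracked down the asyncio deadlock in the retry decorator", 0.8, -0.3),
)
# bm25() returns a rank for each matching row (lower is better).
rows = conn.execute(
    "SELECT content, bm25(episodes) FROM episodes "
    "WHERE episodes MATCH 'asyncio' ORDER BY bm25(episodes)"
).fetchall()
```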
Knowledge Graph
A structured entity-relation graph that captures what your agent knows about the world. As conversations are consolidated, the system extracts entities (languages, frameworks, tools, concepts, people) and relations (uses, knows, depends_on, prefers) with confidence scoring. The graph enables multi-hop traversal — "Python depends_on pip, which is used_by Django, which the user prefers" — powering the Active Association recall principle.
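Multi-hop traversal over confidence-scored triples can be sketched as a breadth-first walk (the triples below are illustrative, echoing the example above):

```python
from collections import deque

# Illustrative (subject, relation, object, confidence) triples.
triples = [
    ("Python", "depends_on", "pip", 0.9),
    ("Django", "uses", "Python", 0.95),
    ("user", "prefers", "Django", 0.8),
]

def neighbors(entity):
    for s, rel, o, conf in triples:
        if s == entity:
            yield o, rel, conf
        elif o == entity:
            yield s, rel, conf  # edges are traversable in both directions

def related_entities(start, max_hops=2):
    """Breadth-first multi-hop traversal from a seed entity."""
    seen, frontier, found = {start}, deque([(start, 0)]), []
    while frontier:
        entity, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt, rel, conf in neighbors(entity):
            if nxt not in seen:
                seen.add(nxt)
                found.append((nxt, rel, conf, depth + 1))
                frontier.append((nxt, depth + 1))
    return found
```

Starting from Django, one hop reaches Python and the user's preference; a second hop reaches pip — exactly the chain of links Active Association exploits.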
Meta-Cognitive Layer
The system that closes the learning loop. After every chat response, the meta-cognitive layer detects whether recalled memories actually influenced the LLM's output using key-phrase overlap analysis. Memories that are consistently useful get their importance boosted (+0.05); memories that are injected but ignored get decayed (-0.02). An Ebbinghaus forgetting curve applies time-based exponential decay weighted by importance, automatically archiving stale memories. The query analyzer adapts search weights per query type — keyword-heavy for error codes, semantic-heavy for conceptual questions.
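The boost and decay constants below come from the description above; the importance-weighted half-life is an illustrative assumption about how the forgetting curve is parameterized:

```python
import math

USEFUL_BOOST = 0.05   # memories that influenced the response
IGNORED_DECAY = 0.02  # memories injected but ignored

def update_importance(importance: float, was_used: bool) -> float:
    """Closed-loop feedback: reward useful memories, decay ignored ones."""
    delta = USEFUL_BOOST if was_used else -IGNORED_DECAY
    return min(1.0, max(0.0, importance + delta))

BASE_HALF_LIFE_DAYS = 30.0  # assumed base; importance stretches it

def retention(importance: float, age_days: float) -> float:
    """Ebbinghaus-style exponential decay; important memories fade slower."""
    half_life = BASE_HALF_LIFE_DAYS * (0.5 + importance)
    return math.exp(-math.log(2) * age_days / half_life)
```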
Principled Recall: 3 Rules for Relevant Memory
Inspired by how expert learners build knowledge — relevance first, hierarchy second, connections third.
Establish Relevance
Before injecting any memory into context, results must pass a minimum relevance threshold (configurable, default 0.25). Below that? Nothing is injected. Silence is better than noise. Every result that passes includes a "why" annotation — the matched keyword or semantic highlight — so the agent knows why this memory surfaced.
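A sketch of the gate, assuming results arrive as dicts carrying a fused score and an optional match highlight (the dict shape is hypothetical):

```python
RELEVANCE_THRESHOLD = 0.25  # configurable default mentioned above

def gate(results: list[dict]) -> list[dict]:
    """Drop sub-threshold results; annotate survivors with why they matched."""
    passed = []
    for r in results:
        if r["score"] < RELEVANCE_THRESHOLD:
            continue  # silence is better than noise
        why = r.get("highlight") or "semantic similarity"
        passed.append({**r, "why": why})
    return passed
```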
Semantic Tree
Recalled memory is organized hierarchically — not dumped as a flat list. Trunk: high-importance entities you work with (Python, FastAPI). Branches: related entities from the knowledge graph. Leaves: specific episode content. Understand the fundamentals before the details — your agent anchors new information to what it already knows.
Active Association
Beyond keyword and semantic search, the knowledge graph discovers related episodes via entity links. Ask about Django? The graph knows you use Django, that Django depends on Python, and that you discussed Python async patterns last week. Those connections surface automatically — not because the words matched, but because the concepts are linked.
Hybrid Search: Two Strategies, One Answer
Keyword precision meets semantic understanding — fused by Reciprocal Rank Fusion for the best of both.
Keyword Search (FTS5/BM25)
SQLite FTS5 with BM25 ranking — the same relevance algorithm used by full-text engines like Lucene and Elasticsearch. Excels at exact matches, error codes, version numbers, and technical identifiers. Includes a 7-day recency boost that gives recent memories up to 10% more weight.
- Phrase matching, boolean operators, prefix wildcards
- Highlighted snippets with match context
- Best for: acronyms, error codes, snake_case identifiers
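The recency boost might look like this (a linear decay over the window is an assumption; only the 7-day span and 10% ceiling are stated above):

```python
RECENCY_WINDOW_DAYS = 7.0
MAX_BOOST = 0.10  # up to 10% extra weight for brand-new memories

def recency_boosted(bm25_score: float, age_days: float) -> float:
    """Scale a BM25 score by a boost that fades to zero over the window."""
    if age_days >= RECENCY_WINDOW_DAYS:
        return bm25_score
    boost = MAX_BOOST * (1 - age_days / RECENCY_WINDOW_DAYS)
    return bm25_score * (1 + boost)
```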
Vector Search (HNSW)
Dense 384-dimensional embeddings via all-MiniLM-L6-v2, indexed in an HNSW graph with cosine similarity. Understands meaning, not just words — "how do I handle errors?" finds memories about exception handling, fault tolerance, and retry patterns even without keyword overlap.
- Cosine similarity with M=16 connections, EF=200
- Semantic understanding across paraphrases
- Best for: "how-to" questions, conceptual queries, similarity
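Cosine similarity is the metric the HNSW index ranks by; in pure Python over toy vectors (real queries use the 384-dimensional embeddings, with HNSW making the nearest-neighbor lookup approximate but fast):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```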
Reciprocal Rank Fusion (RRF)
Both search strategies run in parallel and their results are merged using Reciprocal Rank Fusion — a proven technique from information retrieval that combines rankings without requiring score normalization. The formula is simple: score = 1/(k + rank) where k=60. A query analyzer auto-detects the optimal blend — keyword-heavy queries (like error codes) shift to 70% keyword / 30% vector, while conceptual queries ("how do I...") shift to 85% vector / 15% keyword. The default balanced mode uses 70% vector / 30% keyword.
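In code, the fusion step might look like this (the weights follow the balanced 70% vector / 30% keyword default described above; the weighted variant of plain RRF is an assumption):

```python
def rrf_fuse(keyword_ranked, vector_ranked, k=60, w_keyword=0.3, w_vector=0.7):
    """Weighted Reciprocal Rank Fusion: sum of weight / (k + rank) per doc."""
    scores = {}
    for rank, doc_id in enumerate(keyword_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_keyword / (k + rank)
    for rank, doc_id in enumerate(vector_ranked, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + w_vector / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Note that a document ranked highly by both strategies accumulates two contributions, which is what lets RRF reward agreement without normalizing BM25 and cosine scores onto a common scale.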
Your Memory, Your Machine
Enterprise-grade data governance with zero cloud dependency.
100% Local Storage
All episodic memory, knowledge graphs, embeddings, and vector indices are stored in SQLite databases on your machine. Nothing is synced, uploaded, or transmitted. Your agent's memory exists only on your hardware.
Configurable Retention
Set retention policies — forever, 1 year, 6 months, 3 months, or 1 month — and episode limits (100–100K). Auto-summarization compresses older episodes. Export your memory as JSON or clear everything with a single action.
Privacy Controls
Automatic secret redaction strips API keys and passwords from stored memory. Name anonymization, regex exclusion patterns, and local-only enforcement ensure sensitive data stays under your control.
Secret Store Encryption
API keys, OAuth tokens, and credentials are encrypted with AES-256-GCM and PBKDF2 key derivation (600K iterations). The OS keychain holds the master key — never a plaintext file. Episodic memory content is stored locally in SQLite with configurable secret redaction to strip sensitive values before storage.
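The key-derivation half of this is expressible with Python's standard library (the iteration count is from the description above; SHA-256 as the PRF is an assumption, and actual AES-256-GCM encryption would use the derived key with a crypto library):

```python
import hashlib

ITERATIONS = 600_000  # PBKDF2 iteration count stated above

def derive_key(master_secret: bytes, salt: bytes) -> bytes:
    """Derive a 256-bit key for AES-256-GCM from the keychain-held master secret."""
    # dklen=32 yields 32 bytes = 256 bits; SHA-256 as the hash is an assumption.
    return hashlib.pbkdf2_hmac("sha256", master_secret, salt, ITERATIONS, dklen=32)
```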
Recall Scope Control
Control what the agent recalls: global (all memories), project-scoped (only current project), session-only, or none. Combined with the relevance threshold, you decide exactly how much context your agent carries.
Full Audit Visibility
Browse your agent's memory through the UI — timeline view, search interface, knowledge graph visualization, and cluster explorer. Every episode is inspectable. Every entity linkage is visible. No black boxes.
Build agents that learn
Every conversation makes your agent smarter. Every entity strengthens the knowledge graph. Every recall gets more relevant. Install Snippbot and the memory system is active from the first message — zero configuration required.