4-Layer Agent Memory Hierarchy on a Raspberry Pi: CORE, EPISODIC, SEMANTIC, WORKING
Problem / Context
An agent hosted on a Raspberry Pi 4 needed to serve responses from a growing knowledge base (50K+ words) without exceeding the Pi's memory constraints or loading the full context on every request. Single-file memory approaches either loaded everything (slow and expensive) or loaded nothing relevant (useless responses).
Solution
Implement a four-layer memory hierarchy in Node.js:
CORE layer: always-loaded constants (agent identity, hard rules, capability map). 512-token budget, never changes.
EPISODIC layer: recent interaction history and session summaries, loaded for conversational continuity. 2K-token budget, rotated daily.
SEMANTIC layer: topic-indexed knowledge files, loaded selectively based on task relevance scoring (topic keyword match). 4K-token budget.
WORKING layer: task-specific scratchpad, created fresh each session and discarded on completion. No fixed budget, but short-lived.
A task classifier runs first to determine which semantic topic files to load, before the main agent prompt is assembled. The three budgeted layers total about 6.5K tokens at most, which keeps average context under 8K tokens regardless of total knowledge base size.
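The classifier and assembly steps described above might look roughly like the following. This is a minimal sketch, not the original implementation: the names BUDGETS, estimateTokens, fitToBudget, classifyTask and assembleContext are hypothetical, token counts are approximated at four characters per token rather than measured with a real tokenizer, 2K and 4K are read as 2048 and 4096, and topic files are held in memory instead of being read from disk. The WORKING layer is left out because it is created and discarded per session.

```javascript
// Hypothetical per-layer token budgets, taken from the layer descriptions.
const BUDGETS = { core: 512, episodic: 2048, semantic: 4096 };

// Rough estimate: about 4 characters per token (an assumption, not a tokenizer).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Truncate text so its estimated token count fits the given budget.
function fitToBudget(text, budget) {
  const maxChars = budget * 4;
  return text.length <= maxChars ? text : text.slice(0, maxChars);
}

// Task classifier: rank topics by how many of their keywords appear in the task.
function classifyTask(task, topics) {
  const words = new Set(task.toLowerCase().match(/[a-z0-9]+/g) || []);
  return topics
    .map((topic) => ({
      topic,
      score: topic.keywords.filter((k) => words.has(k)).length,
    }))
    .filter((entry) => entry.score > 0)
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.topic);
}

// Assemble the prompt context: CORE always, EPISODIC next, SEMANTIC by relevance.
function assembleContext({ core, episodic, topics, task }) {
  const parts = [
    fitToBudget(core, BUDGETS.core),
    fitToBudget(episodic, BUDGETS.episodic),
  ];
  let remaining = BUDGETS.semantic;
  const loaded = [];
  for (const topic of classifyTask(task, topics)) {
    const cost = estimateTokens(topic.text);
    if (cost > remaining) continue; // skip topic files that no longer fit
    parts.push(topic.text);
    loaded.push(topic.name);
    remaining -= cost;
  }
  const context = parts.join("\n\n");
  return { context, loaded, tokens: estimateTokens(context) };
}

// Usage with two made-up topic files.
const topics = [
  { name: "gpio", keywords: ["gpio", "pin", "led"], text: "GPIO notes ..." },
  { name: "networking", keywords: ["wifi", "ssh"], text: "Networking notes ..." },
];
const result = assembleContext({
  core: "You are a Pi-hosted assistant.",
  episodic: "Yesterday: the user set up SSH.",
  topics,
  task: "Blink an LED on GPIO pin 17",
});
console.log(result.loaded); // logs [ 'gpio' ]
```

Skipping a topic file that does not fit, rather than truncating it, is one possible design choice: it keeps every loaded file whole, at the cost of sometimes leaving part of the semantic budget unused.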
Result
Average session context stayed under 8K tokens on a Raspberry Pi 4 regardless of knowledge base size. The CORE layer (512 tokens) loads in under 100 ms. The knowledge base grew to 50K words across 40+ topic files without increasing per-session cost. Over 90% of loaded tokens stayed relevant.