LLMFS — Filesystem Memory for LLMs and AI Agents

LLMFS gives LLMs and AI agents persistent, searchable, structured memory — organized like a filesystem. Instead of losing context when a conversation grows past the token limit, agents offload memories to LLMFS and retrieve exactly what they need, when they need it.

The result: zero information loss and an effectively unlimited context window — even over thousands of turns.


The Problem

Every LLM agent eventually hits the same wall: the context window fills up.

The standard solution — lossy summarization — destroys information. When an agent summarizes 80k tokens into 5k, 94% of the detail is gone forever. Ask it about a specific line of code from 30 turns ago, and it can only apologize.

The Solution

LLMFS takes a different approach, borrowed directly from operating systems:

OS Concept     →   LLM Concept
──────────────────────────────────────────────────────────
RAM            →   Context Window (e.g. 128k tokens)
Disk / Swap    →   LLMFS  (500k+ tokens, full fidelity)
Page eviction  →   Offload old turns to LLMFS
Page fault     →   LLM calls memory_search / memory_read
Virtual addr   →   Memory path  (/session/turns/42)
MMU            →   ContextManager

Memories are stored at filesystem-style paths (/projects/auth/bug, /events/2026-03-15_fix) and searched semantically. They persist across sessions, support TTLs, carry metadata and tags, and can be linked in a knowledge graph.
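
To make the mapping concrete, here is a minimal sketch of the paging loop. MemoryFS.write and MemoryFS.search appear in the Quick Start below; mem.read(path) (suggested by the read-by-path operation in the Performance table) and the 4-characters-per-token heuristic are assumptions for illustration, not the documented API.

# Sketch of the virtual-memory loop. mem.write/mem.search are from the
# Quick Start; mem.read(path) and the token heuristic are assumptions.
from llmfs import MemoryFS

CONTEXT_BUDGET = 128_000              # "RAM": tokens the model window holds
mem = MemoryFS()                      # "disk": the LLMFS store
context: list[tuple[int, str]] = []   # (turn number, text) kept in-window

def approx_tokens(text: str) -> int:
    return len(text) // 4             # rough heuristic, ~4 chars per token

def add_turn(turn_no: int, text: str) -> None:
    context.append((turn_no, text))
    # Page eviction: spill the oldest turns to LLMFS once over budget.
    while sum(approx_tokens(t) for _, t in context) > CONTEXT_BUDGET:
        old_no, old_text = context.pop(0)
        mem.write(f"/session/turns/{old_no}", old_text)

def recall(query: str) -> str:
    # Page fault: an evicted turn is searched for and paged back in.
    hit = mem.search(query, k=1)[0]
    return mem.read(hit.path)         # read-by-path; method name assumed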


Quick Start

pip install llmfs

# CLI
llmfs init
llmfs write /knowledge/hello "LLMFS stores memories at filesystem paths"
llmfs search "how does memory storage work"

# Python
from llmfs import MemoryFS

mem = MemoryFS()  # opens the local store (SQLite + ChromaDB)
mem.write("/projects/auth/bug", "JWT expiry misconfigured at auth.py:45", tags=["jwt", "bug"])
results = mem.search("authentication error", k=3)  # top-3 semantic matches
print(results[0].path, results[0].score)


Key Features

  • Filesystem Metaphor
    Organize memories at intuitive paths like /projects/auth/bug with hierarchical structure, tags, and metadata.

  • Unlimited Context
    Virtual memory model evicts old turns to LLMFS and pages them back in on demand. Zero information loss.

  • Hybrid Search
    Semantic vector search + BM25 keyword search, combined with reciprocal rank fusion (sketched below this list). Sub-100 ms over 10k memories.

  • Knowledge Graph
    Link memories with typed relationships (caused_by, follows, contradicts) and traverse them with BFS/DFS; see the traversal sketch below.

  • MCP Server
    Built-in Model Context Protocol server for Claude, Cursor, Windsurf, and any MCP client.

  • MQL Query Language
    Custom query language: SELECT memory FROM /knowledge WHERE SIMILAR TO "auth bug" LIMIT 5

  • Framework Integrations
    Drop-in adapters for LangChain, OpenAI function calling, and any tool-use LLM.

  • Local-First
    Runs entirely on your machine. SQLite + ChromaDB. No API keys needed. 22 MB embedding model, CPU-only.
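
Reciprocal rank fusion itself is simple enough to show. The sketch below is the generic RRF formula (score(d) = sum of 1/(k + rank) across the two result lists, with the conventional k = 60), not LLMFS internals; the paths in the toy rankings are just examples.

# Generic reciprocal rank fusion: merge ranked lists by summing
# 1/(k + rank) per document. Not LLMFS internals.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["/projects/auth/bug", "/knowledge/jwt", "/events/2026-03-15_fix"]
keyword = ["/knowledge/jwt", "/projects/auth/bug"]
print(rrf([semantic, keyword]))  # fused order, best first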
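
And a minimal picture of graph traversal over typed links. This is a conceptual sketch over a plain dict of edges; LLMFS's actual graph API is not shown on this page, and the edge targets here are illustrative.

# Conceptual BFS over typed memory links. The edge dict stands in for
# the real graph store; relation names are the ones listed above.
from collections import deque

links: dict[str, list[tuple[str, str]]] = {
    "/projects/auth/bug": [("caused_by", "/knowledge/jwt"),
                           ("follows", "/events/2026-03-15_fix")],
    "/knowledge/jwt": [("contradicts", "/knowledge/old-auth-notes")],
}

def bfs(start: str) -> list[str]:
    seen, order, queue = {start}, [], deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for _relation, target in links.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order  # every memory reachable from start, breadth-first

print(bfs("/projects/auth/bug"))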


Performance

Operation                     Target     Notes
──────────────────────────────────────────────────────────────────────
Write (500 tokens)            < 200 ms   Includes chunking + embedding
Search (10k memories)         < 100 ms   Cached repeats in < 1 ms
Read (by path)                < 10 ms    SQLite lookup + chunk assembly
MQL query                     < 200 ms   Parse + search
Context eviction (20 turns)   < 500 ms   Includes artifact extraction