Architecture
LLMFS uses a layered architecture with a single entry point (MemoryFS) that delegates to specialized subsystems.
System Overview
┌─────────────────────────────────────────────────────────────────┐
│                        Public Interfaces                        │
│  CLI (llmfs)   Python API   MCP Server   LangChain   OpenAI     │
└────────────────────────────┬────────────────────────────────────┘
                             │
                    ┌────────▼────────┐
                    │    MemoryFS     │  ← Single entry point
                    │  (filesystem.py)│    write · read · search
                    │                 │    update · forget · relate
                    │                 │    list · query · gc
                    └────┬──────┬─────┘
                         │      │
          ┌──────────────▼──┐ ┌─▼────────────────────┐
          │  Storage Layer  │ │  Embedding Layer     │
          │                 │ │                      │
          │  SQLite (WAL)   │ │  all-MiniLM-L6-v2    │
          │  · file registry│ │  (local, 22 MB, CPU) │
          │  · chunks       │ │  or OpenAI text-     │
          │  · tags         │ │  embedding-3-small   │
          │  · graph edges  │ │                      │
          │  · search cache │ └──────────────────────┘
          │                 │
          │  ChromaDB       │
          │  · chunk vectors│
          │  · metadata     │
          │  · similarity   │
          └─────────────────┘
                    │
          ┌─────────▼──────────────────────────────────┐
          │            Processing Pipeline             │
          │                                            │
          │  AdaptiveChunker    ExtractiveSummarizer   │
          │  (code-aware AST    (TF-IDF, level 1+2)    │
          │   or prose-aware)                          │
          │                                            │
          │  RetrievalEngine    MemoryGraph            │
          │  (semantic +        (BFS/DFS, relationship │
          │   temporal +         types, strength)      │
          │   graph hybrid)                            │
          │                                            │
          │  ContextManager     MQL Parser+Executor    │
          │  (virtual memory    (custom query language │
          │   manager for        → ChromaDB + SQLite)  │
          │   infinite ctx)                            │
          └────────────────────────────────────────────┘
On-Disk Layout
~/.llmfs/            # default; or .llmfs/ in the current directory
  metadata.db        # SQLite: file registry, chunks, tags, graph, cache
  chroma/            # ChromaDB persistence: embedding vectors
  config.json        # optional configuration overrides
Storage Layer
SQLite (metadata.db)
All structured data lives in SQLite with WAL mode enabled for concurrent reads:
| Table | Purpose |
|---|---|
| files | File registry (id, path, name, layer, size, timestamps, TTL, content hash) |
| chunks | Chunk text, byte offsets, embedding IDs, per-chunk summaries |
| tags / file_tags | Many-to-many tag system |
| relationships | Graph edges (source, target, type, strength) |
| search_cache | Query result cache (SHA-256 keyed, 5-min TTL) |
| embedding_cache | Avoids re-embedding identical strings |
| chunks_fts | FTS5 virtual table for BM25 keyword search |
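To make the search_cache row concrete, here is a minimal sketch of a SHA-256-keyed result cache with a 5-minute TTL on top of SQLite. The column names, the JSON encoding, and the helper functions (`open_db`, `cache_key`, `cache_put`, `cache_get`) are assumptions for illustration, not the actual LLMFS schema or API.

```python
import hashlib
import json
import sqlite3
import time

CACHE_TTL_SECONDS = 5 * 60  # the 5-minute TTL listed for search_cache


def open_db(path=":memory:"):
    """Open a connection and create a hypothetical search_cache table."""
    conn = sqlite3.connect(path)
    # WAL only takes effect for on-disk databases; in-memory ones ignore it.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS search_cache ("
        "key TEXT PRIMARY KEY, results TEXT NOT NULL, created_at REAL NOT NULL)"
    )
    return conn


def cache_key(query, layer=None, k=10):
    """SHA-256 over a canonical JSON encoding of the query parameters."""
    payload = json.dumps({"query": query, "layer": layer, "k": k}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cache_put(conn, key, results, now=None):
    """Store JSON-serializable results under key, stamped with the current time."""
    now = time.time() if now is None else now
    conn.execute(
        "INSERT OR REPLACE INTO search_cache (key, results, created_at) "
        "VALUES (?, ?, ?)",
        (key, json.dumps(results), now),
    )
    conn.commit()


def cache_get(conn, key, now=None):
    """Return cached results, or None if missing or older than the TTL."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT results, created_at FROM search_cache WHERE key = ?", (key,)
    ).fetchone()
    if row is None or now - row[1] > CACHE_TTL_SECONDS:
        return None
    return json.loads(row[0])
```

Hashing a canonical encoding of every parameter that affects the result (here the query text, layer, and result count) keeps two different searches from sharing a cache entry.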
ChromaDB
Chunk embeddings are stored in a ChromaDB collection with an HNSW index (cosine similarity). Each chunk carries metadata: file_id, path, layer, chunk_index, tags.
Embedding Layer
LLMFS lazy-loads the embedder on first use:
- Default: all-MiniLM-L6-v2 via sentence-transformers (22 MB, CPU-only, 1000+ queries/sec)
- Optional: OpenAI text-embedding-3-small (higher quality for some domains, requires API key)
Embeddings are cached in SQLite keyed by (text_hash, model_name) to avoid redundant computation across sessions.
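The combination of lazy loading and a persistent cache can be sketched as follows. This is an illustration under assumptions: the class name `CachedEmbedder`, the `load_model` callable, and the table columns are hypothetical, and vectors are stored as JSON text purely for simplicity.

```python
import hashlib
import json
import sqlite3


class CachedEmbedder:
    """Lazy-loading embedder with a SQLite cache keyed by (text_hash, model_name)."""

    def __init__(self, conn, model_name, load_model):
        self.conn = conn
        self.model_name = model_name
        self._load_model = load_model  # callable returning an embed(text) function
        self._embed = None             # stays None until the first cache miss
        conn.execute(
            "CREATE TABLE IF NOT EXISTS embedding_cache ("
            "text_hash TEXT NOT NULL, model_name TEXT NOT NULL, "
            "vector TEXT NOT NULL, PRIMARY KEY (text_hash, model_name))"
        )

    def embed(self, text):
        text_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
        row = self.conn.execute(
            "SELECT vector FROM embedding_cache "
            "WHERE text_hash = ? AND model_name = ?",
            (text_hash, self.model_name),
        ).fetchone()
        if row is not None:
            return json.loads(row[0])  # cache hit: the model is never touched
        if self._embed is None:
            self._embed = self._load_model()  # lazy load on first real use
        vector = [float(x) for x in self._embed(text)]
        self.conn.execute(
            "INSERT INTO embedding_cache (text_hash, model_name, vector) "
            "VALUES (?, ?, ?)",
            (text_hash, self.model_name, json.dumps(vector)),
        )
        self.conn.commit()
        return vector


def load_stub_model():
    """Stand-in loader for the demo; returns a toy 2-d 'embedding' function."""
    return lambda text: [float(len(text)), float(text.count(" "))]


if __name__ == "__main__":
    embedder = CachedEmbedder(sqlite3.connect(":memory:"), "stub-model", load_stub_model)
    print(embedder.embed("hello world"))  # computed by the stub, then cached
    print(embedder.embed("hello world"))  # served from SQLite
```

With sentence-transformers installed, the loader could plausibly be `lambda: SentenceTransformer("all-MiniLM-L6-v2").encode`. Including the model name in the key matters because vectors from different models are not interchangeable.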
Processing Pipeline
Adaptive Chunker
Content-aware chunking with three strategies:
| Content Type | Target Size | Strategy |
|---|---|---|
| Python code | 512 tokens | AST-based splitting at def/class boundaries |
| Markdown | 256 tokens | Header-based splitting, sub-split by paragraph |
| Plain text | 256 tokens | Sliding window with 50-token overlap |
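Two of the strategies in the table can be sketched with the standard library alone. The function names are hypothetical, tokens are assumed to be a pre-split list (the tokenizer LLMFS uses is not stated here), and the AST splitter only handles top-level definitions without enforcing the 512-token target.

```python
import ast


def sliding_window(tokens, size=256, overlap=50):
    """Split a token list into windows of `size` that share `overlap` tokens."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reaches the end of the input
    return chunks


def split_python(source):
    """Return one chunk per top-level def/class, using AST line spans.

    Module-level statements outside definitions are skipped in this sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # lineno points at the def/class keyword, so include decorators too.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            # Line numbers are 1-based; end_lineno is inclusive (Python 3.8+).
            chunks.append("\n".join(lines[start - 1:node.end_lineno]))
    return chunks
```

Splitting on AST boundaries keeps each function or class intact, which is why the code strategy can afford a larger target size than the prose strategies.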
Retrieval Engine
Hybrid search runs two retrievers, then fuses their rankings:
- Dense search — query embedded, cosine similarity via ChromaDB
- Sparse search — BM25 keyword matching via SQLite FTS5
- Fusion — Reciprocal Rank Fusion merges results, deduplicates by path
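The fusion step can be illustrated with a short Reciprocal Rank Fusion sketch. The function name is hypothetical, and the constant k=60 is the value commonly used for RRF rather than a setting confirmed for LLMFS. Each input is a ranked list of paths, best first, as the dense and sparse searches might return them.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of paths; score(p) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        seen = set()
        rank = 0
        for path in ranking:
            if path in seen:
                continue  # several chunks of one file: keep only its best rank
            seen.add(path)
            rank += 1
            scores[path] += 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda p: (-scores[p], p))


if __name__ == "__main__":
    dense = ["/a.md", "/b.md", "/a.md", "/c.md"]   # cosine-similarity order
    sparse = ["/b.md", "/d.md", "/a.md"]           # BM25 order
    print(reciprocal_rank_fusion([dense, sparse]))
```

Because RRF uses only ranks, it needs no calibration between cosine similarities and BM25 scores, which live on unrelated scales.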
Memory Graph
Typed, weighted edges between memories with BFS/DFS traversal. Relationship types: related_to, follows, caused_by, contradicts.
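A minimal in-memory sketch of typed, weighted edges with breadth-first traversal is given here. The class name, method signatures, and the choice to treat edges as directed are assumptions; in LLMFS the edges live in the relationships table rather than in a Python dict.

```python
from collections import defaultdict, deque

RELATIONSHIP_TYPES = {"related_to", "follows", "caused_by", "contradicts"}


class MemoryGraphSketch:
    """Directed graph with typed, weighted edges between memory paths."""

    def __init__(self):
        self._edges = defaultdict(list)  # source -> [(target, type, strength)]

    def relate(self, source, target, rel_type="related_to", strength=1.0):
        if rel_type not in RELATIONSHIP_TYPES:
            raise ValueError(f"unknown relationship type: {rel_type}")
        self._edges[source].append((target, rel_type, strength))

    def bfs(self, start, max_depth=2, min_strength=0.0, rel_types=None):
        """Return {path: depth} for nodes reachable within max_depth hops."""
        depths = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            if depths[node] == max_depth:
                continue  # do not expand beyond the depth limit
            for target, rel_type, strength in self._edges.get(node, []):
                if strength < min_strength:
                    continue  # skip weak edges
                if rel_types is not None and rel_type not in rel_types:
                    continue  # skip edge types the caller did not ask for
                if target not in depths:
                    depths[target] = depths[node] + 1
                    queue.append(target)
        return depths
```

Filtering on strength and relationship type during traversal, rather than afterwards, keeps weak or irrelevant edges from pulling distant memories into the result.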
Context Manager
The virtual memory subsystem that powers infinite context. See Infinite Context for details.