Infinite Context — ContextMiddleware¶
The ContextMiddleware is LLMFS's flagship feature. Wrap any agent with two lines and get effectively unlimited context with zero information loss.
The Problem¶
```
Turn 35: Context window hits 128k tokens
   |
Standard approach: lossy summarization
   128k tokens -> 5k tokens = ~96% of information LOST FOREVER
   |
Turn 36: "What was the exact error at auth.py line 45?"
LLM: "I don't have that detail anymore."   <-- failure
```
The LLMFS Solution¶
LLMFS works like virtual memory. Old turns are evicted from the context window and stored in LLMFS at full fidelity. A compact memory index (~2k tokens) stays in the system prompt, listing what has been stored and where. When the LLM needs something, it calls memory_read or memory_search to page it back in.
```
Context Window: 128k tokens
+----------------+--------------------+----------------------------------+
| System         | Memory Index       | Active Conversation              |
| Prompt         | (~2k tokens)       | (recent 5-10 turns)              |
| (~1k)          | lists paths of     | (~20-80k tokens)                 |
|                | evicted turns      |                                  |
+----------------+--------------------+----------------------------------+
        |
+-------v--------+
| LLMFS          |  500k+ tokens stored, zero lost
|                |  Full fidelity, semantically indexed
+----------------+
```
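The evict-and-recall cycle can be modeled independently of the library. Below is an illustrative sketch, not LLMFS internals: a plain dict stands in for `MemoryFS`, and token counts are approximated by word counts.

```python
# Illustrative model of the evict/page-in cycle (not LLMFS internals).
# A plain dict stands in for MemoryFS; word counts stand in for tokens.

def count_tokens(text: str) -> int:
    return len(text.split())

class TinyPager:
    def __init__(self, max_tokens: int, evict_at: float = 0.70, target: float = 0.50):
        self.max_tokens = max_tokens
        self.evict_at = evict_at    # high-water mark: start evicting here
        self.target = target        # low-water mark: stop evicting here
        self.active = []            # (turn_no, text) pairs still in the window
        self.store = {}             # "evicted" turns, kept at full fidelity

    def _used(self) -> int:
        return sum(count_tokens(t) for _, t in self.active)

    def add_turn(self, turn_no: int, text: str):
        self.active.append((turn_no, text))
        # Evict oldest turns once usage crosses the high-water mark.
        while self._used() > self.max_tokens * self.evict_at and len(self.active) > 1:
            no, old = self.active.pop(0)
            self.store[f"/session/demo/turns/{no}"] = old
            if self._used() <= self.max_tokens * self.target:
                break

    def memory_read(self, path: str) -> str:
        # Page an evicted turn back in, verbatim -- nothing was summarized away.
        return self.store[path]
```

The key property is that eviction moves content out of the window but never discards it: `memory_read` returns the original turn byte-for-byte.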
Drop-In Usage¶
```python
from llmfs import MemoryFS
from llmfs.context import ContextMiddleware

# Wrap your existing agent with 2 lines
agent = YourExistingAgent(model="gpt-4o")
agent = ContextMiddleware(agent, memory=MemoryFS())

# Now every call transparently manages context:
# 1. Intercepts every turn (before + after)
# 2. Scores importance of each message
# 3. Auto-evicts at 70% capacity, targets 50%
# 4. Extracts artifacts (code, errors, file refs) before eviction
# 5. Rebuilds the memory index after eviction
# 6. Injects the index into the system prompt
# 7. Provides memory_search / memory_read tools to the LLM

response = agent.chat("What was the exact error from turn 15?")
```
Importance Scoring¶
The middleware scores each turn before evicting the lowest-importance ones:
| Signal | Score boost |
|---|---|
| Contains a code block (```) | +0.20 |
| Contains error / traceback | +0.20 |
| Contains decision keyword (decided, plan, must) | +0.15 |
| Role = user (user intent is high-value) | +0.10 |
| Very recent turn (last 3) | +0.15 |
| Very short / conversational filler | -0.20 |
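The table above can be approximated with a few string and regex checks. This is a hypothetical re-implementation mirroring the listed signals and boosts; the middleware's actual weights and detection logic may differ.

```python
import re

# Hypothetical scorer mirroring the signal table; the real middleware's
# weights and heuristics may differ. Scores start from a neutral base.
DECISION_RE = re.compile(r"\b(decided|plan|must)\b", re.IGNORECASE)
ERROR_RE = re.compile(r"\b(error|traceback)\b", re.IGNORECASE)

def score_turn(role: str, content: str, turns_from_end: int, base: float = 0.5) -> float:
    score = base
    if "```" in content:
        score += 0.20                      # contains a code block
    if ERROR_RE.search(content):
        score += 0.20                      # error / traceback
    if DECISION_RE.search(content):
        score += 0.15                      # decision keyword
    if role == "user":
        score += 0.10                      # user intent is high-value
    if turns_from_end < 3:
        score += 0.15                      # very recent turn
    if len(content.split()) < 4 and "```" not in content:
        score -= 0.20                      # conversational filler
    return score
```

For example, a recent user turn containing "decided" scores 0.5 + 0.15 + 0.10 + 0.15 = 0.90, while a bare "ok" from the user scores 0.5 + 0.10 - 0.20 = 0.40 and is evicted first.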
Artifact Extraction¶
Before a turn is evicted, the middleware automatically extracts and stores structured artifacts at dedicated sub-paths:
| Artifact | Stored At | Tags |
|---|---|---|
| Code blocks | /session/{id}/code/turn_{n}_{i} | ["code", "<lang>"] |
| Stack traces / errors | /session/{id}/errors/turn_{n} | ["error"] |
| File paths mentioned | /session/{id}/files/turn_{n} | ["file_references"] |
| Decisions | /session/{id}/decisions/turn_{n} | ["decision"] |
| Full turn (always) | /session/{id}/turns/{n} | -- |
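A minimal extractor following the sub-path layout above might look like the following. This is an illustrative sketch, not the library's implementation; the regexes and the fixed `demo`-style session id are assumptions.

```python
import re

# Hypothetical artifact extractor mirroring the sub-path table above;
# not LLMFS internals. Regex patterns are illustrative assumptions.
CODE_RE = re.compile(r"```(\w*)\n(.*?)```", re.DOTALL)
FILE_RE = re.compile(r"\b[\w./-]+\.(?:py|js|ts|go|rs)\b")

def extract_artifacts(session: str, turn_no: int, content: str) -> dict:
    artifacts = {}
    # Fenced code blocks -> /code/turn_{n}_{i}
    for i, (lang, body) in enumerate(CODE_RE.findall(content)):
        artifacts[f"/session/{session}/code/turn_{turn_no}_{i}"] = body.strip()
    # Errors / stack traces -> /errors/turn_{n}
    if "Traceback" in content or "Error" in content:
        artifacts[f"/session/{session}/errors/turn_{turn_no}"] = content
    # Mentioned file paths -> /files/turn_{n}
    files = FILE_RE.findall(content)
    if files:
        artifacts[f"/session/{session}/files/turn_{turn_no}"] = files
    # The full turn is always stored, regardless of what else matched.
    artifacts[f"/session/{session}/turns/{turn_no}"] = content
    return artifacts
```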
Memory Index¶
The memory index is regenerated after each eviction cycle and injected into the system prompt:
```
## LLMFS Memory Index
You have the following memories (use memory_read / memory_search to retrieve):
- [/session/abc/turns/1] (turn 1, 10:30) [user] -- User asked to fix auth module bug
- [/session/abc/turns/2] (turn 2, 10:31) [assistant] -- Found JWT expiry at auth.py:45
- [/session/abc/code/turn_2_0] (turn 2, 10:31) [code:py] -- Fixed auth.py token refresh logic
- [/session/abc/errors/turn_3] (turn 3, 10:32) [error] -- TypeError: NoneType at auth.py:45
- [/session/abc/turns/5] (turn 5, 10:35) [user] -- Asked to also fix refresh endpoint
... (12 more -- use memory_search "topic" to find relevant ones)
```
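Rendering index lines in that format is straightforward. A hypothetical renderer (in practice the entries would come from MemoryFS metadata, and the entry fields here are assumptions):

```python
# Hypothetical index renderer producing lines in the format shown above.
# Each entry dict (path/turn/time/tag/summary) is an assumed shape, not
# the library's actual metadata schema.
def render_index(entries: list[dict], limit: int = 5) -> str:
    lines = [
        "## LLMFS Memory Index",
        "You have the following memories (use memory_read / memory_search to retrieve):",
    ]
    for e in entries[:limit]:
        lines.append(
            f"- [{e['path']}] (turn {e['turn']}, {e['time']}) [{e['tag']}] -- {e['summary']}"
        )
    hidden = len(entries) - limit
    if hidden > 0:
        lines.append(f'... ({hidden} more -- use memory_search "topic" to find relevant ones)')
    return "\n".join(lines)
```

Capping the rendered entries and summarizing the overflow is what keeps the index near a fixed token budget (~2k tokens) no matter how many turns have been evicted.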
ContextManager API¶
For lower-level control:
```python
from llmfs import MemoryFS
from llmfs.context.manager import ContextManager

mem = MemoryFS()
ctx = ContextManager(
    mem=mem,
    max_tokens=128000,
    evict_at=0.70,            # start evicting at 70% capacity
    target_after_evict=0.50,  # evict down to 50%
)

# Track a new turn
ctx.on_new_turn(role="user", content="Fix the JWT bug", tokens=12)
ctx.on_new_turn(role="assistant", content="Found the issue at auth.py:45", tokens=45)

# Get the current memory index for system prompt injection
index = ctx.get_system_prompt_addon()

# Get active (in-context) turns
turns = ctx.get_active_turns()

# Reset for a new session
ctx.reset_session()
```
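With the settings above, the threshold arithmetic is simple: eviction triggers at 128,000 x 0.70 = 89,600 tokens and drains the window down to 128,000 x 0.50 = 64,000. A quick check:

```python
# Threshold arithmetic for the ContextManager settings shown above.
max_tokens = 128_000
evict_at, target_after_evict = 0.70, 0.50

high_water = int(max_tokens * evict_at)           # eviction starts here
low_water = int(max_tokens * target_after_evict)  # eviction stops here
print(high_water, low_water)  # 89600 64000
```

The gap between the two marks means eviction runs in batches rather than on every turn, so the index is rebuilt less often.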
Full Example with OpenAI¶
```python
import openai
from llmfs import MemoryFS
from llmfs.context import ContextMiddleware

mem = MemoryFS()
client = openai.OpenAI()
agent = ContextMiddleware(client, memory=mem, max_tokens=128000)

conversation = []
while True:
    user_input = input("You: ")
    if user_input.lower() == "quit":
        break

    conversation.append({"role": "user", "content": user_input})
    response = agent.chat(conversation)
    assistant_message = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_message})
    print(f"Assistant: {assistant_message}")

# Session statistics
stats = agent.get_context_stats()
print(f"\nTurns evicted: {stats['evicted_turns']}")
print(f"Cache hits: {stats['cache_hits']}")
print(f"Token usage: {stats['current_tokens']} / {stats['max_tokens']}")
```