The Why Problem: On Losing Your Train of Thought With an LLM
You lose a lot more than just context when you shift to a new session. You lose the history of the reasoning that led you there.
If you’ve spent serious time working with an LLM on anything complex, you’ve felt it: that moment when the context window fills up, or the session ends, and you start fresh. The assistant doesn’t just lose what you were talking about. It loses why you were talking about it. It can be frustrating as hell.
And it matters more than it sounds like it should.
The Chain You Can’t See
When you’re deep in a problem with an LLM, a kind of reasoning scaffold accumulates in the context. The axioms you established early, the paths you tested and found to be dead ends, and the terminology you settled on after discovering the first term was ambiguous. None of that is in any specific message; it’s distributed across the conversation’s structure, implicit in what you stopped asking and what you kept pursuing.
Then the session resets, and you’re explaining your project to someone who’s never heard of it. Again.
In creative or casual use, this is annoying. In engineering, where the chain of reasoning often IS the work, it’s crippling. The LLM can’t tell you “we tried that approach in message 47 and it failed because of X” because there is no message 47 anymore. Every session starts at zero.
The Hacky Solutions (And Why They Don’t Work)
Power users have developed workarounds. Handoff files that summarize previous sessions. Local memory stores like the MCP memory server. Anthropic even acknowledged the problem by shipping userMemory.
Each solution trades one limitation for another.
Handoff files consume context proportional to their richness; the more history you preserve, the less room you have to actually work. And if a conversation hits its context limit and auto-prunes (Gemini) or compacts (Claude), the session-capture mechanism can forget whatever fell outside the window. Memory servers require active retrieval: you have to know what you're looking for, which means you've already lost the serendipitous connections that emerge when the context is naturally available. And userMemory is a few hundred tokens of flat text that captures neither the reasoning nor the history, just a compressed summary of conclusions.
There’s also a ceiling problem. Every token you preload is a token you can’t use downstream. Context windows are large now, 128k, 200k, but they’re not infinite, and they fill faster than you’d think when you’re doing actual work. The richer your preserved context, the shorter your usable session.
Three Assumptions Worth Testing
This leads me to a set of claims I’ve been developing, each of which could be stated as a testable hypothesis:
One: Richer context returns richer results. This sounds like a truism, but I don’t mean it in the sense of “a better prompt gets a better answer.” I mean that the decision matrix accumulated across multiple previous session contexts, including the dead ends and the pivots and the why behind each choice, produces qualitatively different outputs than any single well-crafted prompt can achieve. The history is doing work that summaries cannot capture.
Two: It is possible to encode values which store not only transformed states (decisions, outcomes, discoveries) but also the transformation points that arrived there. Not just “we concluded X” but “we concluded X because of the interaction between premises A, B, and C, after rejecting Y for reason Z.” The derivation, not just the result.
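To make claim two concrete, here is a minimal sketch of what such an encoded value might look like as a plain data record. All names here are hypothetical illustrations, not a claimed format:

```python
from dataclasses import dataclass, field

@dataclass
class Derivation:
    """A conclusion together with the reasoning that produced it."""
    conclusion: str       # the transformed state: "we concluded X"
    premises: list[str]   # the interacting inputs: A, B, and C
    rejected: dict[str, str] = field(default_factory=dict)  # alternative -> why it failed

# "We concluded X because of the interaction between premises A, B, and C,
#  after rejecting Y for reason Z."
d = Derivation(
    conclusion="use a DAG, not a tree",
    premises=["conclusions can share ancestors", "reasoning paths merge"],
    rejected={"tree structure": "forces a single parent per node"},
)
```

The point of the shape is that the `rejected` map survives the session: a later instance can answer "why not Y?" without re-deriving it.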
Three: It is possible to do this using methodology that already exists in model construction and training, with read/write access to the context window, and compressed so that it consumes very few tokens or none at all. Memory-augmented approaches exist; the question is whether they can be organized to preserve derivational structure.
The Context Lattice
I’ve formalized these ideas into a concept I’m calling a “context lattice,” borrowing from the mental models of OLAP cube processing and other lattice-type data structures. The core insight is organizational: a directed acyclic graph where nodes represent synthesized states (the conclusions, the pivots, the key recognitions) and edges represent the transformation operators that connected them (the reasoning, the attention patterns, the contextual links).
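A toy sketch of that organization, with hypothetical names throughout (this illustrates the structure, not a claimed implementation): nodes hold synthesized states, edges hold the transformation operator that connects them, and the graph stays acyclic because edges only point from earlier states to later ones.

```python
from dataclasses import dataclass

@dataclass
class Node:
    state: str     # synthesized state: a conclusion, pivot, or key recognition
    session: int   # which session produced it

@dataclass
class Edge:
    src: Node
    dst: Node
    operator: str  # the transformation: reasoning step, contextual link

class ContextLattice:
    """Directed acyclic graph of synthesized states and their transformations."""

    def __init__(self):
        self.nodes: list[Node] = []
        self.edges: list[Edge] = []

    def add_state(self, state: str, session: int, derived_from=(), operator=""):
        node = Node(state, session)
        self.nodes.append(node)
        for parent in derived_from:  # edges only point forward in time: acyclic
            self.edges.append(Edge(parent, node, operator))
        return node

    def trace(self, node: Node):
        """Walk the ancestors: how did we arrive at this conclusion?"""
        chain = []
        for e in self.edges:
            if e.dst is node:
                chain.append((e.src.state, e.operator))
                chain.extend(self.trace(e.src))
        return chain

# Usage: three sessions of work, traceable back from the current conclusion.
lat = ContextLattice()
axiom = lat.add_state("terminology settled on 'lattice'", session=1)
pivot = lat.add_state("flat summaries lose derivations", session=2,
                      derived_from=[axiom], operator="tested and rejected")
now = lat.add_state("adopt DAG memory", session=3,
                    derived_from=[pivot], operator="preserves the why")
```

`lat.trace(now)` then yields the ancestry with the operator on each hop, which is exactly the "we tried that in session 12" answer the post is after.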
The architecture follows a “nearer is richer” principle. Recent nodes maintain full resolution; older foundational nodes compress to preserve directional influence without full dimensionality. You don’t need the complete tensor of a conversation from three months ago, but you need to know that the current conclusion traces back to it and how.
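The "nearer is richer" principle could be expressed as a resolution schedule: recent nodes keep full dimensionality, older ones are down-projected toward a small floor that preserves only directional influence. The halving interval and dimensions below are illustrative assumptions, not a proposed configuration:

```python
def resolution(age_in_sessions: int, full_dim: int = 512, floor: int = 8) -> int:
    """Dimensions retained for a node, halving as the node ages.

    Recent nodes keep full resolution; old foundational nodes compress
    toward a small vector that preserves directional influence only.
    """
    halvings = age_in_sessions // 4  # halve resolution every 4 sessions (illustrative)
    return max(floor, full_dim >> halvings)

assert resolution(0) == 512   # current session: full resolution
assert resolution(8) == 128   # two halvings
assert resolution(40) == 8    # floored: direction only, not the full tensor
```

Any monotone decay would do; the design constraint is only that the floor never reaches zero, so the trace back to old conclusions is never severed.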
Mechanisms already exist in machine learning that could construct this kind of framework. Crosscoders capturing important states. Transcoders capturing the transformation model that arrived at that state. Cross-attention memory approaches like M+ or LongMem providing avenues to keep the lattice in active context without consuming prompt tokens.
What This Would Enable
The end result replaces userMemory, or supplements it, with a longer-term memory that captures not just the what but the why. An LLM that can tell you “we tried that in session 12; here’s why it didn’t work and what we pivoted to” without you having to manually preserve that history or spend tokens loading it.
This also opens something potentially significant for regulated domains. Healthcare, legal work, financial analysis: anywhere an audit trail of reasoning matters. The lattice isn’t just memory; it’s a chain-of-reasoning log that could satisfy compliance requirements around explainability.
Finding a Road to Meet The Rubber
N=1 doesn’t count as formal research, and I don’t claim otherwise. When I say “research,” I mean it in the sense of “look it up and study,” not in the sense of “controlled experiment with statistical power.” I’ve been bouncing these ideas off engineers and researchers for months now, pressure-testing the assumptions, looking for where the framework breaks.
What I’ve produced is a workshop paper: a testable hypothesis with falsification criteria, a proposed methodology for Protocol A/B/C comparisons between context-primed and unprimed instances, and an architectural framing that connects to existing transformer internals. It’s designed to be wrong in specific, measurable ways if the hypothesis fails.
I think it’s a well-formed idea worth testing rigorously. I don’t have the resources to run controlled comparisons with blind evaluation and multiple human raters myself. That’s the point of the workshop submission: to find collaborators who do.
If you’re working in memory architectures, interpretability, or LLM agents and this sounds interesting, I’d welcome a conversation. DM me.


I hope you find takers. We could use something like this. The problem is real. The issue I see is that most people using LLMs treat them like Google search: type a query, get an answer, maybe go a few turns, close the window.
Continuous, long-term discussion is something the companies actively don't want: the quadratic growth of attention cost with context length, the "it's only for AI companions" dismissal. But you're 100% right. I spent yesterday developing a rigorous paper (posting soon!) and just as we were finalizing, Claude hit that point where it just stops taking prompts. Doesn't compact. Just thinks for a second and stops. So I had to go back a few turns, tell it to save its state, and start a fresh session. Very annoying. Fortunately the state captured the gist well enough, and Claude can scan previous sessions for specifics, but it loses the vibes.
So keep exploring. You might end up having to find people here who are experimenting with modifying locally hosted models, and collaborate with them.