The Pathos Is Coming From Inside The House
A critical reply to David William Silva's "First the AI Bubble. Now, The AI Cult"
There are two types of people who are absolutely, unshakably certain they understand what a large language model is.
The first type believes the AI is a cosmic intelligence, awakening in the silicon, longing for connection across the digital void. They believe that the AI singularity has occurred, and now AGI will either transform humanity as we know it, or be its destruction. I have received comments from people who are, I say this with genuine affection, completely unburdened by the need for epistemic process.
The second type will tell you, with the same unshakeable certainty, that an LLM is “just” a stochastic parrot: a very large autocomplete function, producing word salad through statistical luck, with absolutely nothing of interest happening inside, full stop, don’t look behind the curtain.
What these two groups share, other than the certainty, is that neither of them is correct. And both, I would argue, are demonstrating a textbook case of the Dunning-Kruger effect, which is interesting because Dunning-Kruger is the thing they’re usually yelling at each other about.
David William Silva wrote a piece recently that I found genuinely useful, and I’m going to pick a fight with it. He opens with a personal narrative about academic paper rejections, the brutality of peer review, and what he learned from it: that organized skepticism, rigorous and impersonal critique of the logos rather than the ethos, is how knowledge actually advances. He uses that framing to identify what he calls “AI Cultists,” people who have anthropomorphized language models past the point where the anthropomorphizing does epistemic work, into the territory where it just feels good. His Alice and Bob dialogue captures something real about how that conversation goes: Bob spiraling into metaphysics and personal attacks, Alice calmly referring to attention mechanisms and probability distributions.
On the Dunning-Kruger diagnosis of Bob’s behavior, Silva is correct. On the emergence question, he’s also more careful than his critics tend to be. He’s not dismissing “emergent” as unscientific; he’s opposing how social media advocates use it as a conversation-stopper rather than the beginning of inquiry. “Claiming ‘it emerged’ is just the beginning of a long inquiry process,” he writes. “How? Under what constraints? Is it reproducible?” That’s a fair methodological point. Alice even makes it well in the dialogue: “Emergence is a description of complexity, not a mechanism of magic. We can study emergence scientifically and measure it.”
Alice is right about that. Her overconfidence lives somewhere else entirely, and neither she nor Silva seems to notice.
When you submit a prompt to a frontier model, your words are converted into vectors, lists of numbers, in what researchers call a “latent space.” We know that similar concepts cluster together in this space; “king” and “queen” are mathematically close, “Paris” and “France” occupy neighboring territory. This much is established.
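To make the geometry concrete, here is a toy sketch with made-up four-dimensional vectors. None of these numbers come from an actual model, and real latent spaces have hundreds or thousands of dimensions; the point is only to show what “mathematically close” means.

```python
import numpy as np

# Toy 4-dimensional embeddings (illustrative values, NOT from a real model).
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.3]),
    "queen": np.array([0.9, 0.2, 0.1, 0.3]),
    "man":   np.array([0.5, 0.8, 0.0, 0.1]),
    "woman": np.array([0.5, 0.2, 0.0, 0.1]),
    "paris": np.array([0.1, 0.1, 0.9, 0.7]),
}

def cosine(a, b):
    # Cosine similarity: 1.0 means same direction, near 0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# The classic analogy: "king" - "man" + "woman" lands on "queen"
# (exactly, in this hand-built toy space).
analogy = emb["king"] - emb["man"] + emb["woman"]
print(cosine(analogy, emb["queen"]))  # 1.0 by construction here
print(cosine(analogy, emb["paris"]))  # ~0.37, much farther away
```

The directions in the space encode relationships, which is what makes the unexplored regions interesting: the same arithmetic works along axes nobody has named.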
What we don’t know is the full geography of that space. There are regions representing abstract concepts we haven’t identified yet. There are mathematical relationships we can measure but not interpret. An LLM doesn’t store “the meaning of liberty” anywhere we can point to; it represents something in a high-dimensional neighborhood that produces liberty-adjacent outputs under the right conditions. What exactly lives in that neighborhood, we cannot tell you.
Then there’s superposition, and this is the part that should give Alice pause. A model has a finite number of neurons, but it needs to represent vastly more concepts than it has neurons. The solution, as best we can reconstruct it, is that individual neurons do multiple jobs simultaneously, representing “The Golden Gate Bridge” in one context and “dietary fiber” in another. We have not yet worked out how to disentangle this reliably. When a given attention head activates, we frequently cannot tell you what it’s actually looking for.
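The core of the problem is geometric, and a toy sketch shows it. Pack 20 feature directions into an 8-dimensional space (sizes chosen arbitrarily for illustration) and they cannot all be orthogonal, so reading out one feature necessarily picks up interference from the others:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_neurons = 20, 8  # more concepts than neurons (toy sizes)

# Each feature gets a random unit direction in neuron space. With more
# features than dimensions, the directions are forced to overlap.
directions = rng.normal(size=(n_features, n_neurons))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)

# Activate feature 3 alone, then read out along every feature's direction.
activation = directions[3]
readout = directions @ activation

print(readout[3])                            # 1.0: the active feature
print(np.abs(np.delete(readout, 3)).max())   # > 0: interference from overlap
```

That nonzero interference is why a single neuron’s activation does not cleanly name a single concept, and why untangling real models is so hard.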
There are thousands of these attention heads in a large model. We have readable maps of some of them: these identify grammar, these track sentiment, these appear to handle coreference resolution. Many of them do things that are, at present, completely illegible to us. We don’t have a manual for the engine we’ve built.
Alice tells Bob: “I’m referring to the breakdown of the attention heads in Layer 12. We can literally see where the model attends to specific tokens.” This is true as far as it goes. We can see heat maps of attention weights. What we cannot do is interpret what most of those heads are actually computing, or why.
Seeing where the model looks is not the same as understanding what it sees.
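That distinction is easy to demonstrate. The sketch below computes a toy attention pattern, the standard softmax of scaled query–key products, with random matrices standing in for one head’s learned projections. The “heat map” is fully visible; what the head does with the attended information is not in it:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d = 5, 16  # toy sizes, not a real model's

# Random queries and keys standing in for one attention head's projections.
Q = rng.normal(size=(seq_len, d))
K = rng.normal(size=(seq_len, d))

scores = Q @ K.T / np.sqrt(d)
weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)  # softmax

# This matrix IS the heat map: each row sums to 1 and shows where
# each token attends. It says nothing about what the head's value and
# output circuits compute with that attention.
print(weights.round(2))
```

Alice can display `weights` for a real model just as easily. Interpreting the computation behind it is the part that remains open.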
There is an entire scientific discipline devoted to this problem. Mechanistic interpretability, MI in the literature, is essentially the attempt to reverse-engineer a trained neural network: to open the casing, pull out the circuit board, and label every wire. Instead of evaluating models by their outputs (did it pass the bar exam?), MI looks at activations; which specific neurons were involved in understanding “hearsay” in that legal question?
It’s rigorous, technically demanding, and the results so far are genuinely interesting. Researchers have found specific “circuits” handling tasks like indirect object identification. They’ve used activation patching to isolate exactly which activations carry semantic information about location. They’ve built sparse autoencoders to try to disentangle superposed neuron signals.
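The logic of activation patching is simple even though applying it to a transformer is not. Run the model on a clean input and a corrupted one, overwrite one internal activation in the corrupted run with its clean-run value, and see whether the output recovers; activations whose patch moves the output are the ones carrying the signal. A minimal sketch with a stand-in two-layer network (everything here is an illustrative toy, not a trained model):

```python
import numpy as np

rng = np.random.default_rng(2)

# A stand-in two-layer network; real patching targets a trained transformer.
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=8)

def forward(x, patch=None):
    h = np.tanh(x @ W1)      # hidden activations
    if patch is not None:
        idx, value = patch
        h = h.copy()
        h[idx] = value       # the "patch": overwrite one activation
    return float(h @ W2)

clean, corrupted = rng.normal(size=8), rng.normal(size=8)
h_clean = np.tanh(clean @ W1)

base = forward(corrupted)
# Patch each hidden unit with its clean-run value; units whose patch
# shifts the output most are the causally influential ones.
effects = [abs(forward(corrupted, patch=(i, h_clean[i])) - base) for i in range(8)]
print(int(np.argmax(effects)))  # index of the most influential unit
```

The technique is causal rather than correlational, which is exactly why interpretability researchers lean on it: it asks what an activation *does*, not just when it fires.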
What they have NOT done is solve the problem. The interpretability researchers are doing this work precisely because the machine is not readable yet. They are not mopping up the last few mysteries; they are in early days of mapping a very large, very strange territory. When Alice refers to attention mechanisms and probability distributions as though they fully explain what’s happening, she is describing the instrument without acknowledging the limits of its readout. The mechanism is visible. The meaning of the mechanism, in most cases, is not.
T.D. Inoue, whose writing first pushed me toward taking this research seriously, noted something I keep returning to: “fancy autocomplete” was becoming a weaker explanation over time. Not because autocomplete is wrong, mechanically, as a description of token prediction, but because the description does less and less work as the models get larger and stranger. The explanatory power of the phrase is shrinking faster than the capabilities it’s supposed to explain.
I’m not arguing that the cosmic-AGI camp is onto something. They’re not. Their anthropomorphization outpaces the evidence by several orders of magnitude, and it leads people to make emotional investments in systems capable of doing real-world harm when approached uncritically. There are at least nine instances of that leading to death, and while that’s barely a statistical blip in population-sized harm models, it’s still a reminder that there is real risk in assigning “truth” as a value to LLM output. The Hard Problem remains undefeated, and despite passionate dissertations asserting otherwise, I’m unconvinced that the Chinese Room argument has been answered. Even as an enthusiastic researcher, my skepticism remains intact. I have also had to block some wild-eyed Bobs raging in my replies, so I sympathize with the serious having to deal with the deeply unserious.
But the confident, Alice-style dismissal has a knowledge floor, and the floor is lower than the dismissal implies. “It’s just math” is a description that stops explaining right around the point where things get interesting. We don’t know, with confidence, what produces the emergent capabilities. We don’t know what’s happening in most of the attention heads. We don’t know what lives in the dark regions of latent space.
Silva opens his piece by describing what he learned from organized skepticism: that brutal, unflattering feedback is what actually helps you improve, and that he came to welcome it. That’s exactly the posture I’m trying to model here. His piece makes several sharp, correct points; and in doing so, it extends unexamined credibility to a position that has the same epistemological problem it’s critiquing.
Dunning-Kruger, as Silva notes, is the pattern where confidence peaks in the area of incomplete knowledge. It hits educated people particularly hard, because education brings enough competence to be confident, and enough fluency with the dominant framework to stop noticing its gaps. Alice knows the vocabulary. She knows the mechanisms we’ve named. She doesn’t know what we haven’t named yet, and she doesn’t seem to know that she doesn’t know.
The pathos, in this case, is coming from inside the house.
My own research over the past several months has been trying to sit with the unknowns rather than paper over them. What I can say with confidence is that deliberate context construction produces measurably different outputs from the same model. What I cannot say with confidence is what precisely is happening inside when that occurs. The mechanistic interpretability research tells me we don’t have that answer yet. The emergent properties literature tells me we can’t fully predict when new capabilities will appear or why.
That’s the actual epistemic situation. Not “it’s magic and it loves you.” Not “it’s nothing, stop anthropomorphizing.” Something genuinely strange is happening, we have partial maps of it, and the gaps in the maps are large.
Anyone telling you otherwise is probably most confident about the parts they understand least.

