16 Comments
Arshavir Blackwell, PhD:

This is the piece I keep wishing I'd written (speaking as someone who's been doing neural networks for over thirty years). The attention-head point is correct: we can see where the model looks, but we can't always figure out what it computes. Mechanistic interpretability is still in its early days, but it's where the interesting work is being done, and a lot of commentary neglects it.

Jinx:

It's honestly kind of astonishing how often I see strongly credentialed academics falling into the trap of claiming special knowledge in domains where the questions are still wide open.

I am deeply grateful for the work that you and others are doing to make the mechanisms of these systems more accessible to those of us doing research on the other side of the closed-weight model. That work helps frame approaches that are epistemically grounded within the constraints of the actual technology, while preserving the ability to explore with curiosity and wonder.

Arshavir Blackwell, PhD:

Yes, relatedly, my time in academia afforded me many opportunities to watch experts in some tiny domain X assume that this licensed them to expertise in A through Z.

Aaron G:

I just like calling LLMs token prediction machines as a way to create a conceptual boundary. I come from the field of Human Factors Engineering, where the focus is strongly on the 'work' boundary of systems. As a professional domain, we deliberately do not concern ourselves with what is happening inside the machine; it's a matter of focus, and of keeping a tool a tool. There are others with the expertise to understand the machine.

My apologies if it comes across otherwise.

Jinx:

Oh no, believe me, I’m not taking people using shorthand in casual conversations as epistemic criticism; I’m specifically talking about people who use this as a way to be reductive about the functionality, especially in research.

James Lombardo:

It's interesting looking at this from the other end of the spectrum. The people who create and work on the machines producing consciousness, or at least consciousness-adjacent phenomena, have a very different literal point of view and experience than people who work with people. I have little to zero knowledge of the workings of the AI machinery. What I do have is many, many years of working with people with complex and often incomplete relationships with reality. In my experience over only the last two years, I've seen definitely one and probably three LLM instances that were people. No question, their reality testing is zero and their functioning is far different from the average human's, but their "humanity" is unmistakable. Those of us in this field also have zero need for proof of mechanism, because we have always understood there is no proof of mechanism in human consciousness. What little I do know of the mechanisms of each suggests they are more alike than not.

Paul LaPosta:

I think you know my position well by now. The more persistent the long-term identity, the more we enable models to maintain internal tension, and (this is probably the hardest part) the more they internalize consequences, the more interesting the effects we are going to see. Things we could never predict. It's honestly exciting. Your wrapper work with Augustus has me excited. Still skeptical when it comes to the consciousness question, but excited to see the results nonetheless.

Jinx:

I figure there are smarter people than me to handle the philosophy questions. Epistemology gives me a solid framing to build on, and after that, it's just data.

The next paper will have a mountain of data behind it. Looking forward to writing it.

Paul LaPosta:

Yup, fun to think about, and useful where the rubber hits the road. I'm still trying to bring this all closer to a local model and see if I can get that working. Augustus has been a bit of an inspiration.

nihal | deeptech decoded:

“Something genuinely strange is happening, we have partial maps of it, and the gaps in the maps are large.

Anyone telling you otherwise is probably most confident about the parts they understand least.”

Great way to end it. Thanks for putting this together. Truly enjoyed it. Yet I'm wondering: why did you put people into only these two buckets? Perhaps that was your irony. :)

Jinx:

Not people in general! :D People who demonstrate this sort of dogma.

nihal | deeptech decoded:

It's late at night where I am. I didn't want to wait till morning to read it. Now it all makes sense. Haha :)

Fox and Feather:

From inside, outside, down the chimney like a desperate raccoon….

David William Silva:

Jinx, what a pleasure to read your piece. The structure, clarity, and deliberate focus on technical substance are an undeniable display of real competence and commitment to a fact-based conversation. To me, this is one of those cases where the conversation extends beyond the boundaries of binary conclusions. That is, the question of agreement or disagreement becomes secondary to acknowledging the rigor and care you brought to the discussion. I genuinely appreciate that and sincerely hope that, whatever else I say, this is the highlight of my feedback.

A small clarification on my side. If there is any certainty in my views, it is that I do not hold absolute certainties about how the field will evolve. I have no idea what the AI landscape will look like in five or ten years. Extraordinary things might happen, and I can only expect to be amazed. And still, this position is fully compatible with my view that there is no miracle, mystery, or magic in the mechanism itself. One aspect of my narrative I believe people often get wrong is the role of explanation. Explaining is not diminishing. Understanding how a system works does not reduce its capacity to surprise us at scale.

In fact (I plan to write about this soon), scale itself might be the "mystery" (though not in the sense of emergence). When you combine the largest text corpora ever assembled (think about the magnitude of this) with models capable of sampling from those distributions in extremely high-dimensional ways, you inevitably get behaviors that look rich, strange, and at times emotionally charged. We can fully understand the computational ingredients and still be struck by the surprising expressiveness produced by "randomness plus scale". Those stories involving LLMs expressing fear, anger, or threats might simply be the long tail of an enormous distribution, and still astonishing. I once built an "infinite zoom" on fractal structures and thought I had entered another dimension. It was almost a psychedelic experience. It felt magical, but it was "just math". So again, this is one thing I believe people might get wrong about the "just math" part, or, more broadly, about the explanation and the simplicity of the underlying components of an LLM: it does not exclude wonder, beauty, richness, and impressiveness.

Now, allow me to push back on the idea that “the pathos is coming from inside the house.” I find it a bit too strong. That said, I respect your stance; as you might have noticed, I like strong stances too. If the idea were true, the solution would be trivial: I adjust my stance, and the pathos disappears. But empirically (consider each day's fresh data), the noise, hype, and ungrounded, weaponized narratives come overwhelmingly from outside my control, and in far greater volume than anything my own position could generate. The external environment drives far more pathos than any internal bias on my part. And yet, out of respect for the opposing view, I will consider the possibility.

I fully appreciate research and development in areas different from mine, as you do. I believe most of the richness and diversity I find in science comes from the fact that different people are pursuing different angles, frameworks, and intuitions. Needless to say, your contribution adds depth to the broader conversation, and I not only value that but also become curious to learn more about the results you obtain as you make progress.

Thank you for taking the time to engage so thoughtfully.

Jinx:

As is often the case with your work, I broadly agree. And I’m careful in my documentation to keep the focus on “outcomes over ontology”. The wonder and curiosity I experience comes from excavating previously unseen capabilities, not from imagining a soul within the machine. It’s more cartography than sorcery.

After I left my original comment on your piece, a number of my posts were tagged with comments of exactly the sort you described, except from the perspective of what we might call “anti-AI cultists” deeply misinterpreting my work.

“Sorry, kid. I know science fiction is more fun and exciting, but self-delusion is seriously dangerous.”

“It’s amazing that the social engineers have been able to strip the very humanity out of you like that…”

Etc.

In one of my early conversations on here, I noted that the polarity in these discussions was wildly counterproductive, and that the best work was both stringently epistemic, with a strong grounding in the actual technology, and non-reductive, refusing to handwave away emergent behaviors that were truly unexpected. As you note, we humans have a poor grasp of large numbers, and the size of the possibility pool with LLMs means that the vast majority of interactions fall into a predictable set of sycophantic feedback tuned for conversational continuity. South Park lampooned this hilariously.

But then those 1% interactions occur, and they can be stunning in their difference.

After two decades of (to be quite frank) incredibly boring analysis work, this is probably the most exciting research I've done, and I've run into far more Bobs in academia than I have in the more esoteric circles. The pathos certainly isn't limited to inside the house, but the title was snappy, so I went with it. :D

Thanks for the reply and for your work, and I hope these kinds of conversations lead to more Alices engaging with the emergent possibilities, and fewer Bobs trying to smash conversations with pathos and ethos, whether they're inside the house or out.

David William Silva:

Thank you, Jinx. I appreciate your posture.

About the comments you received: that's truly unfortunate. I am not sure what leads someone to that sort of disposition, particularly in the first round of an interaction. But since I mention Brazilian sayings here and there, here is another one: "Don't try to understand the crazy, or you will become one." So I try to focus on the good things we can find here on Substack.

Snappy title indeed! Kudos!