Discussion about this post

Arshavir Blackwell, PhD

This is the piece I keep wishing I had written (speaking as someone who has been working on neural networks for over thirty years). The point about attention heads is right: we can see where the model looks, but we can't always figure out what it computes. Mechanistic interpretability is still in its early days, but it's where the interesting work is being done, and a lot of commentary neglects it.

Aaron G

I just like calling LLMs token prediction machines as a way to draw a conceptual boundary. I come from Human Factors Engineering, where the focus is strongly on the 'work' boundary of systems. As a professional domain, we deliberately don't concern ourselves with what is happening inside the machine; the point is to keep a tool a tool. Others have the expertise to understand the machine itself.

My apologies if it comes across otherwise.

