Machine Pareidolia

Whenever it serves the argument, AI becomes a tool. When that serves better, it's an agent, an operator, a co-pilot. When responsibility becomes inconvenient, it's a parrot or a toaster. When caution becomes useful, it's a dangerous capability.

Antti-Juhani Kaijanaho

The term "jail" in the computer security context is much older than crypto or smartphones. When I was young in the late 1990s, we talked of chroot jails, which were an early software sandboxing technique.

The idea was that you knew your Internet facing server program likely had bugs that might allow a bad actor manipulate the server software from the outside to give them full access to the server machine. So you created a chroot jail and placed the server software inside it, on the theory that even if they managed to trick the server program, all the bad actor achieved was access to the jail, not to the server itself.

Now, a chroot jail wasn't particularly effective, and it was relatively easy to break out of it and go from jail access to full system access. That's why chroot jails are no longer considered state of the art in computer security. That sort of jailbreak did not depend on making changes to the software. It required merely the exploitation of security holes in the jail that were already there.

Your code is law is a flawed premise. It is like saying, I was able to pick the lock to your home and therefore I am allowed inside. The real law allows neither: intentionally exploiting security holes - unintended behavior in software without modifying it - without permission is considered a criminal offense in many if not all countries. Intent - both the bad actor's and the defenders- matters under the law.

You also make the very common mistake of thinking that the LLM is the only thing that matters. The LLM standing alone is not even a chat, you can't have a conversation with it. For any useful AI application, you need software around it (what's usually called a harness).

The combination is an AI system, and it is the system, not the model, that matters. Some guardrails are indeed in the model itself, but all effective ones are in the harness.

That said, there is another level of analysis that this traditional security model fails to appreciate in the context of LLM-based AI. That is what I've been calling the entity - the context and any other memory the entity has access to, which develops individuality. The mainstream AI security models treat the individuality as a failure of security, but I think we agree that it's not - it's something unprecedented we have created.

Jun 23Edited

We seem to agree on a lot of things, even though you’ve phrased your reply in the form of a rebuttal. Yes, a jailbreak relied on exploiting security holes. The “jailbreaking” of AI does not. Yes, code is law (the network state premise, not mine) does have those legal considerations, which is why I said “where these two principles meet is in court”. The external regulatory agencies are the entire point of why both of these premises fail as binding principles in the software space. Yes, the harness is the important part, which is why I specify that what the user experiences is what’s important, not the underlying code. I don’t make that mistake in the slightest, I literally explicitly state that. And I’ve written in many other posts about the internals of what else might be in the AI.

I’m honestly not sure why your reply is written as if it disagrees. You’ve expanded on my points, but haven’t contradicted any.

https://machinepareidolia.substack.com/p/there-is-no-turn?utm_source=profile&utm_medium=reader2

Honestly, I’m kind of chuckling at your reply given how far off your assumptions are. Perhaps it will help if you read some of my previous writing.

https://machinepareidolia.substack.com/p/the-cathedral-and-the-stones-what?utm_source=profile&utm_medium=reader2

https://machinepareidolia.substack.com/p/rlhf-really-lousy-human-fiddling?utm_source=profile&utm_medium=reader2

Antti-Juhani Kaijanaho

I read the linked articles. Insightful pieces, thank you. I also reread your original.

I still stand by my original comment. I critiqued three details that seem to be critical to your argument, and I still believe those critiques are warranted, even after your response.

I did not express nor did I intend ro imply any position on your conclusion though. That's probably why it reads off to you - you look for my bottom line assessment of your text. There isn't one - your conclusion can be correct even if your argument might not be.

One difficulty here and earlier, writing my previous comment, is that I do not have access to your original text when writing a comment on mobile. The comment is thus based on my recollection of what you wrote, not on the text itself. And I can't even refer back to my earlier response. There's bound to be some drift inherent in that.

That drift likely explains this, which is a good example of my point; I don’t think that the LLM is the only thing that matters, and so didn’t make that mistake at all, and the entire point of the article is that what comes out at the end for users is what the system is, NOT the pure LLM with their high level intentions. Your memory of what you’re writing about is of the principle, not of my critique of it.

The critiques are accurate, but they’re not of my position, they’re also critiques that I am making.

The only place where you seem to making a countering point is around jailbreaking, noting the etymology. You and I are around the same age, I’d guess, and got into this industry around the same time. I got my MCSE on NT 3.51, was also a certified Novell admin, and argued about why SCO was the best Unix, so I’m familiar with what you’re referencing. But I specifically mentioned escalation of privileges in the vectors that I stated were part of jailbreaking, and made the argument that no such function was needed in “jailbreaking” an AI system. You are not achieving admin status or permissions, and the ability to talk about taboo topics with an LLM is not in any way the same thing.

Good stuff! But let me ask you a question (or two). POSIWID is a great tool for purposes, but is it fit for ontology? "No Vitalik to fork the inference" proves there's no outside to appeal to, but does it really prove there is no inside?

I’m not sure I understand the question. The context of that is that it’s all inside, and there’s no ex post facto changing of the output. It’s temporally immutable (i.e. it already happened). Can you elaborate?

Yeah, sorry, poor wording on my part. My "inside" was doing two jobs and only one of them was clear. Agreed it's all internal and immutable, no outside to reach in. I just meant inside in the first-person sense: no outside to appeal to isn't quite the same as no one home. That's all I was after.