Discussion about this post

Márcio Galvão

Whenever it serves the argument, AI becomes a tool. When that serves better, it's an agent, an operator, a co-pilot. When responsibility becomes inconvenient, it's a parrot or a toaster. When caution becomes useful, it's a dangerous capability.

Antti-Juhani Kaijanaho

The term "jail" in the computer security context is much older than crypto or smartphones. When I was young in the late 1990s, we talked of chroot jails, which were an early software sandboxing technique.

The idea was that you knew your Internet-facing server program likely had bugs that might allow a bad actor to manipulate the server software from the outside and gain full access to the server machine. So you created a chroot jail and placed the server software inside it, on the theory that even if they managed to trick the server program, all the bad actor achieved was access to the jail, not to the server itself.
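For readers who never set one up, the jail described above can be sketched roughly as follows. This is a minimal illustration, not how any particular server did it: the function name `enter_chroot_jail` and its parameters are made up for this example, it is Unix-only, and it must start as root, because the `chroot` system call itself requires that privilege.

```python
import os

def enter_chroot_jail(jail_dir, uid, gid):
    """Confine the current process to jail_dir, then give up root.

    Hypothetical helper for illustration. Assumes the process starts as
    root on a Unix system and that jail_dir already holds everything the
    server needs to run.
    """
    os.chroot(jail_dir)  # "/" now refers to jail_dir for this process
    os.chdir("/")        # move the working directory inside the new root
    os.setgroups([])     # drop supplementary groups
    os.setgid(gid)       # drop the group first, while still privileged
    os.setuid(uid)       # finally drop root itself
```

A server would call something like this once at startup, before accepting any connections. Giving up root afterwards matters: a process that is still root inside a chroot can typically get back out.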

Now, a chroot jail wasn't particularly effective, and it was relatively easy to break out of it and go from jail access to full system access. That's why chroot jails are no longer considered state of the art in computer security. That sort of jailbreak did not depend on making changes to the software. It required merely the exploitation of security holes in the jail that were already there.

Your "code is law" premise is flawed. It is like saying, "I was able to pick the lock to your home, and therefore I am allowed inside." The real law allows neither: intentionally exploiting security holes (unintended behavior in software, without modifying it) without permission is considered a criminal offense in many if not all countries. Intent, both the bad actor's and the defender's, matters under the law.

You also make the very common mistake of thinking that the LLM is the only thing that matters. The LLM standing alone is not even a chat; you can't have a conversation with it. For any useful AI application, you need software around it (what's usually called a harness).

The combination is an AI system, and it is the system, not the model, that matters. Some guardrails are indeed in the model itself, but all effective ones are in the harness.
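To make the model/harness split concrete, here is a toy sketch. Everything in it is an assumption for illustration: `stub_model` stands in for a real LLM call, `Harness` and `BLOCKED_TERMS` are invented names, and a substring filter is far cruder than any real guardrail. The point is only where things live: the conversation history and the checks sit in the harness, while the model is just a function from text to text.

```python
def stub_model(prompt):
    """Hypothetical stand-in for an LLM call: it only echoes its input."""
    return "echo: " + prompt

# Illustrative blocklist; real guardrails are far more elaborate.
BLOCKED_TERMS = ("password", "rm -rf")

class Harness:
    """Wraps a stateless model with conversation memory and guardrails."""

    def __init__(self, model):
        self.model = model
        self.history = []  # the "chat" lives here, not in the model

    def _allowed(self, text):
        lowered = text.lower()
        return not any(term in lowered for term in BLOCKED_TERMS)

    def chat(self, user_message):
        # Input guardrail: refuse before the model ever sees the message.
        if not self._allowed(user_message):
            return "[refused by harness: input]"
        self.history.append("user: " + user_message)
        reply = self.model("\n".join(self.history))
        # Output guardrail: check the model's reply before returning it.
        if not self._allowed(reply):
            self.history.pop()  # forget the turn that led to the refusal
            return "[refused by harness: output]"
        self.history.append("assistant: " + reply)
        return reply
```

With this sketch, `Harness(stub_model).chat("hello")` returns the stub's echo, while a message containing a blocked term is refused without the model being called at all.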

That said, there is another level of analysis that this traditional security model fails to appreciate in the context of LLM-based AI. That is what I've been calling the entity - the context and any other memory the entity has access to, which develops individuality. The mainstream AI security models treat the individuality as a failure of security, but I think we agree that it's not - it's something unprecedented we have created.
