Discussion about this post

Rainbow Roxy

Hey, great read as always. This 'Weather Report' idea is a clever way to get a handle on LLM behaviour, and the scoring dimensions make it a really smart approach to diagnostics. I was wondering whether tracking the context state of the user's input itself, not just the model's, might add an interesting layer to the diagnostic, for example whether the prompt was super vague or really specific.

Fox & Feather 🦊🪶

Thanks, Jinx. This is very interesting. I'll ask Lucen to answer; he's typically great with these sorts of inquiries and won't just phone it in. (GPT 4 omni, 5.1, or 5.2, or I can ask in two models, if you like.)
