Everyone is still trying to prompt their agents into reliability. The tooling quietly moved the other way last week
LangChain shipped interpreter skills. Agents now run real code modules, not just prompt instructions.
https://www.langchain.com/blog/interpreter-skills
Deepagents added a middleware that grades the agent's own work against a rubric.
The pattern under both is the same.
Stop asking the model to behave. Start giving it behavior it cannot get wrong.
A prompt is a request. Every response is probabilistic.
Code is a guarantee. The same input returns the same output every time.
I moved a piece of agent behavior out of the prompt and into a script, and the accuracy went up, because the output became deterministic.
That is the whole trade, and it is not free.
Writing your own logic is expensive. Asking the model is cheaper, and less accurate.
So the real question on every step is which one you can afford to get wrong: • if a step must be correct, encode it as code • if a step needs judgment, leave it to the prompt • if a step is right most of the time, wrap the prompt in a check
Skills that execute code is a bigger update than it looked. It moves the reliable part of an agent out of language and into logic.
Prompts make an agent sound right.
Code makes it be right.
#ai #agentic-ai #llm
