Two of the most useful agent releases share one unglamorous idea.
Verification is a step you build, not a property you hope for.
LangChain shipped Rubrics for Deep Agents, structured criteria an agent uses to evaluate its own output and correct it before returning. https://www.langchain.com/blog/introducing-rubrics-for-deepagents
Harvey, with LangChain Labs, detailed how to make verifiers for legal agents cheap enough to run at scale. https://www.langchain.com/blog/designing-efficient-verifiers-for-legal-agents
Different angles. Same pattern.
Most people still chase reliability by reaching for a bigger model.
They assume the next frontier release will finally stop the agent from confidently producing wrong answers.
It will not.
A single forward pass has no way of knowing whether it succeeded or failed.
The fix is structural, not magical. You add a step whose only job is to judge the work:
• define what a correct output looks like
• check the output against that definition
• send failures back for another pass
• only then trust the result
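The four steps above can be sketched as a small generate, verify, retry loop. This is a minimal illustration under stated assumptions, not LangChain's Rubrics API or Harvey's verifier. `Criterion`, `verify`, `run_with_verification`, and `toy_generate` are hypothetical names. The rubric here is a list of plain Python checks, where a real agent would likely ask a model to judge each criterion.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Criterion:
    """Hypothetical rubric entry: a name plus a pass/fail check."""
    name: str
    check: Callable[[str], bool]

def verify(output: str, rubric: List[Criterion]) -> List[str]:
    # Check the output against the definition of "correct"; return failed criteria.
    return [c.name for c in rubric if not c.check(output)]

def run_with_verification(
    generate: Callable[[Optional[List[str]]], str],
    rubric: List[Criterion],
    max_attempts: int = 3,
) -> Optional[str]:
    feedback: Optional[List[str]] = None
    for _ in range(max_attempts):
        output = generate(feedback)        # produce a draft, given prior failures
        failures = verify(output, rubric)  # judge it against the rubric
        if not failures:
            return output                  # only now trust the result
        feedback = failures                # send failures back for another pass
    return None                            # refuse to return unverified output

# Hypothetical rubric and toy generator for illustration.
rubric = [
    Criterion("non_empty", lambda s: bool(s.strip())),
    Criterion("cites_source", lambda s: "[source]" in s),
]

def toy_generate(feedback: Optional[List[str]]) -> str:
    # Fixes the missing citation once the verifier reports it.
    if feedback and "cites_source" in feedback:
        return "Answer text [source]"
    return "Answer text"

result = run_with_verification(toy_generate, rubric)
print(result)  # Answer text [source]
```

One design choice worth noting: when every attempt fails, the sketch returns `None` instead of the last draft, so an unverified answer never reaches the user by default.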
This is the difference between an agent that demos well and an agent you can put in front of a user.
And the cost objection is fading. Harvey drove verification cost down by an order of magnitude by batching checks and using open models.
Cheap verification is what makes the pattern practical.
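One reason batching can help is that it cuts the number of verifier calls. The sketch below assumes this framing and is not a description of Harvey's system. `CountingVerifier` is a hypothetical stand-in for a verifier model that only counts calls. Call count is a rough proxy: real cost also depends on tokens per call and model pricing.

```python
from typing import Callable, Dict, List

Check = Callable[[str], bool]

class CountingVerifier:
    """Hypothetical stand-in for a verifier model; counts calls made to it."""

    def __init__(self, criteria: Dict[str, Check]) -> None:
        self.criteria = criteria
        self.calls = 0

    def judge(self, outputs: List[str], names: List[str]) -> List[Dict[str, bool]]:
        # One call judges every given output against every named criterion.
        self.calls += 1
        return [{n: self.criteria[n](o) for n in names} for o in outputs]

def verify_one_at_a_time(v: CountingVerifier, outputs: List[str]) -> List[Dict[str, bool]]:
    # Naive: one verifier call per (output, criterion) pair.
    results = []
    for o in outputs:
        row: Dict[str, bool] = {}
        for n in v.criteria:
            row.update(v.judge([o], [n])[0])
        results.append(row)
    return results

def verify_batched(v: CountingVerifier, outputs: List[str], batch_size: int) -> List[Dict[str, bool]]:
    # Batched: each call covers up to batch_size outputs and all criteria at once.
    names = list(v.criteria)
    results: List[Dict[str, bool]] = []
    for start in range(0, len(outputs), batch_size):
        results.extend(v.judge(outputs[start:start + batch_size], names))
    return results

# Hypothetical criteria and drafts for illustration.
criteria: Dict[str, Check] = {
    "non_empty": lambda s: bool(s.strip()),
    "short": lambda s: len(s) < 50,
    "has_citation": lambda s: "[" in s,
}
outputs = [f"Draft {i} [doc{i}]" if i % 2 else f"Draft {i}" for i in range(20)]

naive, batched = CountingVerifier(criteria), CountingVerifier(criteria)
same = verify_one_at_a_time(naive, outputs) == verify_batched(batched, outputs, batch_size=10)
print(same, naive.calls, batched.calls)  # True 60 2
```

Both strategies return the same verdicts. The batched version makes 2 calls instead of 60, which amortizes per-call overhead such as repeated instructions and context.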
Reliability is not a smarter model.
It is a system built to check its own work.