Evaluating Deep Agents using LangSmith on AWS | Amazon Web Services

This post was co-authored with Karan Singh, Head of Partnerships at LangChain.

Validating AI agent behavior before production is one of the hardest problems in applied AI. Agents are non-deterministic and multi-step, so an error in an early step can affect everything downstream: a single bad tool call can cascade through an entire workflow. LangSmith on AWS gives you an evaluation framework to catch these issues early, track them in production, and continuously improve your agent’s reliability throughout its lifecycle.

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. You will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent with Amazon Bedrock and covers the full development-to-production lifecycle.
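To make the idea of an offline evaluation for a text-to-SQL agent concrete, here is a minimal sketch of a pytest-style test. It is not taken from the post: `stub_agent`, `make_db`, `rows_match`, and the `orders` table are hypothetical names, the agent is replaced by a canned lookup, and the LangSmith and Amazon Bedrock calls are omitted entirely. The check it illustrates is one common choice, comparing the rows returned by the generated SQL with the rows returned by a reference query instead of comparing SQL strings.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a small in-memory fixture database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, total) VALUES (?, ?)",
        [("ada", 10.0), ("ada", 15.0), ("bob", 7.5)],
    )
    return conn


def stub_agent(question: str) -> str:
    """Hypothetical stand-in for the text-to-SQL agent: returns canned SQL."""
    canned = {
        "How many orders did ada place?":
            "SELECT COUNT(*) FROM orders WHERE customer = 'ada'",
    }
    return canned[question]


def rows_match(conn: sqlite3.Connection, generated_sql: str, reference_sql: str) -> bool:
    """Compare result sets, ignoring row order, rather than SQL text."""
    got = sorted(conn.execute(generated_sql).fetchall())
    want = sorted(conn.execute(reference_sql).fetchall())
    return got == want


def test_order_count_matches_reference():
    """pytest collects this function; it passes when the result sets agree."""
    conn = make_db()
    sql = stub_agent("How many orders did ada place?")
    reference = "SELECT COUNT(id) FROM orders WHERE customer = 'ada'"
    assert rows_match(conn, sql, reference)
```

Comparing result sets tolerates differently worded but equivalent queries, which matters when the agent's output is non-deterministic. In a real suite the stub would be replaced by a call to the agent, and each test case would be logged to an evaluation backend.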

Amazon...

Copyright of this story solely belongs to amazon.com.
