TECH NEWS

Evaluating Deep Agents using LangSmith on AWS | Amazon Web Services

This post was co-authored with Karan Singh, Head of Partnerships at LangChain

Validating AI agent behavior before production is one of the hardest problems in applied AI. Agents are non-deterministic and multi-step, so errors in early steps can affect downstream results. A single bad tool call can cascade through an entire workflow. LangSmith on AWS gives you the evaluation framework to catch these issues early, track them in production, and continuously improve your agent’s reliability throughout its lifecycle.
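Here is one way to make the cascading-failure point concrete. The sketch compares the sequence of tool calls an agent actually made against an expected sequence and reports the first step where the two diverge. It is a minimal illustration under stated assumptions. It is not LangSmith's API and not the post's implementation. The `ToolCall` record, the `first_divergence` helper, and the tool names `list_tables`, `get_schema`, and `run_query` for a text-to-SQL agent are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ToolCall:
    """One step in an agent trajectory (hypothetical record shape)."""
    name: str
    args: dict = field(default_factory=dict)  # recorded, but not compared below


def first_divergence(actual: list[ToolCall], expected: list[str]) -> int | None:
    """Index of the first step whose tool name differs from `expected`.

    Returns None when the trajectories match exactly. If one trajectory is a
    prefix of the other, the divergence is the first missing or extra step.
    """
    for i, (call, want) in enumerate(zip(actual, expected)):
        if call.name != want:
            return i
    if len(actual) != len(expected):
        return min(len(actual), len(expected))
    return None


expected = ["list_tables", "get_schema", "run_query"]
# The agent skipped schema inspection and went straight to querying.
observed = [ToolCall("list_tables"), ToolCall("run_query", {"sql": "SELECT * FROM order"})]
print(first_divergence(observed, expected))  # → 1
```

Reporting the divergence index, rather than only a pass/fail on the final answer, points at the step where a bad tool call started the cascade.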

This post combines learnings from LangChain’s work on evaluating deep agents and Anthropic’s guide to demystifying evals for AI agents into a practical guide. In this post, you will learn how to: 1) apply five evaluation patterns for deep agents, 2) build offline evaluations using pytest and LangSmith, and 3) configure online monitoring for production. The walkthrough uses a text-to-SQL deep agent built on Amazon Bedrock across the full development-to-production lifecycle.
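For offline evaluation, one common text-to-SQL check is execution accuracy. You run the agent's SQL and a reference query against the same fixture database, then compare the returned rows. The sketch below expresses this as a pytest-style test using only the standard library's `sqlite3`. Everything in it is an assumption for illustration: the `orders` schema, the `CASES` dataset, and the `fake_agent` stub, which stands in for invoking the real Bedrock-backed agent. The excerpt does not show how the post wires such checks into LangSmith.

```python
from __future__ import annotations

import sqlite3
from collections import Counter


def build_fixture_db() -> sqlite3.Connection:
    """Create a small in-memory database (hypothetical schema for illustration)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, total REAL);
        INSERT INTO orders (customer, total) VALUES
            ('alice', 30.0), ('bob', 12.5), ('alice', 7.5);
        """
    )
    return conn


def execution_match(conn: sqlite3.Connection, predicted_sql: str, reference_sql: str) -> bool:
    """True if both queries return the same rows, ignoring row order."""
    try:
        predicted = conn.execute(predicted_sql).fetchall()
    except sqlite3.Error:
        # SQL that fails to parse or execute counts as a miss, not a crash.
        return False
    expected = conn.execute(reference_sql).fetchall()
    # Compare as multisets so ORDER BY differences do not matter.
    return Counter(predicted) == Counter(expected)


# Hypothetical evaluation dataset: (question, reference SQL).
CASES = [
    ("Total spend per customer", "SELECT customer, SUM(total) FROM orders GROUP BY customer"),
    ("How many orders?", "SELECT COUNT(*) FROM orders"),
]


def fake_agent(question: str) -> str:
    """Stub standing in for the real agent call (assumption, not the post's code)."""
    canned = {
        "Total spend per customer": (
            "SELECT customer, SUM(total) AS spend FROM orders "
            "GROUP BY customer ORDER BY spend DESC"
        ),
        "How many orders?": "SELECT COUNT(id) FROM orders",
    }
    return canned[question]


def test_execution_accuracy() -> None:
    # pytest collects this function; each case must return the reference rows.
    conn = build_fixture_db()
    for question, reference_sql in CASES:
        assert execution_match(conn, fake_agent(question), reference_sql), question
```

Comparing row multisets with `Counter` ignores row order and column aliases. A query that adds `ORDER BY` or renames a column therefore still scores as correct. Whether that leniency is appropriate depends on how strict your evaluation needs to be.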

Amazon...

Copyright of this story solely belongs to amazon.com.

