Measuring Agents in Production (Dec 2025)

Link: http://arxiv.org/abs/2512.04123v1
Date: December 2025

Abstract

This paper presents “Measuring Agents in Production” (MAP), a large-scale systematic study of AI agents deployed in production across industries such as finance, healthcare, and software development. Drawing on a survey of 306 practitioners and 20 in-depth case studies, the authors analyze the motivations, architectures, and challenges of real-world agent deployments. The findings show that production agents prioritize simplicity and controllability: 70% rely on off-the-shelf models without fine-tuning, and most execute fewer than 10 steps before requiring human intervention. Productivity gains drive adoption, but reliability remains the primary technical bottleneck, which leads teams to rely heavily on human-in-the-loop evaluation.

Key Topics:

AI Agents in Production
Software Engineering for AI
Large Language Models (LLMs)
Human-in-the-loop Evaluation
Agent Reliability
System Architecture
Deployment Challenges
