Machine Learning Engineer Intern
Based in San Francisco, CA
Duration
May 2025 – Aug 2025
Responsibilities & Achievements
- Developed Stripe’s first standardized evaluation framework for RAG and tool-calling AI agents, enabling consistent benchmarking across user teams
- Delivered a functional MVP that surfaces actionable insights to help agent authors identify, diagnose, and resolve failure modes
- Improved agent performance using the evaluation framework, with a projected recovery of ~$3M in lost revenue
- Supported a key company objective of productionizing 8+ agents by providing the framework used for quality optimization
