Machine Learning Engineer Intern

Based in San Francisco, CA

Duration

May 2025 – Aug 2025

Responsibilities & Achievements

  • Developed Stripe’s first standardized evaluation framework for RAG and tool-calling AI agents, enabling consistent benchmarking across user teams
  • Delivered a functional MVP that surfaces actionable insights to help agent authors identify, diagnose, and resolve failure modes
  • Improved agent performance using the evaluation framework, contributing to a projected recovery of ~$3M in lost revenue
  • Supported a key company objective of productionizing 8+ agents by providing the framework for quality optimization