Build the guardrails that keep AI Agents honest. This role is about making sure Veeva’s AI behaves reliably in the real world, not just in a demo, through rigorous evaluation, testing, and validation.

About Veeva Systems
Veeva is a mission-driven industry cloud company helping life sciences teams bring therapies to patients faster. It is a public benefit corporation focused on doing the right thing for customers, employees, and communities, while scaling one of the fastest-growing SaaS platforms in life sciences.

Schedule
  • Remote work available (Work Anywhere) within the United States or Canada
  • Full-time role aligned to your product team’s time zone (predictable core hours for collaboration)
  • No visa sponsorship available at this time

What You’ll Do

  • Define evaluation strategies for new AI Agents, including test coverage based on real-world failure modes
  • Evaluate LLM outputs for accuracy, relevance, coherence, and safety using both programmatic and manual methods
  • Build high-fidelity datasets, including adversarial prompts, to expose bias, unsafe content, hallucinations, and edge cases
  • Create and maintain automated evaluation pipelines to continuously validate agent behavior and prevent regressions
  • Support root-cause analysis by tracing agent behaviors, tool use, and action sequences to find where things break
  • Report performance metrics, validation results, and bug status clearly to engineering and product teams

What You Need

  • Strong data quality and validation expertise, including bias and integrity testing for datasets
  • Advanced prompt engineering skills and deep understanding of common LLM failure modes (hallucination, incoherence, jailbreaking)
  • 5+ years designing and deploying automated evaluation pipelines for complex AI or agentic systems
  • Strong Python experience (5+ years) building evaluation frameworks, scripts, and CI/CD integrations
  • Familiarity with test automation tools (e.g., pytest or similar frameworks)
  • Bachelor’s degree in Data Science, ML, CS, or related field, with GenAI/LLM experience
  • Unrestricted right to work in the United States or Canada (no sponsorship)
  • High integrity and a strong work ethic (this is a quality-first, high-accountability team)

Benefits

  • Medical, dental, vision, and basic life insurance
  • Flexible PTO and company paid holidays
  • Retirement programs
  • 1% charitable giving program
  • Base pay range: $85,000–$225,000 (varies by experience and location; may include bonus/stock)

These roles get scooped fast. If you’ve got the brain for breaking systems before users do, move on it.

Happy Hunting,
~Two Chicks…
