AI Evaluation Engineer

Company: DeepRec.ai

Location: Denver

Closing Date: 22/06/2026

Salary: $180,000 per annum

Hours: Full Time

Type: Permanent

Job Description

AI Evaluation Engineer
$180,000
Remote (US-based)

Are you passionate about shaping how AI is deployed safely, reliably, and at scale? This is a rare opportunity to join a mission‑driven tech company as their first AI Evaluation Engineer, a foundational role where you’ll design, build, and own the evaluation systems that safeguard every AI‑powered feature before it reaches the real world.

This organization builds AI‑enabled products that directly help governments, nonprofits, and agencies deliver financial support to people who need it most. As AI capabilities race forward, ensuring these systems are safe, accurate, and resilient is critical. That’s where you come in.

You won’t just be testing models; you’ll be creating the frameworks, pipelines, and guardrails that make advanced LLM features safe to ship. You’ll collaborate with engineers, PMs, and AI safety experts to stress-test boundaries, uncover weaknesses, and design scalable evaluation systems that protect end users while enabling rapid innovation.

What You’ll Do

Own the evaluation stack – design frameworks that define “good,” “risky,” and “catastrophic” outputs.
Automate at scale – build data pipelines and LLM judges, and integrate them with CI to block unsafe releases.
Stress test – red-team AI systems with challenge prompts to expose brittleness, bias, or jailbreaks.
Track and monitor – establish model/prompt versioning, build observability, and create incident response playbooks.
Empower others – deliver tooling, APIs, and dashboards that put eval into every engineer’s workflow.

Requirements

Strong software engineering background (TypeScript a plus)
Deep experience with OpenAI API or similar LLM ecosystems
Practical knowledge of prompting, function calling, and eval techniques (e.g. LLM grading, moderation APIs)
Familiarity with statistical analysis and validating data quality/performance
Bonus: experience with observability, monitoring, or data science tooling

Seniority level

Not Applicable

Employment type

Full-time

Job function

Information Technology
Technology, Information and Media, Information Services, and Software Development
