This is a high-impact, staff-level role building agentic AI for real-world investigations. You’ll be an early architect for Night Shift, Flock’s AI research assistant, and you’ll own the evaluation framework that proves the system is faster, safer, and more accurate in production. If you love making complex AI measurable, dependable, and fast, this is your lane.
About Flock Safety
Flock Safety is a safety technology platform helping communities take a proactive approach to crime prevention and security. Their hardware and software connect cities, law enforcement, schools, businesses, and neighborhoods into a nationwide public-private safety network, with an emphasis on privacy and responsible innovation. They’re scaling quickly and building systems where reliability and trust matter as much as innovation.
Schedule
Full-time, remote (USA)
Department: Engineering (Machine Learning team)
Cross-functional work with Backend, Frontend, and Design in a fast-paced environment
Remote-friendly, with priority for candidates in key hubs: Atlanta, Boston, Chicago, Denver, Los Angeles, New York City, San Francisco, and Austin
CJIS certification required upon hire (includes a fingerprint-based background check)
What You’ll Do
⦁ Help design the system architecture for agentic AI powering Night Shift, an AI research assistant for investigators
⦁ Own and build the AI evaluation framework that becomes the source of truth for quality, safety, and performance
⦁ Stand up foundational eval and observability scaffolding (datasets, metrics, KPIs, dashboards, reporting)
⦁ Create offline and online evaluation harnesses to enable debugging, regression testing, and PR-gated quality checks
⦁ Build and refine agent tooling patterns: tool use, retrieval, memory, grounding and attribution, and guardrails
⦁ Balance cost, latency, and quality trade-offs while shipping real improvements against measurable metrics
⦁ Partner tightly with product and engineering teams to ship quick wins (tool APIs, prompts, bug fixes) and then scale
⦁ Productionize tracing, alerting, and monitoring so quality and safety are continuously measurable in the wild
⦁ Lead deeper R&D threads (as needed) to improve system performance, including embeddings and multimodal understanding
What You Need
⦁ Hands-on experience building LLM agent systems (tool use, retrieval, memory, grounding, guardrails)
⦁ Familiarity with modern LLM stacks and APIs (LangChain/LangGraph, vLLM, OpenAI/Gemini/Anthropic APIs)
⦁ Strong grasp of multi-agent patterns (planning, hand-offs, context management)
⦁ RAG experience with vector and hybrid search, plus rerankers and retrieval tuning (examples include pgvector and similar stacks)
⦁ 5+ years building and shipping ML systems to production
⦁ Backend fluency in Python and JavaScript (TypeScript and/or Golang welcome)
⦁ Experience building web services (FastAPI/Express, REST, SSE, JWTs)
⦁ Cloud infrastructure experience (AWS, Terraform, VPC, networking)
⦁ Experience with backend stores (Postgres, Redis) and strong observability instincts
⦁ Experience building evals at scale, including offline and online harnesses measuring:
  ⦁ Search, retrieval, and recommendation quality
  ⦁ Safety and robustness (security, compliance, red-teaming, regression testing)
  ⦁ Cost, performance, and latency trade-offs
⦁ Preferred but not required:
  ⦁ Durable execution (Temporal, Hatchet)
  ⦁ OLAP systems (ClickHouse, BigQuery)
  ⦁ ML inference (PyTorch, Triton, TensorRT) and multimodal domains (text, image, video)
  ⦁ Compute orchestration (Kubernetes, Prefect, Ray)
  ⦁ Agentic eval depth (task success, trajectory quality, LLM-as-judge, and post-training methods such as SFT/DPO/RLHF)
Benefits
⦁ Salary range: $200,000–$225,000
⦁ Equity (stock options)
⦁ Flexible PTO (non-accrual) plus 11 company holidays
⦁ Fully paid health benefits (medical, dental, vision) with HSA match
⦁ 12 weeks paid parental leave (plus additional recovery time for birthing parents)
⦁ Fertility and family benefits (up to $50,000 lifetime maximum for eligible adoption, surrogacy, or fertility expenses)
⦁ Mental health benefits (therapy, coaching, medication support, and digital tools)
⦁ Caregiver support resources
⦁ Work-from-home stipend: $150/month
⦁ Productivity stipend: $300/year
⦁ Home office stipend: one-time $750
⦁ Employee resource groups for community and support
This is one of those roles where “cool AI” is not enough. They want AI that holds up under pressure, with receipts. If a candidate can’t talk evals, tracing, and safety like second nature, they’ll struggle here.
Happy Hunting,
~Two Chicks…