If you’re the type who can smell a race condition from three commits away, this is a “make the AI better by being annoyingly correct” role. You’ll pressure-test advanced language models with real C# engineering work, evaluate code quality like a senior dev, and document exactly where the model’s reasoning falls apart so it can be fixed.
About Invisible Agency
Invisible supports companies by building scalable, high-quality operations. This project focuses on improving AI model performance using expert-led training data and rigorous evaluation, especially in technical domains.
Schedule
Remote (United States).
Contract / freelance.
Hours vary based on your availability (you’ll report average weekly hours).
You provide your own secure computer and high-speed internet. No company benefits (contractor role).
What You’ll Do
Challenge AI models with software engineering tasks and technical scenarios implemented in C#
Evaluate outputs for correctness, performance, security, clarity, and maintainability (not just “it compiles”)
Capture reproducible error traces and failure modes (bad assumptions, broken async, leaky abstractions, unsafe patterns, etc.)
Suggest improvements to prompts, evaluation metrics, and quality standards for C# and general engineering reasoning
What You Need
Strong real-world C# experience (async/await, LINQ, OOP, collections, concurrency, testing, debugging)
Comfort with common C# ecosystem patterns (APIs, services, data access, DI, config, logging, observability)
Ability to communicate clearly and show your work when diagnosing issues and proposing fixes
Bachelor’s, master’s, or PhD in CS/SE is nice to have; equivalent hands-on experience counts just as well
Bonus signals: open-source work, technical writing, production systems experience, cloud deployments
Benefits
Pay range: $8–$65/hour depending on experience, expertise, and location
Remote, flexible contract work
Direct impact: your work improves how AI handles real engineering problems, not toy examples
One straight-up warning: a lot of candidates say “C# expert” but haven’t shipped anything beyond tutorials. If you’ve actually owned production services, handled incidents, and can talk through tradeoffs (latency vs. throughput, memory footprint vs. allocation rate, async pitfalls), you’ll stand out fast, and you should price yourself like it.
Happy Hunting,
~Two Chicks…