Job Description

Title: Staff Machine Learning Engineer

Location: Remote


Who we are

Our mission at Sourcegraph is to make it so that everyone can code, not just ~0.1% of the population. Our code graph powers Cody, the most powerful and accurate AI coding assistant, as well as our Code Search product, which helps devs explore their entire codebase and make large-scale migrations and security fixes. We’re building software that builds software, and in doing so we’re making devs more productive and preparing for a world where a lot more code gets written.

It’s an exciting time to join Sourcegraph. AI has taken over the world, and we’ve spent the last 10 years building infrastructure that’s integral to making AI-generated code more powerful and accurate. Our customers include 4 of the 5 FAANG companies, 4 of the top 10 banks, government organizations, Uber, Plaid, and many other companies building the software that pushes the world forward. We’ve raised $225M at a $2.625B valuation from Andreessen Horowitz, Sequoia, Redpoint, Craft, and others. We’re making ambitious bets on our future, and we’re looking to hire exceptional people to join our team as we make Sourcegraph one of the biggest and most influential companies in the world.

Working hours

Because we are an all-remote company and hire almost anywhere in the world, we don’t have a particular time zone preference for this role. However, you may occasionally need to be available for urgent, non-recurring meetings outside of your working hours.

Why this job is exciting

We are building a machine learning team at Sourcegraph with the goal of creating the most powerful coding assistant in the world. Many companies are trying, but Sourcegraph is uniquely differentiated by our rich code intelligence data and powerful code search platform. In the world of prompting LLMs, context is everything, and Sourcegraph’s context is simply the best you can get: IDE-quality, global-scale, and served lightning fast. Our code intelligence, combined with modern AI, is already providing a remarkable alpha experience, and you can help us unlock its full potential.

We are looking for an experienced full-stack ML engineer with demonstrated industry experience productionizing large-scale ML models. And if you happen to have an entrepreneurial streak, you’re in luck: we have an enterprise distribution pipeline, so whatever you build can be deployed straight to enterprise customers with some of the largest codebases in the world, without all the go-to-market hassle you’d encounter in a startup.

As an IC on our new ML team, you will be a scientist at Sourcegraph Labs, doing R&D and pushing the boundaries of what AI can do. You will have the full power of Sourcegraph’s Code Intelligence Platform at your disposal, and you’ll be working on a coding assistant that is already awesome after just a few weeks of work. This is a greenfield opportunity to multiply dev productivity to unprecedented levels.

Within one month, you will

  • Start building a trusting relationship with your peers, and learning the company structure.
  • Be set up for local development, be actively prototyping, and have dived deep into how AI and ML are already used at Sourcegraph, identifying ways to improve them going forward.
  • Start informing the design of the ML infra/platform needed to support deploying large-scale ML models.
  • Develop simulated datasets using Gym-style frameworks across a number of Cody use cases.
  • Experiment with changes to Cody prompts and context sources, and evaluate those changes against offline experimentation datasets.
  • Ship a substantial new feature to end users.

Within three months, you will

  • Be seen as a subject matter expert in all things ML at Sourcegraph.
  • Be developing and executing the overall ML infra strategy to maximize impact of ML for Cody users.
  • Be ensuring the adoption of best practices in machine learning model development and experimentation.
  • Be building out feature computation, storage, monitoring, analysis, and serving systems for the features required across our Cody LLM stack.
  • Be developing distributed training and experimentation infrastructure over Code AI datasets, and scaling distributed backend services to reliably support high-QPS, low-latency use cases.
  • Be following all the relevant research, and conducting research of your own.

Within six months, you will

  • Be driving the technical vision and owning the overall modeling and ML infra roadmap for context ranking and LLM inference for Cody.
  • Be fully ramped up and owning key pieces of the assistant, and other relevant parts of the Sourcegraph product.
  • Be helping design and build what might become the biggest dev accelerator in 20 years.

About you

You are an experienced full-stack ML engineer with demonstrated industry experience in formulating ML solutions, developing end-to-end data orchestration pipelines, deploying large-scale ML models, and experimenting offline and online to drive business impact for Cody users. You want to be part of a world-class team pushing the boundaries of AI, with a particular focus on leveraging Sourcegraph’s code intelligence to leapfrog competitors.

First, your AI background could take a few different forms:

  • Demonstrated ability to design, build, and scale ML training and inference services.
  • Developing and maintaining distributed training, inference, and experimentation infrastructure.
  • Experience taking advantage of state-of-the-art accelerators (TPUs, GPUs) to scale distributed training jobs.
  • Experience with ML orchestration using Airflow, Flyte, or similar frameworks.
  • Developing a high-throughput inference engine that delivers low-latency performance on a mix of CPU and GPU hardware.
  • Building core data and model metadata systems that power the end-to-end ML lifecycle, and advancing the use of ML monitoring and observability.
  • Hands-on experience working with large foundation models and their toolkits, including familiarity with LLMs such as Llama and StarCoder, fine-tuning techniques (LoRA, QLoRA), prompting techniques (Chain of Thought, ReAct, etc.), and model evaluation.

Second, you have some understanding of programming languages and the tools that manipulate code. This could take any number of forms, for example:

  • You’ve worked with grammars and parser generators, or Tree-sitter
  • You’ve worked with compilers and semantic analysis, e.g. type systems
  • You’ve written an interpreter, or worked on a virtual machine
  • You’ve done static analysis involving scanning source code for semantic information

It doesn’t really matter how you gained this knowledge, but it’s important that you’re familiar with the basic concepts of semantic representations of source code, and how they’re produced and consumed by tooling.

Preferred qualifications:

  • 6+ years of industry experience with a solid understanding of engineering, infrastructure, and ML best practices.
  • Industry experience building end-to-end ML infrastructure is mandatory.
  • Hands-on experience training and serving large-scale (10GB+) models using frameworks such as TensorFlow or PyTorch.
  • Experience with Docker, Kubernetes, and Kubeflow, and knowledge of CI/CD in the context of ML pipelines, is a plus.
  • Experience with CUDA, model compilers, and other model-specific optimizations.
  • Experience with LLM inference latency optimization techniques (e.g., kernel fusion, quantization, dynamic batching) is a big plus.