About the Role

Title: Data Scientist II

Location: Seattle, United States

Job Description:

What We Do: The Data Science Team focuses on building models that leverage Payscale's five core compensation datasets to provide accurate, high-coverage estimates of compensation ranges for jobs across industries and around the globe. We use modeling techniques such as Bayesian statistics (regression, hierarchical modeling, and transfer learning), deep learning (NLP, LLMs, and embeddings), and recommendation systems to model how different jobs relate to each other and produce good compensation range predictions. We do a mix of development on well-defined projects and greenfield innovation. We build internal tools, such as APIs and interactive demos (using e.g. Streamlit), to exhibit our work. We value teamwork, learning together, maintainability, documentation, and giving clear presentations about our work to non-ML stakeholders. We are generalist problem solvers: we use (or learn!) the best tool for the problem.

Our team works closely with compensation domain experts to help define the problems, identify and validate our assumptions, and evaluate our predictions. We are supported by a separate Data Engineering Team that helps turn our models into production APIs for use in products across Payscale’s portfolio.

What You Do: You will design and build machine learning models, implemented in production-grade Python code, that provide compensation estimates in low-data scenarios and quantify the impact that skills and other compensable factors have on pay across jobs, industries, and locations. You'll interface with domain experts, software engineers, designers, and product managers on a regular basis. You'll give periodic presentations about your models and findings to a technical but non-ML-trained audience. Our codebase is in Python and runs on cloud infrastructure.

Day-in-the-Life:  

As a Data Scientist II, a typical day may include the following: 

  • Designing and building a new model
  • Implementing your model in production-grade Python that is readable, maintainable, and extensible, with version control and code reviews
  • Meeting with domain experts to get feedback on models
  • Documenting findings and identifying promising avenues for model improvement
  • Partnering with the Data Engineering Team to drive productionization of models
  • Participating in Data Science Book Club (a biweekly, open-invite learning workshop, currently focused on state-of-the-art NLP techniques)
  • Mentoring Data Analysts on analysis and visualization techniques (e.g. stats, regression)
  • Participating in team code reviews

First Year in Role:  

By your third month, you’ll know how to access our datasets and code. You’ll be able to run the models that we are currently developing, and you’ll be contributing code to support these models—e.g. a new component for the evaluation suite, an internal head-to-head comparison of model results, or the application of one of our models to a particular domain. 

By your sixth month, you'll have fully ramped up on the team and will own your own workstream: meeting with stakeholders, designing and scoping solutions, building, and presenting your findings and progress.

Qualifications: 

  • 3+ years building and maintaining machine learning models with production-grade model training/retraining
  • Fluency in Python – object-oriented programming, pandas, scikit-learn
  • Comfort with version control and environment management
  • Experience with one or more ML platforms, such as AWS SageMaker or Azure ML, in production
  • Proficiency in basic SQL (joins, grouping, ordering, views)
  • Experience using multiple supervised and unsupervised learning techniques, with the ability to compare approaches and results
  • Ability to clearly articulate technical concepts to developers, managers, and less technical colleagues

Tools: 

  • We code every day using Python (pandas, NumPy, SciPy, scikit-learn, PyTorch).
  • For accessing our data, we use Snowflake and SQL Server.
  • We currently productionize with Docker, Kubernetes, AWS, Azure, TeamCity, and Octopus, but we're always experimenting with new tools.
