Job Description
Senior Data Engineer, Big Data
The GitLab DevOps platform empowers 100,000+ organizations to deliver software faster and more efficiently. We are one of the world’s largest all-remote companies with 1,600+ team members and values that guide a culture where people embrace the belief that everyone can contribute.
GitLab is looking to hire a Senior Data Engineer, Big Data, to join its team!
This role requires an analytical, business-oriented mindset and the ability to implement rigorous database solutions and best practices in order to produce, and drive the adoption of, high-quality data insights that inform business decisions across GitLab. Data Engineers are essentially software engineers with a particular focus on data movement and orchestration.
Location – This position is 100% remote
Don’t have a ton of knowledge about GitLab yet? Don’t worry. We have an extensive onboarding and training program at GitLab, and you will be provided with the DevOps and GitLab knowledge necessary to fulfill your role.
What you’ll do in this role…
- Maintain our data warehouse with timely, high-quality data
- Build and maintain data pipelines from internal databases and SaaS applications
- Create and maintain architecture and systems documentation
- Write maintainable, performant code
- Implement the DataOps philosophy in everything you do
- Plan and execute system expansion as needed to support the company’s growth and analytic needs
- Collaborate with Data Analysts to drive efficiencies for their work
- Collaborate with other functions to ensure data needs are addressed
- This position is part of the central Data team and reports to the Manager, Data
- Understand and implement data engineering best practices
- Improve, manage, and teach standards for code maintainability and performance in the code you submit and review
- Create smaller merge requests and issues by collaborating with stakeholders to reduce scope and focus on iteration
- Ship medium to large features independently
- Generate architecture recommendations and implement them
- Communicate well: regularly achieve consensus across teams
- Perform technical interviews
We’re looking for…
- 5+ years of hands-on experience deploying production-quality code
- Professional experience using Python, Java, or Scala for data processing (Python preferred)
- Knowledge of and experience with data-related Python packages
- Demonstrably deep understanding of SQL and analytical data warehouses (Snowflake preferred)
- Hands-on experience implementing ETL (or ELT) best practices at scale
- Hands-on experience with data pipeline tools (Airflow, Luigi, Azkaban, dbt)
- Years of proven expertise in designing and developing distributed data pipelines using big data technologies on large-scale data sets
- Experience with streaming data tools and concepts, such as Kafka
- Ability to lead full-scale Data Lake implementations
- Good understanding of the Lambda Architecture
- Strong data modeling skills and familiarity with the Kimball methodology
- Understanding of data engineering best practices and the ability to implement them
- Experience with Salesforce, Zuora, Zendesk, and Marketo as data sources, and with consuming data from SaaS application APIs
- Share and work in accordance with our values
- Constantly improve product quality, security, and performance
- Desire to continually keep up with advancements in data engineering practices
- Catch bugs and style issues in code reviews
- Ship small features independently
- Ability to use GitLab
Also, we know it’s tough, but please try to avoid the confidence gap. You don’t have to match all the listed requirements exactly to be considered for this role.