Senior Site Reliability Engineer

About the Role

Title: Senior Site Reliability Engineer

Location: Remote

Full time

job requisition id: R002901

Job Description:

About Ancestry:
When you join Ancestry, you join a human-centered company where every person’s story is important. Ancestry®, the global leader in family history, empowers journeys of personal discovery to enrich lives. With our unparalleled collection of more than 40 billion records, over 3 million subscribers and over 23 million people in our growing DNA network, customers can discover their family story and gain a new level of understanding about their lives. Over the past 40 years, we’ve built trusted relationships with millions of people who have chosen us as the platform for discovering, preserving and sharing the most important information about themselves and their families.

We are committed to our location-flexible work approach, allowing you to choose to work in the nearest office, from your home, or a hybrid of both (subject to location restrictions and to roles that are required to be in the office; see the full list of eligible US locations). We will continue to hire and promote beyond the boundaries of our office locations to broaden the possibilities for employee diversity.

Together, we work every day to foster a work environment that’s inclusive as well as diverse, and where our people can be themselves. Every idea and perspective is valued so that our products and services reflect the global and diverse clients we serve.

Ancestry encourages applications from minorities, women, the disabled, protected veterans and all other qualified applicants. Passionate about dedicating your work to enriching people’s lives? Join the curious.

As a Senior Site Reliability Engineer (SRE) at Ancestry, you will play a critical role in enhancing the reliability, performance, and scalability of our services. Reporting to our Principal Software Engineering Manager, you will collaborate closely with our engineering teams to design, build, and instrument our web applications and systems infrastructure, with a strong focus on automation, availability, and performance. A deep understanding of system administration is essential, and specific experience with both Linux and Windows environments is required.

What you will do…

Own site reliability for a product vertical in collaboration with engineering

Define SLOs, SLIs, and error budgets, and ensure they remain in compliance with standards
Develop improved monitoring, auto scaling, and resiliency patterns and capabilities
Debug complex issues across multiple services in AWS, including outward-facing infrastructure

Collaborate on and develop cloud automation and new best practices in support of the vertical and the organization
Train, mentor, and support others in AWS, infrastructure, and cloud best practices
Be a member of the Site Reliability Engineering team, which reports up to the Site Reliability and Performance Organization

Who you are…

5+ years of experience in site reliability
3+ years of software development experience
5+ years of cloud automation experience using Go, Python, and Bash
3+ years of debugging Node.js, Java, and a variety of database technologies
3+ years of experience working with AWS Cloud, including services, CLI, SDKs, and the AWS Console
5+ years of using cloud APM and logging tools, such as New Relic, Prometheus, and AWS monitoring

3+ years of experience with auto scaling, resilience, fault tolerance, AWS infrastructure, cloud networking, and container management
3+ years of experience analyzing production systems within a cloud environment
3+ years of Terraform or CloudFormation experience for infrastructure management with CI/CD pipelines

