About the Role
Title: Senior Site Reliability Engineer
Location: Remote – USA
About GlossGenius
GlossGenius is building an ecosystem enabling entrepreneurs to succeed. We empower small business owners to focus on being creators, not admins, by offering a range of business management tools including booking and scheduling, marketing, analytics, payment processing and much more.
Over 70,000 small business owners have chosen to rely on GlossGenius every day to run their entire set of business operations. Joining its powerful, intuitive platform with its vibrant, distinguished brand, GlossGenius is the ideal combination of a fintech, SMB software, and consumer company all in one.
About the Role
In this role, you’ll have the opportunity to join GlossGenius as one of the first Senior Site Reliability Engineers on the Platform Engineering team. Platform Engineering is the backbone of our technical infrastructure at GlossGenius, with a dual focus on elevating the developer experience and ensuring the reliability of our production environment. In essence, Platform Engineering is about creating an environment where developers thrive, armed with powerful tools, while also ensuring the robustness and scalability of the infrastructure that underpins our digital ventures.
As a Site Reliability Engineer, you will play a key role in maintaining reliable, secure, scalable, and highly available infrastructure and applications that empower over 70,000 Service Professionals to run their businesses. You will drive operational excellence while scaling our AWS footprint and fostering close collaboration with product and engineering teams.
You will report to the Senior Engineering Manager, Platform, and can work remotely from anywhere in the United States or Canada, or on a hybrid schedule from our NYC office.
What You’ll Do
- Work with Product and Engineering peers to support an infrastructure platform that is reliable, scalable, and secure, and that reduces manual toil
- Help GlossGenius scale its AWS cloud footprint, contributing to the technical direction
- Build tools to help engineers quickly identify problems, wherever they occur in the stack
- Drive and shape incident management practices across engineering
- Improve and augment the monitoring and alerting platform, and act as an SME for other teams wishing to have better visibility into their services
- Spread SRE culture throughout GlossGenius
- Understand industry and company-wide trends to help assess and develop new technologies
- Collaborate with the broader engineering team to build highly resilient systems with optimal application performance and scalability
- Own problems end to end, managing complexity and engaging directly with stakeholders to think through everything from business impact to reliability, operability, and security, always approaching situations with a bias to action
What We’re Looking For
- 4+ years of experience working with cloud technologies in Production Engineer, Cloud Engineer, Site Reliability Engineer, DevOps, or equivalent roles
- Demonstrated experience working with cloud platforms (AWS, GCP, Azure, etc.), having designed, built, and maintained cloud platforms that run production-grade services and traffic
- Demonstrated experience with infrastructure-as-code principles and development
- Knowledge of IP networking, DNS, CDN, load balancing, HTTP, and firewalls
- Experience building and maintaining cloud-first monitoring, logging, and alerting infrastructure that supports 24/7 enterprise platforms
- Experience participating in on-call rotations
- Experience with container technology using Docker and Kubernetes
- The ability to write high-quality code in a high-level programming language (e.g., TypeScript, Ruby, Python)
- Experience executing projects from start to finish, with an outcome-oriented mindset