Molecula is an Operational AI company that closes the gap between data and decisions, enabling organizations to unlock the power of real-time analytics and AI. Our core technology, FeatureBase, is a feature-oriented database platform that powers real-time analytics and machine learning applications by simultaneously executing low-latency, high-throughput, and highly concurrent workloads. We are a burgeoning startup with a passionate team of dedicated engineers, marketers, and business experts determined to make a positive impact.
Molecula’s Engineering team is a group of brilliant makers and doers passionate about building world-class products and solutions that make AI and ML possible for all. They take on hard technical problems and turn them into solutions that are elegantly simple for our users, however complex they are underneath. Most of all, they take pride in their craft and are a collaborative bunch that truly cares about the team, clients, company, and opportunity.
Molecula is looking for a Principal Site Reliability Engineer (SRE) to join our Engineering team. With your expertise in software and systems engineering, you will help us build and operate large-scale, distributed, fault-tolerant systems that let our users push the boundaries of how data is accessed today. You will provide technical leadership across software, infrastructure, data, security, and product teams to ensure we deliver the most reliable and stable feature store ever.
In this role, you will:
- Participate in the design of major software components, systems, and features to improve the reliability, availability, scalability, latency, and efficiency of Molecula’s services.
- Improve our infrastructure capabilities by guiding the definition of service-level objectives (SLOs) for Molecula services.
- Provide guidance to other team members on customer deployments, monitoring, observability, and capacity planning.
- Lead post-incident reviews and drive blameless analysis, resolution, and continuous-improvement practices with cross-functional teams.
- Evangelize a culture of reliability, and mentor and train other team members in designing automation to meet service-level objectives.
- Manage individual project priorities, deadlines, and deliverables.
- Work closely with the Product and Software Development teams to ensure that security and operations are considered throughout the entire development cycle.
What we're looking for:
- 10+ years of software development experience, including at least 5 years in a DevOps or Site Reliability Engineering (SRE) role.
- Experience with Go, Rust, C, Python, or a similar programming language.
- Experience with observability and automation for SaaS products.
- Bachelor’s degree in Computer Science or a similar technical field, or equivalent practical experience.
- Experience designing, analyzing, automating, and troubleshooting large-scale distributed systems.
- Experience in networking, security, or hardware and OS performance tuning.
- Experience using CI/CD tools such as GitLab, Azure DevOps, or CircleCI to build end-to-end pipelines across all deployment environments (Dev, Test, UAT, and Production) is a plus.