Site Reliability Engineer – Remote

This is an SRE role for someone who can keep Kubernetes-based systems stable, fast, and predictable, even when everything is on fire. You’ll own reliability through IaC, observability, CI/CD, and sharp incident response, with a bonus focus on automation that can leverage AI/ML.

About Axiom Software Solutions Limited
Axiom Software Solutions Limited supports organizations with specialized engineering talent across cloud and platform operations. This contract role is focused on improving reliability and performance in modern containerized environments, partnering closely with development teams.

Schedule
• Remote (United States)
• Contract position
• SRE-style responsibilities including incident response and root cause analysis
• Cross-functional collaboration with dev and platform teams

What You’ll Do
• Design, deploy, and manage Kubernetes environments end to end (config, monitoring, troubleshooting)
• Build scalable infrastructure using infrastructure-as-code principles
• Create strong monitoring and alerting strategies to catch issues before customers do
• Identify performance bottlenecks and implement improvements across systems and apps
• Implement and maintain CI/CD pipelines to support safe, repeatable releases
• Lead incident response, run root cause analysis, and ship preventative fixes
• Build automation tools, including AI/ML enhancements where they make sense
• Partner with developers to improve reliability, performance, and operational readiness

What You Need
• 5–7 years in SRE and/or DevOps roles
• Strong Kubernetes expertise and container orchestration depth
• Deep Linux/Unix knowledge plus performance tools (NMON and similar)
• Experience with logs, monitoring, and observability tooling
• Database administration and performance tuning experience (Oracle, SQL Server)
• Strong programming skills in one of: Python, Go, Java, or Node.js
• Experience building automation tools and frameworks
• Proven habit of proactive issue detection and resolution

Preferred
• AI/ML integration into ops workflows
• Cloud experience (AWS, GCP, Azure)
• Service mesh familiarity
• Distributed systems architecture understanding
• Security best practices and compliance awareness

Benefits
• Not listed (contract role: confirm rate, duration, overtime/on-call expectations, and equipment access)

Real talk: the original listing title includes “Ex – Fidelity Exp,” which usually means the end client is Fidelity or they’re filtering for people who’ve worked in that ecosystem. If you’ve got that experience, highlight it. If you don’t, compensate by showing regulated-industry experience (finance, healthcare, insurance) plus strong incident ownership.

Happy Hunting,
~Two Chicks…

APPLY HERE
