Senior Site Reliability Engineer

Berkley Hunt · New York, NY

Applicants must be authorized to work in the U.S. without visa sponsorship.

Overview

A high-growth fintech company is seeking a Site Reliability Engineer to help scale and operate a globally distributed, highly available cloud platform. This hybrid role in Manhattan focuses on reliability, automation, Kubernetes, and cloud infrastructure across AWS and GCP.

What You’ll Do

- Design and maintain scalable cloud infrastructure using Infrastructure-as-Code (Terraform)
- Manage and optimize Kubernetes environments using Helm and ArgoCD
- Improve deployment workflows through GitOps and automation
- Implement monitoring, logging, and alerting using tools such as Splunk and Grafana
- Support production systems, lead incident response, and participate in a 24/7 on-call rotation
- Develop automation tooling using Python and/or Go
- Partner with engineering teams to improve scalability, resiliency, and operational excellence
- Troubleshoot Linux systems and networking issues across distributed environments

What We’re Looking For

- Strong experience with Kubernetes in production environments
- Hands-on experience with AWS and GCP
- Expertise in Terraform; experience with Ansible and Terragrunt is a plus
- Strong Linux administration and troubleshooting skills
- Experience with observability, automation, and cloud-native architectures
- Solid networking knowledge and problem-solving skills
- Strong communication skills and ability to work in fast-paced Agile teams

Compensation & Benefits

Salary: $140,000 – $170,000 + 20% bonus (achieved in more than 90% of the company’s active years)