Senior DevOps Engineer
WENdiversity · Bengaluru, Karnataka, India
Role & Responsibilities

You take end-to-end ownership of infrastructure: you design, scale, and operate it. This goes beyond execution. Here's what that looks like day to day:

Own the design, architecture, and reliability of Locus's cloud infrastructure across AWS, Azure, GCP, and Aliyun, supporting multi-region, global deployments.
Lead the evolution of our CI/CD ecosystem: optimize and refactor our Jenkins-as-Code setup for scalability, performance, and developer efficiency.
Drive the Infrastructure as Code (IaC) journey end-to-end: migrate existing cloud resources, alarms, and configurations fully into code with strong versioning, review, and rollback practices.
Partner with engineering teams to identify and resolve performance, scalability, and reliability bottlenecks, including deep dives into memory, CPU, networking, and storage constraints.
Define and implement monitoring, alerting, and incident response best practices to improve MTTR, system observability, and operational readiness.
Lead initiatives around cost optimization, security hardening, and capacity planning to keep infrastructure efficient and compliant as the platform scales.
Act as a technical mentor for junior DevOps engineers and raise overall DevOps maturity across teams.

Ideal Candidate

Must have 5+ years in DevOps / SRE / Infrastructure roles with hands-on experience; clear scale signals (traffic, uptime, latency, infrastructure size) should be mentioned.
Must have B2B SaaS company experience with multi-tenant architecture OR multiple production stacks (multi-env / multi-client systems).
Tech Skills - Cloud & Infra: AWS (VPC, EKS, EC2, RDS, networking), Kubernetes (EKS) at scale, designing high-availability, multi-region systems.
Tech Skills - Automation & IaC: Terraform (must-have), Helm / GitOps, strong scripting (Python / Go / Bash).
Tech Skills - CI/CD & Release: scalable CI/CD pipelines (GitHub Actions / Jenkins), zero/low-downtime deployments.
Tech Skills - Reliability & Observability: SRE principles (SLOs, SLIs, error budgets), monitoring tools (Prometheus, Grafana, Datadog), alerting, on-call, incident management.
BTech in Computer Science or a related field.
Experience at strong, well-scaled B2B SaaS product companies only.

Skills: AWS, CI, DevOps, CD