Site Reliability Engineer
Throne Solutions · Riyadh, Riyadh, Saudi Arabia
Job Title: Site Reliability Engineer (SRE)
Company: Throne Solutions
Location: Riyadh, Saudi Arabia
Employment Type: Full-Time
Experience Required: 5–8 Years

About Throne Solutions
Throne Solutions is seeking an experienced and motivated Site Reliability Engineer (SRE) to join our growing technology team in Riyadh. The ideal candidate will be responsible for ensuring the availability, scalability, performance, and reliability of enterprise production environments through automation, cloud-native technologies, proactive monitoring, and operational excellence. This role requires strong expertise in AWS cloud infrastructure, Kubernetes, CI/CD, and incident management within large-scale enterprise or Cisco environments.

Role Summary
As a Site Reliability Engineer, you will bridge software engineering and IT operations by designing resilient cloud infrastructure, automating operational processes, improving system reliability, and minimizing downtime. You will collaborate closely with development, infrastructure, and security teams to maintain highly available production systems while driving continuous improvement through automation and observability.

Key Responsibilities
Design, build, and maintain highly available, scalable, and secure AWS cloud infrastructure.
Provision and manage cloud resources using Infrastructure as Code (IaC) tools such as Terraform and AWS CloudFormation.
Deploy, administer, and optimize Kubernetes clusters and Docker-based containerized applications.
Develop automation scripts in Python, Bash, or Go to eliminate manual operational tasks and improve efficiency.
Design and maintain CI/CD pipelines using Jenkins, GitLab CI/CD, or similar DevOps platforms.
Implement and manage monitoring, logging, and observability solutions using Prometheus, Grafana, Splunk, Datadog, CloudWatch, or equivalent tools.
Monitor application health, infrastructure performance, and service availability to proactively detect and resolve issues.
Lead incident response activities, perform Root Cause Analysis (RCA), and implement preventive measures to minimize recurring incidents.
Manage Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Error Budgets to ensure service reliability.
Participate in 24×7 on-call rotations and provide production support for critical systems.
Improve system performance, scalability, and reliability, and reduce Mean Time to Recovery (MTTR), through automation and continuous improvement initiatives.
Develop and maintain operational runbooks, disaster recovery procedures, and technical documentation.
Collaborate with DevOps, Development, Security, Infrastructure, and Network teams to support production deployments and operational readiness.
Implement security best practices across cloud infrastructure, Kubernetes environments, and CI/CD pipelines.
Ensure compliance with ITIL Incident, Problem, Change, and Release Management processes.
Support enterprise production environments, including Cisco-based infrastructure where applicable.

Required Qualifications
Bachelor's degree in Computer Science, Information Technology, Software Engineering, Computer Engineering, or a related discipline.
5–8 years of professional experience in Site Reliability Engineering (SRE), DevOps, Cloud Engineering, or Production Support.
Proven experience supporting mission-critical enterprise production environments.

Mandatory Technical Skills
Cloud Platforms: Amazon Web Services (AWS), including EC2, VPC, IAM, RDS, S3, ELB, Auto Scaling, Route 53, and CloudWatch
Infrastructure as Code (IaC): Terraform, AWS CloudFormation
Containerization & Orchestration: Kubernetes, Docker
Operating Systems: Linux Administration (Red Hat, CentOS, Ubuntu)
Programming & Automation: Python, Bash, Go (Preferred)
CI/CD & DevOps: Jenkins, GitLab CI/CD, Git, GitHub
Monitoring & Observability: Prometheus, Grafana, Splunk, Datadog, AWS CloudWatch
Incident & ITSM Tools: ServiceNow, Jira, ITIL-based Service Management
Networking Fundamentals: TCP/IP, DNS, HTTP/HTTPS, Load Balancing, VPN, Firewalls, Basic Cisco Networking

Preferred Skills
Experience working in Cisco enterprise environments.
Knowledge of cloud security tools and security best practices.
Experience with Infrastructure Monitoring and Application Performance Monitoring (APM).
Familiarity with container security, Kubernetes security, and DevSecOps practices.
Experience with Helm, ArgoCD, or GitOps methodologies.
Understanding of microservices architecture and distributed systems.
Exposure to multi-cloud or hybrid cloud environments is an advantage.

Preferred Certifications
AWS Certified Solutions Architect – Associate or Professional
AWS Certified DevOps Engineer – Professional
Certified Kubernetes Administrator (CKA)
Certified Kubernetes Application Developer (CKAD)
HashiCorp Terraform Associate
Red Hat Certified System Administrator (RHCSA)
ITIL Foundation Certification

Key Performance Outcomes
Maintain high availability and reliability of production systems.
Improve service uptime and overall platform resilience.
Reduce Mean Time to Detect (MTTD) and Mean Time to Recovery (MTTR).
Increase operational efficiency through automation.
Enhance monitoring, observability, and incident response capabilities.
Deliver scalable, secure, and cost-optimized cloud infrastructure.
Ensure compliance with SLAs, SLOs, and operational best practices.

Required Competencies
Strong analytical and problem-solving abilities.
Excellent troubleshooting skills in complex production environments.
Strong communication and stakeholder management skills.
Ability to perform effectively under pressure during critical incidents.
Automation-first mindset with a passion for operational excellence.
Excellent documentation and knowledge-sharing skills.
Strong collaboration across development, infrastructure, networking, and security teams.
Self-motivated, proactive, and committed to continuous learning.

Why Join Throne Solutions?
Opportunity to work on enterprise-scale cloud and infrastructure projects in Saudi Arabia.
Exposure to cutting-edge AWS, Kubernetes, DevOps, and SRE technologies.
Collaborative, innovation-driven, and high-performance work culture.
Competitive compensation and professional development opportunities.
Access to certification support, technical training, and career advancement.
Work with modern cloud-native architectures, automation platforms, and enterprise production systems.