Senior DevOps Engineer
Mirai, a Scopely company · Riyadh, Riyadh, Saudi Arabia
This role builds and runs the infrastructure our Generative AI products depend on: the pipelines that ship code, the platforms that run services and models, and the controls that keep all of it secure and reliable. AI workloads bring their own demands. GPUs, model serving, inference autoscaling, and token cost all shape the work, and you have run workloads like these before. You should be comfortable owning infrastructure as code, CI/CD, observability, and security on AWS, and ready to set the operational standards a growing team will lean on.

What You Will Do
- Write and maintain infrastructure as code so environments are reproducible, reviewable, and quick to recover
- Own CI/CD: the pipelines that build, test, scan, and deploy applications, agents, and model-serving services
- Run the container platform (EKS, ECS, or Fargate) and the deployment workflows on top of it, including GitOps where it fits
- Stand up the runtime for AI workloads: GPU capacity, model serving such as vLLM, Triton, or TGI, inference autoscaling, and the gateways and caching that sit in front of the models
- Manage API gateways, networking, load balancing, DNS, and certificates so services are exposed safely and predictably
- Own secrets, identity, and least-privilege access across every environment
- Run databases in production: clustering, replication, failover, backups, and recovery
- Build monitoring into everything, including token usage and GPU utilisation, with alerting and clear service objectives
- Lead reliability and security practice: incident response, policy as code, vulnerability and container scanning, and cost discipline, which matters once GPUs are in the mix

Requirements
- Eight or more years in DevOps, SRE, or infrastructure engineering overall. That includes hands-on experience supporting AI or ML workloads in production, which can be a more recent part of your background.

Experience
- Strong infrastructure as code with Terraform or OpenTofu, including module design and remote state. Experience with HCP Terraform (formerly Terraform Cloud) is a plus
- Configuration management with Ansible
- Solid AWS experience across compute, networking (VPC, subnets, security groups, load balancers, Route 53), IAM, and storage
- Strong CI/CD with GitHub Actions, including reusable workflows and careful handling of credentials
- Containers and orchestration: Docker with Kubernetes (EKS preferred), Helm, and a registry such as ECR
- API gateway experience with Kong or Amazon API Gateway, including auth, rate limiting, and routing
- Database operations including clustering and high availability, with RDS or Aurora, PostgreSQL, and a cache such as Redis or ElastiCache
- Secrets management with HashiCorp Vault, AWS Secrets Manager, or Parameter Store
- Observability with Prometheus, Grafana, CloudWatch, and OpenTelemetry, or close equivalents
- Comfort in Linux and scripting with Bash and Python