Machine Learning Engineer - Reinforcement Learning

Wave Recruitment · London Area, United Kingdom

ML Engineer - Reinforcement Learning
London (hybrid, 1 day/week in Kings Cross) - Solve data centre cooling issues

Cooling is one of the largest items on a data centre's energy bill, and most sites run it conservatively because getting it wrong puts the hardware at risk. Our client trains reinforcement learning agents to control cooling systems on live sites, cutting cooling energy without breaching the temperature and humidity limits operators are contractually bound to.

They're hiring an ML Engineer - Reinforcement Learning to build those agents and get them running on real data centres. You'll report to the CTO / Head of AI and work across the line between research and deployment.

The System

- The agents don't learn on the live plant. They train against a digital twin of each site, then move to production once they're safe.
- Reward and constraint design is shaped by ASHRAE standards and customer SLAs: air temperature, humidity, and rate-of-change limits on cooling air and chilled water setpoints.
- Training is federated across multiple sites. Agents share learned control strategies without any site's operational data leaving the building, which delivers significantly more savings than a single-site approach.
- Models are deployed on-prem at the edge, then monitored and retrained in place.

What You'll Own

Reinforcement Learning
- Train and deploy deep RL agents for live cooling control
- Design reward functions and constraints that hold up against physical limits and SLAs, not just in a notebook
- Move between research-style exploration and the engineering work to make something stable on a real site

Simulation and Digital Twins
- Build and improve the physics-based simulators, surrogate models, and digital twins the agents train against
- Close the gap between what works in simulation and what holds on real hardware

Production and Deployment
- Federated and distributed training across sites
- Edge deployment, monitoring, and retraining of agents already running in production

What We're Looking For

Essential
- 3-5 years training and deploying deep RL agents in Python
- PyTorch or JAX, and RL libraries such as Gymnasium
- A background in physical systems - engineering (mechanical, electrical, structural, biomedical), physics, robotics, autonomous driving, or control systems - and the instinct to reason about what's physically possible, not only what's mathematically possible
- Comfortable iterating between research exploration and the engineering needed to run on a live site
- A degree in engineering, CS, or physics

Useful
- Control systems (classical control, MPC), or HVAC, thermodynamics, power systems, or data centre operations
- Federated learning, distributed training, or edge ML deployment
- Simulation experience - building or using physics-based simulators, digital twins, surrogate models, or large physics models
- Published research or open-source contributions

Who You Are

You want both halves of this job. You'll run experiments and read papers, but you also want your work controlling real equipment, with the constraints that come with that. RL experience limited to advertising or multi-armed bandits won't carry over here - the physical world doesn't behave like a recommendation system. Someone with a pure maths or CS background and no feel for physical systems will struggle, and so will anyone after a pure research seat or a pure production one. This role sits in the middle.

What's On Offer

- £110K-£150K, plus competitive equity
- A genuine technical problem: RL on physical systems, under real constraints, deployed on live infrastructure
- Direct access to the CTO and founding team
- Hybrid working, one day a week in the Kings Cross office
- Visa sponsorship available on a case-by-case basis

Get in touch for a confidential conversation: Imogen@waverecruitment.co.uk