You will pioneer a provably ε‑optimal joint‑policy engine that marries advanced RL theory with real‑time distributed execution. Your work will keep our agents within an ε‑bounded, safety‑guaranteed performance envelope, even under noisy or adversarial conditions.
Combining sample‑complexity‑optimal RL with distributed sub‑optimality bounds and trust‑driven re‑optimization is largely uncharted territory that pushes the limits of safety‑critical AI.
Joint Policy Re‑Optimization with Sub‑Optimality Bounds (JPRO‑SOB)
Addresses: Cascading Misinterpretation and Suboptimal Joint Actions
JPRO‑SOB guarantees bounded sub‑optimality, a prerequisite for safety‑critical multi‑agent coordination and a principled safeguard against catastrophic divergence.
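For concreteness, one standard reading of this guarantee (the exact setting, value function, and form of the bound are assumptions for illustration, not a specification from this posting) is:

```latex
% A joint policy \hat{\pi} is \varepsilon-optimal if its value is within
% \varepsilon of the optimal joint value in every state:
V^{\hat{\pi}}(s) \;\ge\; V^{\pi^*}(s) - \varepsilon \qquad \text{for all } s \in \mathcal{S}.
```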
A distributed, bounded‑approximation RL engine that periodically re‑optimizes joint policies, integrates regret decomposition, and triggers re‑optimization when DTSP trust scores fall below a threshold.
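A minimal sketch of the trust‑triggered re‑optimization loop, in Python. The DTSP score interface, the threshold value, and the `reoptimize` hook are all hypothetical stand‑ins; this illustrates the control flow, not the JPRO‑SOB implementation itself:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class TrustTriggeredReoptimizer:
    """Re-optimizes the joint policy when any agent's DTSP trust score drops
    below a threshold. Interfaces here are assumptions for illustration."""
    trust_threshold: float                        # DTSP score below which we re-plan
    reoptimize: Callable[[], Dict[str, object]]   # hypothetical hook: returns a new joint policy
    joint_policy: Dict[str, object] = field(default_factory=dict)

    def step(self, trust_scores: Dict[str, float]) -> bool:
        """Check per-agent trust scores; trigger re-optimization if any falls
        below the threshold. Returns True if re-optimization was triggered."""
        if min(trust_scores.values()) < self.trust_threshold:
            self.joint_policy = self.reoptimize()
            return True
        return False

if __name__ == "__main__":
    # Stub optimizer standing in for the bounded-approximation RL engine.
    engine = TrustTriggeredReoptimizer(
        trust_threshold=0.7,
        reoptimize=lambda: {"agent_0": "replanned", "agent_1": "replanned"},
    )
    triggered = engine.step({"agent_0": 0.92, "agent_1": 0.55})
    print("re-optimized:", triggered)  # agent_1 is below threshold -> True
```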
PhD in Machine Learning, Robotics, or a related field with a focus on reinforcement learning.
Deliver a policy re‑optimization module that guarantees ε‑optimality, improving coordination efficiency by 20% in real‑world multi‑agent scenarios and reducing catastrophic divergence incidents to near zero within 12 months.
Lead the RL research division for multi‑agent safety, driving the next wave of provably safe autonomous systems across our product lines.
If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.