You will create the first adversarial system that watches agents’ belief evolution in real time, detecting subtle misalignments before they cascade into catastrophic failures—an essential safety layer for any large‑scale, partially observable MARL deployment.
By treating belief trajectories as temporal sequences and training a discriminator to distinguish expert trajectories from agent trajectories, you will bridge adversarial learning, imitation learning, and multi-agent RL in a way that has never been attempted at this scale.
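To make the approach concrete, below is a minimal GAIL-style sketch of such a trajectory discriminator in PyTorch. Everything in it is an illustrative assumption rather than the project's actual design: the name BeliefDiscriminator, the GRU encoder, the dimensions, and the training-step interface.

```python
import torch
import torch.nn as nn

class BeliefDiscriminator(nn.Module):
    """Scores a belief trajectory; a higher logit means 'more expert-like'."""
    def __init__(self, belief_dim: int, hidden_dim: int = 128):
        super().__init__()
        # Recurrent encoder summarizes the whole trajectory into one state.
        self.encoder = nn.GRU(belief_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, beliefs: torch.Tensor) -> torch.Tensor:
        # beliefs: (batch, time, belief_dim)
        _, h_n = self.encoder(beliefs)          # h_n: (1, batch, hidden_dim)
        return self.head(h_n[-1]).squeeze(-1)   # (batch,) logits

def discriminator_step(disc, optimizer, expert_batch, agent_batch):
    """One adversarial update: expert trajectories labeled 1, agent ones 0."""
    logits = torch.cat([disc(expert_batch), disc(agent_batch)])
    labels = torch.cat([torch.ones(expert_batch.size(0)),
                        torch.zeros(agent_batch.size(0))])
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

As in GAIL, the agent would then be optimized against the discriminator's score, so belief dynamics that drift away from expert behavior are penalized even while the task reward still looks healthy.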
Adversarial Alignment Detection
From: Partial Observability Amplification of Misalignment
Mission: To design and train a discriminator that monitors joint belief trajectories, flags abnormal divergences, and provides an adversarial signal that protects against reward hacking and deceptive policies.
Deliverables: A temporal belief-trajectory discriminator, a training framework with expert trajectories, integration hooks for the JBWM and reward-decomposition modules, and evaluation pipelines.
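As one hypothetical shape for those integration hooks, the check below builds on the sketch above and flags a trajectory once the discriminator's expert-probability drops below a threshold; the function name, the 0.5 default, and the per-trajectory calling convention are all assumptions for illustration.

```python
@torch.no_grad()
def flag_divergence(disc: BeliefDiscriminator,
                    trajectory: torch.Tensor,
                    threshold: float = 0.5) -> bool:
    """Return True when a belief trajectory no longer looks expert-like."""
    # trajectory: (time, belief_dim); add a batch dimension for the model.
    p_expert = torch.sigmoid(disc(trajectory.unsqueeze(0))).item()
    return p_expert < threshold
```

In deployment, such a check would presumably run over sliding windows of the joint belief state produced by the JBWM, with flagged windows routed into the evaluation pipeline.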
Requirements: PhD in Computer Science, Electrical Engineering, or a related field with a focus on machine learning or AI safety.
Success metrics: Within 12 months, achieve ≥90% detection of misalignment events on benchmark tasks, reduce reward-hacking incidents by 80%, and embed the discriminator into the BAAC production pipeline.
Growth path: Lead a research group focused on AI safety and alignment, mentor junior scientists, and shape the company's long-term strategy for trustworthy multi-agent systems.
If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.