Staff Counterfactual Policy Evaluation Engineer

corpora-jobs-1778796293285-db9d41c6 - Frontier Development

Applied Scientist | Staff | 1 position

⚡

Why This Role is Different

Frontier Development Role

Architect the next‑generation counterfactual engine that blends causal knowledge with adaptive importance weighting, delivering trustworthy blame scores even in high‑dimensional, non‑stationary bandit settings.

The Frontier Element

You’ll engineer a continuous adaptive blending (CAB) scheme that learns surrogate policies from logged data, enabling real‑time generation of counterfactual trajectories while maintaining unbiasedness. This would be a novel contribution to offline RL evaluation.

🔬

Project Context

Research Area

CGRPA‑Plus Counterfactual Reasoning Module

From: Misattribution of Blame in Cooperative Multi‑Agent Systems

Why This Role is Critical

Accurate, low‑variance counterfactual estimates are the engine that turns causal priors into probabilistic blame scores.

What You Will Build

A scalable counterfactual simulation framework that generates contextual policy trajectories, learns surrogate logging policies, and applies importance‑weighted OPE with variance‑reduction techniques.
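
To make the pipeline above concrete, here is a minimal sketch of two of its pieces: fitting a surrogate logging policy from logs and using it for an importance‑weighted (IPS) estimate. It is an illustration under simplifying assumptions, not the CGRPA‑Plus implementation: a single‑agent contextual bandit with discrete contexts, logs stored as (context, action, reward) tuples, and hypothetical helper names (fit_surrogate_logging_policy, ips_value). Because the propensities are estimated rather than known, the resulting estimate is only approximately unbiased.

```python
from collections import Counter, defaultdict


def fit_surrogate_logging_policy(logs, n_actions, smoothing=1.0):
    """Estimate P(action | context) from logged (context, action, reward) tuples.

    Uses Laplace-smoothed counts. Contexts are assumed discrete and hashable,
    which is a simplification made for this sketch.
    """
    counts = defaultdict(Counter)
    for context, action, _reward in logs:
        counts[context][action] += 1

    def propensity(context, action):
        seen = counts[context]
        total = sum(seen.values()) + smoothing * n_actions
        return (seen[action] + smoothing) / total

    return propensity


def ips_value(logs, target_policy, propensity):
    """Inverse propensity scoring estimate of the target policy's mean reward."""
    total = 0.0
    for context, action, reward in logs:
        # Reweight each logged reward by how much more (or less) likely the
        # target policy is to take the logged action than the logging policy.
        weight = target_policy(context, action) / propensity(context, action)
        total += weight * reward
    return total / len(logs)


# Toy logs: two contexts, two actions.
logs = [("a", 0, 1.0), ("a", 0, 1.0), ("a", 1, 0.0), ("b", 1, 1.0)]
propensity = fit_surrogate_logging_policy(logs, n_actions=2)


def always_zero(context, action):
    """Deterministic target policy that always plays action 0."""
    return 1.0 if action == 0 else 0.0


estimate = ips_value(logs, always_zero, propensity)
print(f"IPS estimate for the always-0 policy: {estimate:.4f}")
```

In a production setting the count table would be replaced by a learned model of the logging policy, and the weights would typically be clipped or self‑normalized to control variance.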

🛠

Key Responsibilities

Implement CGRPA‑Plus: learn a surrogate policy from logs, generate counterfactual trajectories, and compute weighted advantage estimates.
Integrate causal back‑door adjustment to correct for confounding before importance weighting.
Design variance‑reduction mechanisms (e.g., doubly robust estimators, control variates) tailored to multi‑agent contexts.
Develop diagnostics for overlap violations and sensitivity analyses for sparse contexts.
Benchmark the counterfactual engine against state‑of‑the‑art OPE methods on synthetic and real MAS datasets.
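
As a rough illustration of the variance‑reduction and overlap‑diagnostic responsibilities above, the sketch below shows a doubly robust (DR) estimator and an effective‑sample‑size check. It assumes a single‑agent contextual bandit, logs stored as (context, action, reward, logging_propensity) tuples, and hypothetical function names; the multi‑agent extensions the role calls for are not shown.

```python
def doubly_robust_value(logs, target_policy, reward_model, actions):
    """Doubly robust estimate of the target policy's mean reward.

    Combines a model-based (direct method) term with an importance-weighted
    correction on the model's residual for the logged action.
    """
    total = 0.0
    for context, action, reward, logging_propensity in logs:
        # Direct-method term: model-predicted value of the target policy.
        direct = sum(
            target_policy(context, a) * reward_model(context, a) for a in actions
        )
        weight = target_policy(context, action) / logging_propensity
        total += direct + weight * (reward - reward_model(context, action))
    return total / len(logs)


def effective_sample_size(logs, target_policy):
    """Kish effective sample size of the importance weights.

    A value far below len(logs) signals poor overlap between the logging
    and target policies.
    """
    weights = [target_policy(c, a) / p for c, a, _r, p in logs]
    sum_w = sum(weights)
    sum_w_sq = sum(w * w for w in weights)
    return (sum_w * sum_w) / sum_w_sq if sum_w_sq > 0 else 0.0


# Toy logs from a uniform logging policy over two actions.
logs = [("x", 0, 1.0, 0.5), ("x", 1, 0.0, 0.5)]
actions = [0, 1]


def always_zero(context, action):
    """Deterministic target policy that always plays action 0."""
    return 1.0 if action == 0 else 0.0


def perfect_model(context, action):
    """Reward model that happens to match the toy data exactly."""
    return 1.0 if action == 0 else 0.0


def zero_model(context, action):
    """Uninformative reward model; DR then reduces to plain IPS."""
    return 0.0


dr_perfect = doubly_robust_value(logs, always_zero, perfect_model, actions)
dr_zero = doubly_robust_value(logs, always_zero, zero_model, actions)
ess = effective_sample_size(logs, always_zero)
print(dr_perfect, dr_zero, ess)
```

The two reward models give the same estimate on this toy data, but with the perfect model the importance‑weighted residual is zero for every sample, which is the mechanism by which DR lowers variance when the reward model is good.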

🎯

Required Skills & Experience

Technical Must-Haves

Off‑policy evaluation (inverse propensity scoring (IPS), doubly robust (DR) estimators)

Expert

Core to CGRPA‑Plus counterfactual weighting.

Contextual bandit algorithms and surrogate policy learning

Advanced

Generating realistic counterfactuals.

Probabilistic modeling of agent actions and contexts

Advanced

Enabling accurate counterfactual simulation.

High‑performance simulation and parallel computing

Proficient

Scaling counterfactual generation to millions of trajectories.

Experience Requirements

4+ years of research or industry experience in offline RL or bandit evaluation.
Publications on counterfactual reasoning or OPE in high‑dimensional settings.
Hands‑on experience with large‑scale simulation frameworks (e.g., Ray, MPI).

Education

PhD in Machine Learning, Statistics, or Operations Research with a focus on reinforcement learning or causal inference.

⭐

Preferred Skills

Knowledge of causal discovery outputs and how to integrate them into counterfactual models.
Experience with Bayesian neural networks for policy modeling.
Familiarity with adversarial perturbation techniques for policy robustness.

🤝

You Will Thrive Here If...

You are comfortable iterating on algorithmic prototypes without formal requirements.
You are passionate about pushing the limits of statistical guarantees on real‑world data.
You can explain complex statistical concepts to non‑technical stakeholders.

📈

Impact & Growth

12-Month Impact

Within one year, deliver a counterfactual engine that reduces the variance of blame scores by 40% relative to the baseline, enabling real‑time blame attribution in a multi‑agent defense simulation with >95% confidence in causal claims.

Growth Opportunity

Scale the counterfactual framework to multi‑domain deployments, lead a research group on offline RL evaluation, and influence product strategy for trustworthy AI.

Ready to Push the Boundaries?

If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.