Architect the next‑generation counterfactual engine that blends causal knowledge with adaptive importance weighting, delivering trustworthy blame scores even in high‑dimensional, non‑stationary bandit settings.
You’ll engineer a continuous adaptive blending (CAB) scheme that learns surrogate policies from logged data, enabling real‑time generation of counterfactual trajectories while preserving unbiasedness, a novel contribution to offline RL evaluation.
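One way a continuous blending scheme like this could look is an estimator that mixes a low‑variance model‑based (direct‑method) estimate with an unbiased importance‑sampling estimate, choosing the blend weight from the data. This is a minimal sketch under that assumption for a contextual‑bandit setting; `cab_estimate`, the MSE‑style weighting heuristic, and all variable names are illustrative, not the module's actual API.

```python
import numpy as np

def cab_estimate(rewards, pi_e, pi_b, dm_preds):
    """Blend a direct-method (DM) estimate with an importance-sampling (IS)
    estimate of the evaluation policy's value.

    rewards  : observed rewards for the logged actions
    pi_e     : evaluation-policy probability of each logged action
    pi_b     : logging (behavior)-policy probability of each logged action
    dm_preds : a reward model's value predictions under the evaluation policy

    The blend weight alpha minimizes a crude MSE estimate: the IS variance is
    estimated from the sample, and the DM bias is proxied by the gap between
    the two estimators (a common heuristic, not the paper's actual rule).
    """
    w = pi_e / pi_b                       # importance weights
    is_terms = w * rewards
    is_est = is_terms.mean()              # unbiased but high-variance
    dm_est = dm_preds.mean()              # low-variance but possibly biased
    var_is = is_terms.var(ddof=1) / len(rewards)
    bias_dm_sq = (dm_est - is_est) ** 2   # proxy for squared DM bias
    # alpha -> 1 favors the DM estimate when its estimated bias is small
    alpha = var_is / (var_is + bias_dm_sq + 1e-12)
    return alpha * dm_est + (1 - alpha) * is_est
```

When the model is accurate, the estimated DM bias shrinks and the estimator leans on the low‑variance model; when the model is off, weight shifts back to the unbiased IS term. This is the same trade‑off that doubly‑robust and MAGIC‑style OPE estimators formalize.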
CGRPA‑Plus Counterfactual Reasoning Module
From: Misattribution of Blame in Cooperative Multi‑Agent Systems
Accurate, low‑variance counterfactual estimates are the engine that turns causal priors into probabilistic blame scores.
A scalable counterfactual simulation framework that generates contextual policy trajectories, learns surrogate logging policies, and applies importance‑weighted OPE with variance‑reduction techniques.
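To make the importance‑weighted OPE with variance reduction concrete: a standard variance‑reduction variant is self‑normalized, clipped importance sampling. The sketch below assumes estimated logging‑policy propensities (e.g. from a learned surrogate logging policy, as the framework describes); the function name and the specific clip value are illustrative assumptions.

```python
import numpy as np

def snips(rewards, pi_e, pi_b, clip=10.0):
    """Self-normalized, clipped importance sampling (SNIPS).

    rewards : observed rewards for the logged actions
    pi_e    : evaluation-policy probability of each logged action
    pi_b    : logging-policy probability of each logged action
              (here, from a learned surrogate logging policy)
    clip    : cap on individual importance weights

    Clipping bounds the influence of any single sample, and dividing by the
    weight sum (rather than n) further reduces variance; both introduce a
    small, controlled bias relative to vanilla IS.
    """
    w = np.minimum(pi_e / pi_b, clip)   # clipped importance weights
    return float(np.sum(w * rewards) / np.sum(w))
```

When the evaluation and logging policies coincide, every weight is 1 and the estimator reduces to the plain mean of the logged rewards, which is a useful sanity check for any IS implementation.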
PhD in Machine Learning, Statistics, or Operations Research with a focus on reinforcement learning or causal inference.
In one year, produce a counterfactual engine that reduces the variance of blame scores by 40% relative to the baseline, enabling real‑time blame attribution in a multi‑agent defense simulation with >95% confidence in causal claims.
Scale the counterfactual framework to multi‑domain deployments, lead a research group on offline RL evaluation, and influence product strategy for trustworthy AI.
If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.