Staff Bayesian Ensemble & Confidence Calibration Scientist

corpora-jobs-1778796293285-db9d41c6 - Frontier Development
Research Scientist · Staff · 1 position

Why This Role is Different

Frontier Development Role

You will turn uncertainty into actionable trust. By fusing Bayesian inference with agent performance data, you’ll prevent sycophancy and ensure that the debate’s final verdict is statistically sound.

The Frontier Element

This role pioneers the first end‑to‑end Bayesian ensemble that operates in a live, multi‑agent debate, blending probabilistic programming with LLM confidence signals—a technique that has no direct precedent in commercial systems.

🔬

Project Context

Research Area

Cross‑Agent Confidence Calibration via Bayesian Ensembles

From: Hallucination Amplification in Multi‑Agent Debate

Why This Role is Critical

The HEAD framework replaces majority voting with a Bayesian ensemble that weighs each agent’s self‑confidence and historical trust. This requires sophisticated probabilistic modeling and real‑time inference at scale.
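To make the idea concrete, here is a minimal sketch of confidence-weighted Bayesian aggregation in place of majority voting. The function name, the trust-weighted log-odds scheme, and all parameters are illustrative assumptions, not the HEAD framework's actual rule:

```python
import math

def bayesian_verdict(votes, confidences, trust, prior=0.5):
    """Aggregate agent votes into a posterior that the claim is true.

    Each agent contributes a log-likelihood ratio derived from its
    self-reported confidence, scaled by a historical trust weight.
    (Hypothetical weighting scheme for illustration only.)
    """
    log_odds = math.log(prior / (1 - prior))
    for vote, conf, weight in zip(votes, confidences, trust):
        # Clip confidence away from 0 and 1 to keep log-odds finite.
        p = min(max(conf, 1e-6), 1 - 1e-6)
        llr = math.log(p / (1 - p))               # evidence strength
        log_odds += weight * (llr if vote else -llr)
    return 1.0 / (1.0 + math.exp(-log_odds))      # posterior P(true)
```

Unlike a raw vote count, a single highly trusted dissenter can outweigh several low-trust agreers, which is the property that suppresses sycophantic pile-ons.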

What You Will Build

A production‑ready Bayesian aggregation engine that dynamically updates agent weights, integrates external trust metrics, and exposes calibrated confidence scores to the debate orchestrator.

🛠

Key Responsibilities

  • Design the Bayesian aggregation framework that fuses self‑reported confidence, external trust metrics, and historical performance into a posterior over the final claim.
  • Implement real‑time weight updates and convergence checks that trigger dynamic debate depth control.
  • Develop evaluation pipelines that quantify voting bias reduction and sycophancy mitigation across simulated and real datasets.
  • Collaborate with the retrieval engineer to ensure evidence quality feeds into the Bayesian model.
  • Publish internal whitepapers and external papers to establish thought leadership in Bayesian multi‑agent decision making.
🎯

Required Skills & Experience

Technical Must-Haves

Probabilistic programming (Pyro, Stan, NumPyro)

Expert
Building Bayesian models that can scale to thousands of agents in real time.

Ensemble learning and confidence weighting

Expert
Designing algorithms that combine heterogeneous agent outputs into a single calibrated verdict.

Statistical evaluation of LLM outputs (calibration, AUROC, precision/recall)

Advanced
Quantifying the impact of Bayesian weighting on hallucination rates.

Python, C++, and GPU acceleration

Expert
Implementing inference engines that run within the debate latency budget.

Knowledge of multi‑agent systems and debate architectures

Advanced
Integrating the ensemble with the HEAD orchestrator and dynamic depth controller.

Experience Requirements

  • 7+ years in AI research or applied statistics with a focus on Bayesian methods.
  • Published work on Bayesian ensembles, confidence calibration, or multi‑agent decision making.
  • Experience deploying probabilistic models at scale in production.

Education

PhD in Statistics, Computer Science, or a related field.

Preferred Skills

  • Experience with PolySwarm, SpatiO, or similar Bayesian‑weighted multi‑agent frameworks.
  • Familiarity with NIST RMF or ISO/IEC 23894 for risk‑based confidence assessment.
  • Knowledge of reinforcement‑learning‑based debate systems.
🤝

You Will Thrive Here If...

  • You are comfortable pushing the limits of theory into production.
  • You bring a strong analytical mindset and thrive on rigorous experimentation.
  • You proactively communicate complex probabilistic concepts to engineering teams.
📈

Impact & Growth

12-Month Impact

Reduce voting bias and sycophancy by 30–40%, enabling the debate system to maintain hallucination rates below 3% even under adversarial conditions.

Growth Opportunity

Lead the research arm that expands Bayesian confidence calibration to other AI domains, shaping the next wave of trustworthy decision engines.

Ready to Push the Boundaries?

If this sounds like the challenge you have been looking for, we want to hear from you. We value what you can build over where you have been.