
Task 2: Bayesian Policy Inference Simulation

Project: corpora-task-modelling-1778795810213-620a9917  •  Generated: 2026-05-14 22:57

Hierarchical Bayesian inference with variational Monte Carlo to quantify policy uncertainty under noisy observations before deployment.

Tags: Bayesian Inference · Monte Carlo · Variational Inference · Feasibility

Depends on #1: Synthetic Adversarial Observation Perturbation Dataset Generation

Source in Roadmap / Ideate: Chapter 1 – AOI-GBE Model Development

Why model first: Enables quantitative assessment of policy uncertainty and robustness to unseen perturbations, guiding architecture choices without requiring live agent deployment.

What Is Modelled

The posterior distribution over agent policy parameters given a stream of noisy observations, where observations are corrupted by adversarial perturbations or sensor noise. The model captures the joint distribution of clean and perturbed observations using a conditional generative model and marginalises over it to obtain a calibrated policy posterior.
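As a compact formalisation (the notation is ours, assuming independent per‑timestep corruption): writing θ for the policy parameters, y_t for clean observations, and ỹ_t for their perturbed counterparts, the target quantity is

```latex
% Policy posterior under noisy observations; the conditional generative model
% supplies the corruption kernel p(\tilde{y}_t \mid y_t).
p(\theta \mid \tilde{y}_{1:T})
  \;\propto\;
  p(\theta)\, \prod_{t=1}^{T} \int p(\tilde{y}_t \mid y_t)\, p(y_t \mid \theta)\, \mathrm{d}y_t
```

The integral is the marginalisation over clean observations referred to above.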

Objectives

Success Criteria

Output Form

A Python package exposing a `PolicyPosteriorSampler` API that returns posterior samples, calibration metrics, and a provenance log. Includes a Jupyter notebook demo and a Docker image for reproducibility.
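Only the class name and the two method names below appear in this document (the methods in the prompts at the end); the constructor arguments, module path, and return shapes are illustrative assumptions:

```python
# Illustrative use of the PolicyPosteriorSampler API described above; the
# constructor arguments and return structure are assumptions, not a spec.
import numpy as np

from policy_posterior import PolicyPosteriorSampler  # hypothetical module name

sampler = PolicyPosteriorSampler(
    observation_noise_std=0.3,     # parameters from the table below
    policy_prior_variance=1.0,
    inference_engine="svi",        # "svi" or "hmc"
)

observations = np.load("perturbed_batch.npy")          # (T, obs_dim)
samples = sampler.sample_posterior(observations, num_samples=200)

metrics = sampler.evaluate_calibration(samples, true_actions=np.load("actions.npy"))
print(metrics)                     # e.g. {"ece": ..., "brier": ...}
```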

Key Parameters & What They Affect

| Parameter | Range / Units | Affects | Notes |
| --- | --- | --- | --- |
| `observation_noise_std` | 0.0 – 1.0 (Gaussian std) | calibration; policy uncertainty | Higher values increase posterior entropy and test robustness. |
| `policy_prior_variance` | 0.1 – 10.0 | posterior concentration; bias–variance trade‑off | Controls how strongly the prior influences the posterior. |
| `num_variational_samples` | 50 – 500 | estimation variance; compute time | Trade‑off between Monte Carlo noise and runtime. |
| `hmc_steps` | 10 – 100 | mixing speed; sampling cost | Number of leapfrog steps per HMC trajectory. |
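If it helps implementation, these four knobs can be carried in one validated config object; the sketch below mirrors the table (field names come from the table, defaults and validation are illustrative assumptions):

```python
# Sketch of a config object mirroring the parameter table; the stated ranges
# are enforced as assertions. Defaults are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class InferenceConfig:
    observation_noise_std: float = 0.1   # Gaussian std, 0.0 – 1.0
    policy_prior_variance: float = 1.0   # 0.1 – 10.0
    num_variational_samples: int = 100   # 50 – 500
    hmc_steps: int = 20                  # leapfrog steps per trajectory, 10 – 100

    def __post_init__(self):
        assert 0.0 <= self.observation_noise_std <= 1.0
        assert 0.1 <= self.policy_prior_variance <= 10.0
        assert 50 <= self.num_variational_samples <= 500
        assert 10 <= self.hmc_steps <= 100
```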

Input Data

Required data:

Natural Sources (from the project)

Acquired Sources

  • OpenAI Gym / MuJoCo environments for baseline policy training
  • CARLA autonomous driving simulator for realistic sensor streams
  • Open datasets of adversarial perturbations (e.g., Adversarial Patch dataset)
  • Stan or PyMC3 model templates from the literature

Synthesised Sources

  • Conditional GAN (CC‑GAN) trained on clean/corrupted pairs to generate synthetic perturbed observations.
  • Physics‑based noise injection scripts (e.g., adding Gaussian, salt‑pepper, or semantic perturbations); a minimal sketch follows this list.
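A minimal version of the physics‑based injection scripts might look like the following (perturbation types are from the list above; function names and parameters are illustrative, and semantic perturbations would need a task‑specific model):

```python
# Minimal noise-injection sketch: Gaussian and salt-and-pepper corruption of an
# observation array with values in [0, 1].
import numpy as np

def add_gaussian_noise(obs: np.ndarray, std: float, rng: np.random.Generator) -> np.ndarray:
    """Additive Gaussian sensor noise, clipped back to the valid range."""
    return np.clip(obs + rng.normal(0.0, std, size=obs.shape), 0.0, 1.0)

def add_salt_pepper(obs: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Set a fraction p of entries to 0 (pepper) or 1 (salt), half each."""
    noisy = obs.copy()
    mask = rng.random(obs.shape)
    noisy[mask < p / 2] = 0.0          # pepper
    noisy[mask > 1 - p / 2] = 1.0      # salt
    return noisy

rng = np.random.default_rng(seed=0)    # fixed seed for reproducible clean/corrupted pairs
clean = rng.random((64, 64))
pairs = [(clean, add_gaussian_noise(clean, std=0.1, rng=rng))]
```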

Engineer / Scientist Guidance

  1. Define the observation generative model as a conditional GAN (CC‑GAN) using PyTorch; condition on available sensor context and a latent noise vector.
  2. Implement the hierarchical Bayesian policy model in Pyro: plates for observations, policy parameters, and hyper‑parameters (see the first sketch after this list).
  3. Set up two inference back‑ends: (a) stochastic variational inference (SVI) with black‑box ELBO; (b) Hamiltonian Monte Carlo (HMC) via NumPyro’s NUTS.
  4. Create a hyper‑heuristic orchestrator using Optuna: each trial proposes a tuple (inference_engine, num_samples, hmc_steps, lr, prior_variance); see the second sketch after this list.
  5. Define the evaluation metric as a weighted sum of Expected Calibration Error (ECE) and policy regret on a held‑out perturbation set.
  6. Use a multi‑armed bandit (Thompson sampling) to select inference engines; update posterior over engine performance after each trial.
  7. Implement synthetic data generation: sample latent vectors, generate clean and perturbed observations via CC‑GAN, and store in HDF5 for reproducibility.
  8. Wrap the entire workflow in a Docker container; expose a REST API that accepts observation batches and returns posterior samples and metrics.
  9. Document the provenance of each sample (model version, hyper‑parameters, synthetic seed) in a JSON log for auditability.
  10. Validate the posterior by running posterior predictive checks: generate synthetic trajectories and compare with real ones using KS‑test and Wasserstein distance.
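For step 2, a minimal sketch of the hierarchical model, assuming a linear‑Gaussian placeholder likelihood where the CC‑GAN density would eventually sit (all shapes, priors, and the likelihood form here are assumptions):

```python
# Sketch of the hierarchical policy model in Pyro: a hyper-prior over the
# policy-prior scale, a plate over policy weights, and a plate over
# observations. The Gaussian likelihood stands in for the CC-GAN model.
import pyro
import pyro.distributions as dist
import torch

def policy_model(observations: torch.Tensor, actions: torch.Tensor):
    n, obs_dim = observations.shape
    act_dim = actions.shape[1]

    # Hyper-parameter level: how concentrated the policy prior is.
    prior_scale = pyro.sample("prior_scale", dist.LogNormal(0.0, 1.0))

    # Policy-parameter level: one weight vector per action dimension.
    with pyro.plate("action_dims", act_dim):
        w = pyro.sample(
            "w", dist.Normal(0.0, prior_scale).expand([obs_dim]).to_event(1)
        )  # shape (act_dim, obs_dim)

    # Observation level: placeholder linear-Gaussian policy likelihood.
    mean = observations @ w.T  # (n, act_dim)
    with pyro.plate("data", n):
        pyro.sample("actions", dist.Normal(mean, 0.1).to_event(1), obs=actions)
```

For back‑end (a) this pairs with `pyro.infer.SVI`, `Trace_ELBO`, and an autoguide such as `pyro.infer.autoguide.AutoNormal`; back‑end (b) would restate the same model in NumPyro and sample with `numpyro.infer.NUTS`.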
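For steps 4–5, a sketch of the Optuna orchestrator; `run_inference` is a hypothetical hook, and the equal metric weights are an assumption:

```python
# Sketch of the Optuna orchestrator: each trial proposes an inference
# configuration and is scored by a weighted ECE + regret objective.
import optuna

W_ECE, W_REGRET = 0.5, 0.5  # weighting of the two criteria (assumed)

def run_inference(engine, num_samples, hmc_steps, lr, prior_variance):
    # Hypothetical hook: fit the posterior with the chosen engine and score it
    # on a held-out perturbation set. Returns (ece, regret); stubbed here.
    return 0.1, 0.5

def objective(trial: optuna.Trial) -> float:
    engine = trial.suggest_categorical("inference_engine", ["svi", "hmc"])
    ece, regret = run_inference(
        engine,
        num_samples=trial.suggest_int("num_samples", 50, 500),
        hmc_steps=trial.suggest_int("hmc_steps", 10, 100),
        lr=trial.suggest_float("lr", 1e-4, 1e-1, log=True),
        prior_variance=trial.suggest_float("prior_variance", 0.1, 10.0, log=True),
    )
    return W_ECE * ece + W_REGRET * regret

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=200)
```

The Thompson‑sampling engine selection of step 6 could then be layered on as an Optuna callback or custom sampler; it is omitted here for brevity.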

Recommended Tools

  • Pyro (for probabilistic programming)
  • NumPyro (for GPU‑accelerated HMC)
  • Optuna (hyper‑parameter optimization & bandit)
  • TensorFlow Probability (alternative VI)
  • PyTorch (for CC‑GAN implementation)
  • HDF5 / Zarr (data storage)
  • Docker (containerization)
  • JupyterLab (interactive notebooks)
  • Prometheus + Grafana (runtime monitoring)
  • GitHub Actions (CI/CD for model training)

Validation & Verification

Posterior predictive checks will be performed on a held‑out set of perturbed observations. Calibration will be measured using Expected Calibration Error (ECE) and Brier score. Policy robustness will be quantified by computing regret against a nominal policy under a separate set of unseen perturbations. Results will be benchmarked against a baseline deterministic policy and an oracle that has access to clean observations.
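ECE is a standard metric rather than something defined in this document; for reference, a generic binned implementation (confidences in [0, 1], `correct` a 0/1 array recording whether each decision matched the ground truth):

```python
# Generic binned Expected Calibration Error; a reference implementation,
# not code from this project.
import numpy as np

def expected_calibration_error(confidences: np.ndarray,
                               correct: np.ndarray,
                               n_bins: int = 10) -> float:
    """Binned ECE: weighted mean |accuracy - confidence| over equal-width bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    edges[0] -= 1e-9                     # make the first bin left-inclusive
    ece, n = 0.0, len(confidences)
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.sum() / n * gap
    return float(ece)
```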

Expected Impact

Quality

Provides a statistically sound estimate of policy uncertainty, reducing over‑confidence in deployment.

Timescale

Enables rapid pre‑deployment validation (≈2–3 weeks) instead of full field trials.

Cost

Avoids costly on‑the‑fly retraining by identifying robust policies early; estimated savings of 20–30% in compute spend.

Risk Retired

Mitigates risk of catastrophic failure due to unseen observation perturbations and improves regulatory auditability.

Software Tool Development Prompts

Drop these into a coding assistant to scaffold the supporting software for this modelling task.

Create a Python class `PolicyPosteriorSampler` that uses Pyro to define a hierarchical Bayesian model with a conditional GAN observation model and a policy prior. The class should expose a `sample_posterior(observations, num_samples)` method that returns posterior samples and an `evaluate_calibration(samples, true_actions)` method that computes ECE and Brier score.
Implement an Optuna study that optimizes over inference engines (SVI, HMC), number of variational samples, HMC steps, learning rate, and prior variance. Use a custom objective that returns the weighted sum of calibration error and policy regret. Include a multi‑armed bandit callback that records the best performing engine and stops the study after 200 trials or when improvement < 1e-3.
Write a Dockerfile that installs Pyro, NumPyro, Optuna, and PyTorch, copies the `PolicyPosteriorSampler` code, and exposes a Flask API endpoint `/sample` that accepts JSON observations and returns posterior samples and metrics.

Risks & Assumptions