Project: corpora-task-modelling-1778795810213-620a9917 • Generated: 2026-05-14 22:57
Hierarchical Bayesian inference with variational Monte Carlo to quantify policy uncertainty under noisy observations before deployment.
Tags: Bayesian Inference, Monte Carlo, Variational Inference, Feasibility. Depends on #1: Synthetic Adversarial Observation Perturbation Dataset Generation.
| Source in Roadmap / Ideate | Chapter 1 – AOI-GBE Model Development |
|---|---|
| Why model first | Enables quantitative assessment of policy uncertainty and robustness to unseen perturbations, guiding architecture choices without requiring live agent deployment. |
What Is Modelled
The posterior distribution over agent policy parameters given a stream of noisy observations, where observations are corrupted by adversarial perturbations or sensor noise. The model captures the joint distribution of clean and perturbed observations using a conditional generative model and marginalises over the unobserved clean observations to obtain a calibrated policy posterior.
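In symbols, a minimal sketch of this marginalisation (the notation is ours, not fixed by the source): writing o_{1:T} for the clean observation stream, õ_{1:T} for its perturbed counterpart, and θ for the policy parameters,

```latex
p(\theta \mid \tilde{o}_{1:T})
  = \int p(\theta \mid o_{1:T}) \, p(o_{1:T} \mid \tilde{o}_{1:T}) \, \mathrm{d}o_{1:T}
```

Here the conditional generative model supplies the link between clean and perturbed observations, and the integral is approximated by the Monte Carlo inference engines described below.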
Objectives
- Build a hierarchical Bayesian model that jointly learns a conditional generative observation model (CC‑GAN) and a policy prior.
- Implement variational Monte Carlo (VMC) and Hamiltonian Monte Carlo (HMC) inference engines to sample from the policy posterior.
- Quantify calibration (ECE, Brier score) and robustness (policy regret under unseen perturbations) of the posterior.
- Provide a hyper‑heuristic orchestrator that selects the best inference engine and hyper‑parameters within a compute budget.
- Generate synthetic observation logs for validation and stress‑testing.
Success Criteria
- Posterior predictive checks show <5% calibration error on held‑out perturbed data.
- Policy regret under 30% unseen perturbations is <10% relative to nominal policy.
- Hyper‑heuristic converges to the best inference engine within 200 evaluations.
- Synthetic data generation reproduces the statistical properties of real logs (KL divergence <0.1).
Output Form
A Python package exposing a `PolicyPosteriorSampler` API that returns posterior samples, calibration metrics, and a provenance log. Includes a Jupyter notebook demo and a Docker image for reproducibility.
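A minimal usage sketch of the intended API, assuming the method names given in the tool prompts at the end of this section (the module path, constructor signature, and metric keys are illustrative assumptions):

```python
# Hypothetical usage of the PolicyPosteriorSampler described in this section.
# Module name, constructor signature, and metric keys are assumptions.
import numpy as np
from policy_posterior import PolicyPosteriorSampler  # assumed module name

sampler = PolicyPosteriorSampler()
observations = np.load("perturbed_batch.npy")   # a batch of noisy observations
true_actions = np.load("actions.npy")           # actions actually taken

samples = sampler.sample_posterior(observations, num_samples=1000)
metrics = sampler.evaluate_calibration(samples, true_actions)
print(metrics)   # expected to include ECE and Brier score per the spec
```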
Engineer / Scientist Guidance
- Define the observation generative model as a conditional GAN (CC‑GAN) using PyTorch; condition on available sensor context and a latent noise vector (a minimal generator sketch follows this list).
- Implement the hierarchical Bayesian policy model in Pyro: plates for observations, policy parameters, and hyper‑parameters (see the Pyro sketch after this list).
- Set up two inference back‑ends: (a) stochastic variational inference (SVI) with a black‑box ELBO; (b) Hamiltonian Monte Carlo (HMC) via NumPyro’s NUTS.
- Create a hyper‑heuristic orchestrator using Optuna: each trial proposes a tuple (inference_engine, num_samples, hmc_steps, lr, prior_variance); see the study sketch after this list.
- Define the evaluation metric as a weighted sum of Expected Calibration Error (ECE) and policy regret on a held‑out perturbation set.
- Use a multi‑armed bandit (Thompson sampling) to select inference engines; update posterior over engine performance after each trial.
- Implement synthetic data generation: sample latent vectors, generate clean and perturbed observations via the CC‑GAN, and store them in HDF5 for reproducibility (see the HDF5 sketch after this list).
- Wrap the entire workflow in a Docker container; expose a REST API that accepts observation batches and returns posterior samples and metrics.
- Document the provenance of each sample (model version, hyper‑parameters, synthetic seed) in a JSON log for auditability.
- Validate the posterior by running posterior predictive checks: generate synthetic trajectories and compare them with real ones using a KS test and the Wasserstein distance (see the final sketch after this list).
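The sketches below pin down the main guidance items. First, a minimal conditional GAN generator for the observation model; the layer sizes and concatenation-based conditioning are illustrative assumptions, not a prescribed architecture:

```python
# Minimal CC-GAN generator sketch: (sensor context, latent noise) -> observation.
# Dimensions and architecture are placeholders to be tuned.
import torch
import torch.nn as nn

class ObservationGenerator(nn.Module):
    def __init__(self, context_dim=16, latent_dim=32, obs_dim=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(context_dim + latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, obs_dim),
        )

    def forward(self, context, z):
        # Condition by concatenating sensor context with the latent noise vector.
        return self.net(torch.cat([context, z], dim=-1))

gen = ObservationGenerator()
fake_obs = gen(torch.randn(8, 16), torch.randn(8, 32))  # batch of 8 observations
```

A matching discriminator would receive the same sensor context alongside the real or generated observation.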
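Next, the hierarchical policy model in Pyro with the SVI back‑end. The Bernoulli action likelihood is a stand‑in for the real policy head; the plate structure is the point of the sketch:

```python
# Hierarchical Bayesian policy model in Pyro: plates for hyper-parameters,
# policy parameters, and observations, fitted with stochastic variational inference.
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

def model(obs, actions, param_dim=4):
    # Hyper-parameter level: shared scale of the policy prior.
    prior_scale = pyro.sample("prior_scale", dist.LogNormal(0.0, 1.0))
    # Policy-parameter level.
    with pyro.plate("params", param_dim):
        theta = pyro.sample("theta", dist.Normal(0.0, prior_scale))
    # Observation level: placeholder Bernoulli action likelihood.
    with pyro.plate("data", obs.shape[0]):
        pyro.sample("act", dist.Bernoulli(logits=obs @ theta), obs=actions)

guide = AutoNormal(model)
svi = SVI(model, guide, Adam({"lr": 1e-2}), loss=Trace_ELBO())

obs = torch.randn(100, 4)
actions = torch.bernoulli(torch.full((100,), 0.5))
for _ in range(1000):
    svi.step(obs, actions)
```

The HMC back‑end would mirror the same model in NumPyro and run it with `numpyro.infer.NUTS` inside `numpyro.infer.MCMC`.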
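For the orchestrator, an Optuna study over the tuple named in the guidance; `run_inference` and `evaluate` are hypothetical hooks into the package, and the equal ECE/regret weighting is an assumption. Optuna's TPE sampler stands in for engine selection here; the Thompson‑sampling bandit from the guidance could replace the plain categorical suggestion:

```python
# Optuna study over (inference_engine, num_samples, hmc_steps, lr, prior_variance).
import optuna

def objective(trial):
    engine = trial.suggest_categorical("inference_engine", ["svi", "hmc"])
    num_samples = trial.suggest_int("num_samples", 100, 5000, log=True)
    hmc_steps = trial.suggest_int("hmc_steps", 5, 50)
    lr = trial.suggest_float("lr", 1e-4, 1e-1, log=True)
    prior_variance = trial.suggest_float("prior_variance", 0.1, 10.0, log=True)

    posterior = run_inference(engine, num_samples, hmc_steps, lr, prior_variance)  # hypothetical hook
    ece, regret = evaluate(posterior)        # hypothetical hook: held-out perturbation set
    return 0.5 * ece + 0.5 * regret          # weights are an assumption

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=200)      # budget taken from the success criteria
print(study.best_params)
```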
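For synthetic data generation, a self‑contained sketch of the HDF5 layout; a Gaussian draw stands in for CC‑GAN output, and the dataset names and seed attribute are our conventions:

```python
# Store paired clean/perturbed synthetic observations with provenance in HDF5.
import h5py
import numpy as np

seed = 1234
rng = np.random.default_rng(seed)
clean = rng.normal(0.0, 1.0, size=(1000, 64))               # stand-in for CC-GAN samples
perturbed = clean + rng.normal(0.0, 0.1, size=clean.shape)  # additive sensor noise

with h5py.File("synthetic_obs.h5", "w") as f:
    f.create_dataset("clean", data=clean, compression="gzip")
    f.create_dataset("perturbed", data=perturbed, compression="gzip")
    f.attrs["seed"] = seed   # record the seed so the draw is reproducible
```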
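Finally, the posterior predictive check on a one‑dimensional trajectory summary (e.g. per‑step reward); the 0.05 significance threshold is a conventional choice, not a requirement from the spec:

```python
# Compare real vs. synthetic trajectory summaries with a KS test and Wasserstein distance.
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance

def predictive_check(real: np.ndarray, synthetic: np.ndarray) -> dict:
    ks_stat, p_value = ks_2samp(real, synthetic)
    return {
        "ks_stat": ks_stat,
        "ks_pvalue": p_value,
        "wasserstein": wasserstein_distance(real, synthetic),
        "pass": p_value > 0.05,   # conventional threshold, an assumption here
    }

real = np.random.normal(0.0, 1.0, 500)        # placeholder real summaries
synthetic = np.random.normal(0.05, 1.1, 500)  # placeholder synthetic summaries
print(predictive_check(real, synthetic))
```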
Recommended Tools
- Pyro (for probabilistic programming)
- NumPyro (for GPU‑accelerated HMC)
- Optuna (hyper‑parameter optimization & bandit)
- TensorFlow Probability (alternative VI)
- PyTorch (for CC‑GAN implementation)
- HDF5 / Zarr (data storage)
- Docker (containerization)
- JupyterLab (interactive notebooks)
- Prometheus + Grafana (runtime monitoring)
- GitHub Actions (CI/CD for model training)
Validation & Verification
Posterior predictive checks will be performed on a held‑out set of perturbed observations. Calibration will be measured using Expected Calibration Error (ECE) and Brier score. Policy robustness will be quantified by computing regret against a nominal policy under a separate set of unseen perturbations. Results will be benchmarked against a baseline deterministic policy and an oracle that has access to clean observations.
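For concreteness, minimal reference implementations of the two calibration metrics for binary actions; the 10‑bin equal‑width ECE is a common default, not mandated by the spec:

```python
# Expected Calibration Error (equal-width bins) and Brier score for binary actions.
import numpy as np

def ece(probs: np.ndarray, labels: np.ndarray, n_bins: int = 10) -> float:
    idx = np.minimum((probs * n_bins).astype(int), n_bins - 1)  # bin index per sample
    total = 0.0
    for b in range(n_bins):
        mask = idx == b
        if mask.any():
            # |mean confidence - empirical accuracy|, weighted by bin occupancy
            total += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return total

def brier(probs: np.ndarray, labels: np.ndarray) -> float:
    return float(np.mean((probs - labels) ** 2))
```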
Software Tool Development Prompts
Drop these into a coding assistant to scaffold the supporting software for this modelling task.
Create a Python class `PolicyPosteriorSampler` that uses Pyro to define a hierarchical Bayesian model with a conditional GAN observation model and a policy prior. The class should expose a `sample_posterior(observations, num_samples)` method that returns posterior samples and an `evaluate_calibration(samples, true_actions)` method that computes ECE and Brier score.
Implement an Optuna study that optimizes over inference engines (SVI, HMC), number of variational samples, HMC steps, learning rate, and prior variance. Use a custom objective that returns the weighted sum of calibration error and policy regret. Include a multi‑armed bandit callback that records the best-performing engine and stops the study after 200 trials or when improvement < 1e-3.
Write a Dockerfile that installs Pyro, NumPyro, Optuna, and PyTorch, copies the `PolicyPosteriorSampler` code, and exposes a Flask API endpoint `/sample` that accepts JSON observations and returns posterior samples and metrics.