
Task 6: Counterfactual Explanation Robustness Simulation

Project: corpora-task-modelling-1778795810213-620a9917  •  Generated: 2026-05-14 22:57

Build a causal‑graph discovery and diffusion‑based manifold projection pipeline to generate counterfactual explanations and evaluate their robustness against adversarial perturbations in a simulated environment.

Causal Discovery · Diffusion Models · Monte Carlo Simulation · Adversarial Testing · Feasibility

Source in Roadmap / Ideate: Chapter 7 – FCA Foundations & Causal Graph Discovery
Why model first: Enables early assessment of explanation fidelity and robustness, guiding design of the recourse engine before deployment.

What Is Modelled

A data‑driven pipeline that learns a causal graph from multimodal interaction logs, projects observations onto a learned manifold using diffusion models, generates counterfactual explanations that respect the causal structure, and evaluates the fidelity and robustness of those explanations under synthetically injected adversarial perturbations.

Objectives

Success Criteria

Output Form

A reproducible Python package containing: (1) a causal‑graph estimator, (2) a diffusion‑based manifold projector, (3) a counterfactual generator, (4) an adversarial perturbation generator, and (5) a robustness evaluation dashboard. Outputs include JSON logs of counterfactuals, robustness metrics, and a Jupyter notebook for visual inspection.
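The JSON logs mentioned above could follow a simple one-record-per-line layout. The field names below are illustrative only, not fixed by the task:

```python
import json

# Hypothetical shape of one counterfactual log record; every field name
# here is an assumption for illustration, not part of the specification.
record = {
    "observation_id": 17,
    "counterfactual": {"feature_3": 0.42, "feature_7": -1.1},
    "outcome_shift": 0.74,       # change in outcome probability
    "robustness_score": 0.86,    # success rate under perturbation
    "seed": 42,                  # simulation_seed used for this run
}
line = json.dumps(record)  # one JSON object per line in the log file
```

A line-delimited layout keeps the logs streamable into the evaluation dashboard without loading the full file.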

Key Parameters & What They Affect

| Parameter | Range / Units | Affects | Notes |
| --- | --- | --- | --- |
| causal_discovery_method | enum: ['PC', 'GES', 'FCI', 'LiNGAM'] | quality | Choice influences graph sparsity and orientation accuracy. |
| pc_alpha | 0.01 – 0.2 | quality | Significance threshold for PC; lower values increase precision. |
| diffusion_steps | 50 – 200 | speed, quality | Number of denoising steps; higher values improve fidelity but increase latency. |
| guidance_scale | 0.0 – 5.0 | quality | Controls adherence to causal constraints during sampling. |
| adversarial_noise_level | 0.0 – 0.5 (L2 norm) | robustness | Magnitude of perturbation applied to observations. |
| simulation_seed | integer 0 – 9999 | reproducibility | Ensures deterministic adversarial scenarios. |
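One way to keep these parameters and their documented ranges in one place is a validated configuration object. The class below is a minimal sketch; the name `PipelineConfig` and the defaults are assumptions, not part of the specification:

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    """Hypothetical bundle of the pipeline parameters tabulated above."""
    causal_discovery_method: str = "PC"   # one of PC, GES, FCI, LiNGAM
    pc_alpha: float = 0.05                # significance threshold for PC
    diffusion_steps: int = 100            # denoising steps (50-200)
    guidance_scale: float = 1.0           # causal-constraint adherence (0-5)
    adversarial_noise_level: float = 0.1  # L2 perturbation budget (0-0.5)
    simulation_seed: int = 42             # deterministic adversarial scenarios

    def validate(self) -> None:
        """Raise ValueError if any parameter leaves its documented range."""
        if self.causal_discovery_method not in {"PC", "GES", "FCI", "LiNGAM"}:
            raise ValueError("unknown causal_discovery_method")
        if not 0.01 <= self.pc_alpha <= 0.2:
            raise ValueError("pc_alpha outside 0.01-0.2")
        if not 50 <= self.diffusion_steps <= 200:
            raise ValueError("diffusion_steps outside 50-200")
        if not 0.0 <= self.guidance_scale <= 5.0:
            raise ValueError("guidance_scale outside 0.0-5.0")
        if not 0.0 <= self.adversarial_noise_level <= 0.5:
            raise ValueError("adversarial_noise_level outside 0.0-0.5")
        if not 0 <= self.simulation_seed <= 9999:
            raise ValueError("simulation_seed outside 0-9999")
```

Validating at construction time catches out-of-range sweeps before any expensive diffusion training starts.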

Input Data

Required data:

Natural Sources (from the project)

Acquired Sources

  • Causal Discovery Benchmark Dataset (e.g., Tuebingen cause–effect pairs)
  • OpenAI Gym environments with known causal structure (e.g., FetchReach)
  • Stable Diffusion pre‑trained weights for image modalities
  • CleverHans/Foolbox adversarial libraries

Synthesised Sources

  • Synthetic causal graphs generated via DAG simulation (networkx)
  • Synthetic perturbations created with PGD or FGSM on observation vectors
  • Simulated sensor noise injected into recorded logs using NumPy
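The synthetic causal graphs above can be sampled without any special tooling: restricting edges to the upper triangle of a random matrix guarantees a DAG, and a linear-Gaussian structural equation model then yields observation data with known ground truth. This NumPy sketch (function names are illustrative) shows one way:

```python
import numpy as np


def random_dag(n_nodes: int, edge_prob: float, seed: int = 0) -> np.ndarray:
    """Sample a random DAG as an upper-triangular adjacency matrix.

    Keeping only upper-triangular edges guarantees acyclicity: every edge
    points from a lower-indexed node to a higher-indexed one, so no
    directed cycle can form.
    """
    rng = np.random.default_rng(seed)
    adj = rng.random((n_nodes, n_nodes)) < edge_prob
    return np.triu(adj, k=1).astype(int)


def simulate_linear_sem(adj: np.ndarray, n_samples: int,
                        seed: int = 0) -> np.ndarray:
    """Draw samples from a linear-Gaussian SEM defined on the DAG."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    weights = adj * rng.uniform(0.5, 1.5, size=adj.shape)
    X = np.zeros((n_samples, n))
    for j in range(n):  # index order is a topological order by construction
        X[:, j] = X @ weights[:, j] + rng.standard_normal(n_samples)
    return X
```

The resulting `(adj, X)` pairs give the precision/recall ground truth used in the validation section.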

Engineer / Scientist Guidance

  1. Set up a reproducible Python environment (conda or venv) with PyTorch ≥1.12, CausalNex, DoWhy, and Diffusers.
  2. Load interaction logs and split into training/validation/test sets; ensure temporal ordering is preserved.
  3. Run causal discovery using the chosen algorithm (PC by default). Tune alpha via cross‑validation to achieve ≥0.85 precision.
  4. Train a domain‑specific DDPM (or DPM‑Solver) on clean observations; freeze the encoder and fine‑tune the denoiser for 50–100 steps.
  5. Implement a counterfactual sampler that: (a) samples latent codes via the diffusion model, (b) applies causal constraint guidance (e.g., via a penalty term on edge directions), and (c) decodes to counterfactual observations.
  6. Generate adversarial perturbations using PGD with a fixed L2 budget; apply them to validation data.
  7. Evaluate counterfactuals on perturbed data: compute outcome probability shift, feature change ratio, and causal plausibility score.
  8. Aggregate robustness metrics across perturbation levels; plot robustness curves.
  9. Integrate the pipeline into a Gym environment: at each step, feed the current observation, generate a counterfactual, and log the explanation and robustness score.
  10. Automate the entire workflow with a Makefile or CI pipeline; store results in a SQLite database for later analysis.
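Step 6's L2-budgeted PGD attack reduces to a very small loop for a differentiable scorer. The sketch below uses a linear model f(x) = w·x so the gradient is analytic; against the real pipeline the gradient would come from autograd (e.g. PyTorch), and CleverHans/Foolbox provide hardened implementations:

```python
import numpy as np


def pgd_l2(x: np.ndarray, w: np.ndarray, y: int, eps: float,
           steps: int = 10, lr: float = 0.1) -> np.ndarray:
    """Minimal L2-constrained PGD against a linear scorer f(x) = w @ x.

    Ascends the loss for true label y in {-1, +1} (gradient of -y * f(x)
    w.r.t. x is just -y * w) and projects the perturbation back onto the
    L2 ball of radius eps after every step.
    """
    delta = np.zeros_like(x)
    grad = -y * w  # constant for a linear model; autograd in practice
    for _ in range(steps):
        delta = delta + lr * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)  # project onto the L2 ball
    return x + delta
```

The projection step is what ties the attack to the `adversarial_noise_level` budget in the parameter table.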

Recommended Tools

  • Python 3.11
  • PyTorch 2.0
  • Diffusers (HuggingFace) for DDPM/DPM‑Solver
  • CausalNex / DoWhy for causal discovery
  • CleverHans / Foolbox for adversarial attacks
  • OpenAI Gym / AirSim for simulation
  • Ray Tune / Optuna for hyper‑parameter search
  • TensorBoard / Weights & Biases for logging
  • JupyterLab for interactive exploration

Validation & Verification

Validate the causal graph against synthetic ground truth using precision/recall. Validate diffusion reconstruction via MAE on held‑out perturbed samples. Validate counterfactuals by checking that the outcome probability shifts by ≥0.7 while ≤10% of features change, and that the causal graph constraints are respected (no edge reversals). Verify robustness by measuring the counterfactual success rate across 10 adversarial perturbation levels and confirming the mean rate remains ≥0.8. Cross‑validate with an independent dataset (e.g., the Tuebingen cause–effect pairs).
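The robustness check can be aggregated with a few lines once per-level success flags are logged. A minimal sketch (function name and return keys are assumptions):

```python
import numpy as np


def robustness_curve(success_flags: np.ndarray) -> dict:
    """Aggregate counterfactual validity into a per-level robustness curve.

    success_flags: boolean array of shape (n_levels, n_samples); entry
    [i, j] records whether counterfactual j still flipped the outcome at
    perturbation level i. Returns the per-level success rates and whether
    the mean rate clears the 0.8 acceptance threshold stated above.
    """
    rates = success_flags.mean(axis=1)
    return {
        "per_level": rates,
        "mean": float(rates.mean()),
        "passes": bool(rates.mean() >= 0.8),
    }
```

Plotting `per_level` against the perturbation budget yields the robustness curves called for in step 8 of the guidance.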

Expected Impact

Quality

Provides counterfactual explanations that remain faithful under adversarial noise, improving operator trust and decision support.

Timescale

Reduces the design cycle for the recourse engine by 30% by exposing robustness early.

Cost

Avoids costly post‑deployment debugging by catching brittle explanations in simulation.

Risk Retired

Mitigates risk of misleading explanations in adversarial environments, reducing safety incidents.

Software Tool Development Prompts

Drop these into a coding assistant to scaffold the supporting software for this modelling task.

Create a Python module that loads a CSV of interaction logs, performs PC causal discovery with a user‑specified alpha, and returns a graph in networkx format. Include functions to compute precision/recall against a provided ground‑truth graph.
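The precision/recall part of that prompt is small enough to sketch directly. Treating each graph as a set of directed edges (the function name is illustrative):

```python
def edge_precision_recall(estimated, truth):
    """Edge-level precision/recall between an estimated graph and ground
    truth, each given as an iterable of (src, dst) edge tuples.

    Precision = fraction of estimated edges that are in the truth;
    recall = fraction of true edges that were recovered.
    """
    est, gt = set(estimated), set(truth)
    tp = len(est & gt)  # true positives: correctly oriented edges
    precision = tp / len(est) if est else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall
```

Counting only exact `(src, dst)` matches means a reversed edge scores as both a false positive and a false negative, which matches the "no edge reversal" constraint in the validation section.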
Implement a diffusion‑based counterfactual generator that takes a trained DDPM model, a causal graph, and an observation vector, and outputs a counterfactual. The generator should accept a guidance_scale parameter and enforce causal constraints by adding a penalty term to the latent sampling loss. Provide a CLI that accepts --obs_path, --graph_path, --guidance_scale, and outputs the counterfactual to a JSON file.
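The causal-constraint penalty in that prompt can be illustrated without a trained DDPM. The toy sketch below penalises movement in features the graph marks as non-actionable and applies one `guidance_scale`-weighted gradient step; everything here (names, the choice of a squared penalty) is an assumption for illustration, not the required design:

```python
import numpy as np


def causal_penalty(x_cf: np.ndarray, x_orig: np.ndarray,
                   immutable_idx: list) -> float:
    """Toy constraint penalty: squared change in features the causal
    graph marks as non-actionable (e.g. ancestors that must not move)."""
    diff = x_cf[immutable_idx] - x_orig[immutable_idx]
    return float(diff @ diff)


def guided_update(x_cf: np.ndarray, x_orig: np.ndarray,
                  outcome_grad: np.ndarray, immutable_idx: list,
                  guidance_scale: float, lr: float = 0.05) -> np.ndarray:
    """One gradient step toward the target outcome, counteracted by
    guidance_scale times the gradient of the constraint penalty."""
    penalty_grad = np.zeros_like(x_cf)
    penalty_grad[immutable_idx] = 2.0 * (
        x_cf[immutable_idx] - x_orig[immutable_idx]
    )
    return x_cf + lr * (outcome_grad - guidance_scale * penalty_grad)
```

In the real generator the same penalty term would be added to the latent sampling loss inside the diffusion loop, with gradients supplied by autograd rather than written out by hand.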

Risks & Assumptions