
Task 6: Counterfactual Explanation Robustness Simulation

Project: corpora-task-modelling-1778795810213-620a9917  •  Generated: 2026-05-14 22:57

Build a causal‑graph discovery and diffusion‑based manifold projection pipeline to generate counterfactual explanations and evaluate their robustness against adversarial perturbations in a simulated environment.

Causal Discovery · Diffusion Models · Monte Carlo Simulation · Adversarial Testing · Feasibility

Source in Roadmap / Ideate: Chapter 7 – FCA Foundations & Causal Graph Discovery
Why model first: Enables early assessment of explanation fidelity and robustness, guiding design of the recourse engine before deployment.

What Is Modelled

A data‑driven pipeline that learns a causal graph from multimodal interaction logs, projects observations onto a learned manifold using diffusion models, generates counterfactual explanations that respect the causal structure, and evaluates the fidelity and robustness of those explanations under synthetically injected adversarial perturbations.

Objectives

Success Criteria

Output Form

A reproducible Python package containing: (1) a causal‑graph estimator, (2) a diffusion‑based manifold projector, (3) a counterfactual generator, (4) an adversarial perturbation generator, and (5) a robustness evaluation dashboard. Outputs include JSON logs of counterfactuals, robustness metrics, and a Jupyter notebook for visual inspection.
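The JSON logs mentioned above could follow a simple one-record-per-line layout. The field names below are illustrative only, not fixed by the task:

```python
import json

# Hypothetical shape of one counterfactual log record; every field name
# here is an assumption for illustration, not part of the specification.
record = {
    "observation_id": 17,
    "counterfactual": {"feature_3": 0.42, "feature_7": -1.1},
    "outcome_shift": 0.74,       # change in outcome probability
    "robustness_score": 0.86,    # success rate under perturbation
    "seed": 42,                  # simulation_seed used for this run
}
line = json.dumps(record)  # one JSON object per line in the log file
```

A line-delimited layout keeps the logs streamable into the evaluation dashboard without loading the full file.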

Key Parameters & What They Affect

| Parameter | Range / Units | Affects | Notes |
| --- | --- | --- | --- |
| causal_discovery_method | enum: ['PC', 'GES', 'FCI', 'LiNGAM'] | quality | Choice influences graph sparsity and orientation accuracy. |
| pc_alpha | 0.01 – 0.2 | quality | Significance threshold for PC; lower values increase precision. |
| diffusion_steps | 50 – 200 | speed, quality | Number of denoising steps; higher values improve fidelity but increase latency. |
| guidance_scale | 0.0 – 5.0 | quality | Controls adherence to causal constraints during sampling. |
| adversarial_noise_level | 0.0 – 0.5 (L2 norm) | robustness | Magnitude of perturbation applied to observations. |
| simulation_seed | integer 0 – 9999 | reproducibility | Ensures deterministic adversarial scenarios. |
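One way to keep these parameters and their documented ranges in one place is a validated configuration object. The class below is a minimal sketch; the name `PipelineConfig` and the defaults are assumptions, not part of the specification:

```python
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    """Hypothetical bundle of the pipeline parameters tabulated above."""
    causal_discovery_method: str = "PC"   # one of PC, GES, FCI, LiNGAM
    pc_alpha: float = 0.05                # significance threshold for PC
    diffusion_steps: int = 100            # denoising steps (50-200)
    guidance_scale: float = 1.0           # causal-constraint adherence (0-5)
    adversarial_noise_level: float = 0.1  # L2 perturbation budget (0-0.5)
    simulation_seed: int = 42             # deterministic adversarial scenarios

    def validate(self) -> None:
        """Raise ValueError if any parameter leaves its documented range."""
        if self.causal_discovery_method not in {"PC", "GES", "FCI", "LiNGAM"}:
            raise ValueError("unknown causal_discovery_method")
        if not 0.01 <= self.pc_alpha <= 0.2:
            raise ValueError("pc_alpha outside 0.01-0.2")
        if not 50 <= self.diffusion_steps <= 200:
            raise ValueError("diffusion_steps outside 50-200")
        if not 0.0 <= self.guidance_scale <= 5.0:
            raise ValueError("guidance_scale outside 0.0-5.0")
        if not 0.0 <= self.adversarial_noise_level <= 0.5:
            raise ValueError("adversarial_noise_level outside 0.0-0.5")
        if not 0 <= self.simulation_seed <= 9999:
            raise ValueError("simulation_seed outside 0-9999")
```

Validating at construction time catches out-of-range sweeps before any expensive diffusion training starts.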

Input Data

Required data:

Natural Sources (from the project)

Acquired Sources

  • Causal Discovery Benchmark Dataset (e.g., Tuebingen cause–effect pairs)
  • OpenAI Gym environments with known causal structure (e.g., FetchReach)
  • Stable Diffusion pre‑trained weights for image modalities
  • CleverHans/Foolbox adversarial libraries

Synthesised Sources

  • Synthetic causal graphs generated via DAG simulation (networkx)
  • Synthetic perturbations created with PGD or FGSM on observation vectors
  • Simulated sensor noise injected into recorded logs using NumPy
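The synthetic causal graphs above can be sampled without any special tooling: restricting edges to the upper triangle of a random matrix guarantees a DAG, and a linear-Gaussian structural equation model then yields observation data with known ground truth. This NumPy sketch (function names are illustrative) shows one way:

```python
import numpy as np


def random_dag(n_nodes: int, edge_prob: float, seed: int = 0) -> np.ndarray:
    """Sample a random DAG as an upper-triangular adjacency matrix.

    Keeping only upper-triangular edges guarantees acyclicity: every edge
    points from a lower-indexed node to a higher-indexed one, so no
    directed cycle can form.
    """
    rng = np.random.default_rng(seed)
    adj = rng.random((n_nodes, n_nodes)) < edge_prob
    return np.triu(adj, k=1).astype(int)


def simulate_linear_sem(adj: np.ndarray, n_samples: int,
                        seed: int = 0) -> np.ndarray:
    """Draw samples from a linear-Gaussian SEM defined on the DAG."""
    rng = np.random.default_rng(seed)
    n = adj.shape[0]
    weights = adj * rng.uniform(0.5, 1.5, size=adj.shape)
    X = np.zeros((n_samples, n))
    for j in range(n):  # index order is a topological order by construction
        X[:, j] = X @ weights[:, j] + rng.standard_normal(n_samples)
    return X
```

The resulting `(adj, X)` pairs give the precision/recall ground truth used in the validation section.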

Engineer / Scientist Guidance

  1. Set up a reproducible Python environment (conda or venv) with PyTorch ≥1.12, CausalNex, DoWhy, and Diffusers.
  2. Load interaction logs and split into training/validation/test sets; ensure temporal ordering is preserved.
  3. Run causal discovery using the chosen algorithm (PC by default). Tune alpha via cross‑validation to achieve ≥0.85 precision.
  4. Train a domain‑specific DDPM (or DPM‑Solver) on clean observations; freeze the encoder and fine‑tune the denoiser for 50–100 steps.
  5. Implement a counterfactual sampler that: (a) samples latent codes via the diffusion model, (b) applies causal constraint guidance (e.g., via a penalty term on edge directions), and (c) decodes to counterfactual observations.
  6. Generate adversarial perturbations using PGD with a fixed L2 budget; apply them to validation data.
  7. Evaluate counterfactuals on perturbed data: compute outcome probability shift, feature change ratio, and causal plausibility score.
  8. Aggregate robustness metrics across perturbation levels; plot robustness curves.
  9. Integrate the pipeline into a Gym environment: at each step, feed the current observation, generate a counterfactual, and log the explanation and robustness score.
  10. Automate the entire workflow with a Makefile or CI pipeline; store results in a SQLite database for later analysis.
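Step 6's L2-budgeted PGD attack reduces to a very small loop for a differentiable scorer. The sketch below uses a linear model f(x) = w·x so the gradient is analytic; against the real pipeline the gradient would come from autograd (e.g. PyTorch), and CleverHans/Foolbox provide hardened implementations:

```python
import numpy as np


def pgd_l2(x: np.ndarray, w: np.ndarray, y: int, eps: float,
           steps: int = 10, lr: float = 0.1) -> np.ndarray:
    """Minimal L2-constrained PGD against a linear scorer f(x) = w @ x.

    Ascends the loss for true label y in {-1, +1} (gradient of -y * f(x)
    w.r.t. x is just -y * w) and projects the perturbation back onto the
    L2 ball of radius eps after every step.
    """
    delta = np.zeros_like(x)
    grad = -y * w  # constant for a linear model; autograd in practice
    for _ in range(steps):
        delta = delta + lr * grad / (np.linalg.norm(grad) + 1e-12)
        norm = np.linalg.norm(delta)
        if norm > eps:
            delta = delta * (eps / norm)  # project onto the L2 ball
    return x + delta
```

The projection step is what ties the attack to the `adversarial_noise_level` budget in the parameter table.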

Recommended Tools

  • Python 3.11
  • PyTorch 2.0
  • Diffusers (HuggingFace) for DDPM/DPM‑Solver
  • CausalNex / DoWhy for causal discovery
  • CleverHans / Foolbox for adversarial attacks
  • OpenAI Gym / AirSim for simulation
  • Ray Tune / Optuna for hyper‑parameter search
  • TensorBoard / Weights & Biases for logging
  • JupyterLab for interactive exploration

Validation & Verification

Validate the causal graph against synthetic ground truth using precision/recall. Validate diffusion reconstruction via MAE on held‑out perturbed samples. Validate counterfactuals by checking that the outcome probability shifts by ≥0.7 while ≤10% of features change, and that the causal graph constraints are respected (no edge reversals). Verify robustness by measuring the counterfactual success rate across 10 adversarial perturbation levels and confirming the mean rate remains ≥0.8. Cross‑validate with an independent dataset (e.g., the Tuebingen cause–effect pairs).
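The robustness check can be aggregated with a few lines once per-level success flags are logged. A minimal sketch (function name and return keys are assumptions):

```python
import numpy as np


def robustness_curve(success_flags: np.ndarray) -> dict:
    """Aggregate counterfactual validity into a per-level robustness curve.

    success_flags: boolean array of shape (n_levels, n_samples); entry
    [i, j] records whether counterfactual j still flipped the outcome at
    perturbation level i. Returns the per-level success rates and whether
    the mean rate clears the 0.8 acceptance threshold stated above.
    """
    rates = success_flags.mean(axis=1)
    return {
        "per_level": rates,
        "mean": float(rates.mean()),
        "passes": bool(rates.mean() >= 0.8),
    }
```

Plotting `per_level` against the perturbation budget yields the robustness curves called for in step 8 of the guidance.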

Expected Impact

Quality

Provides counterfactual explanations that remain faithful under adversarial noise, improving operator trust and decision support.

Timescale

Reduces the design cycle for the recourse engine by 30% by exposing robustness early.

Cost

Avoids costly post‑deployment debugging by catching brittle explanations in simulation.

Risk Retired

Mitigates risk of misleading explanations in adversarial environments, reducing safety incidents.

Software Tool Development Prompts

Drop these into a coding assistant to scaffold the supporting software for this modelling task.

Create a Python module that loads a CSV of interaction logs, performs PC causal discovery with a user‑specified alpha, and returns a graph in networkx format. Include functions to compute precision/recall against a provided ground‑truth graph.
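The precision/recall part of that prompt is small enough to sketch directly. Treating each graph as a set of directed edges (the function name is illustrative):

```python
def edge_precision_recall(estimated, truth):
    """Edge-level precision/recall between an estimated graph and ground
    truth, each given as an iterable of (src, dst) edge tuples.

    Precision = fraction of estimated edges that are in the truth;
    recall = fraction of true edges that were recovered.
    """
    est, gt = set(estimated), set(truth)
    tp = len(est & gt)  # true positives: correctly oriented edges
    precision = tp / len(est) if est else 0.0
    recall = tp / len(gt) if gt else 0.0
    return precision, recall
```

Counting only exact `(src, dst)` matches means a reversed edge scores as both a false positive and a false negative, which matches the "no edge reversal" constraint in the validation section.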
Implement a diffusion‑based counterfactual generator that takes a trained DDPM model, a causal graph, and an observation vector, and outputs a counterfactual. The generator should accept a guidance_scale parameter and enforce causal constraints by adding a penalty term to the latent sampling loss. Provide a CLI that accepts --obs_path, --graph_path, --guidance_scale, and outputs the counterfactual to a JSON file.
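The causal-constraint penalty in that prompt can be illustrated without a trained DDPM. The toy sketch below penalises movement in features the graph marks as non-actionable and applies one `guidance_scale`-weighted gradient step; everything here (names, the choice of a squared penalty) is an assumption for illustration, not the required design:

```python
import numpy as np


def causal_penalty(x_cf: np.ndarray, x_orig: np.ndarray,
                   immutable_idx: list) -> float:
    """Toy constraint penalty: squared change in features the causal
    graph marks as non-actionable (e.g. ancestors that must not move)."""
    diff = x_cf[immutable_idx] - x_orig[immutable_idx]
    return float(diff @ diff)


def guided_update(x_cf: np.ndarray, x_orig: np.ndarray,
                  outcome_grad: np.ndarray, immutable_idx: list,
                  guidance_scale: float, lr: float = 0.05) -> np.ndarray:
    """One gradient step toward the target outcome, counteracted by
    guidance_scale times the gradient of the constraint penalty."""
    penalty_grad = np.zeros_like(x_cf)
    penalty_grad[immutable_idx] = 2.0 * (
        x_cf[immutable_idx] - x_orig[immutable_idx]
    )
    return x_cf + lr * (outcome_grad - guidance_scale * penalty_grad)
```

In the real generator the same penalty term would be added to the latent sampling loss inside the diffusion loop, with gradients supplied by autograd rather than written out by hand.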

Risks & Assumptions