TITLE OF THE INVENTION
Causal‑Robust Attribution Network for Resilient Blame Attribution in Cooperative Multi‑Agent Systems
FIELD OF THE INVENTION
The present invention relates to artificial intelligence and, more specifically, to methods and systems for resilient blame attribution in cooperative multi‑agent systems (MAS) operating in adversarial or partially‑observable environments.
BACKGROUND AND PRIOR ART
Blame misattribution undermines coordination, trust, and safety in MAS. When agents share a common reward signal, credit assignment errors arise: an agent may incorrectly attribute a teammate's successful outcome to its own action, leading to sub‑optimal policy updates and degraded coordination performance [v16027]. This misattribution is amplified in open environments where agents encounter non‑stationary dynamics; openness can violate the stationarity and compositional assumptions that many coordination algorithms rely on, further complicating learning and increasing the likelihood of erroneous blame [v14411]. Conventional reinforcement‑learning practices, such as deterministic sampling and flat reward signals, fail to provide the fine‑grained attribution needed for reliable blame inference [v12421][v11995]. Human‑based organizational conventions also fall short, as naming schemes and documentation still rely on subjective interpretation [v903][v5150]. Recent advances in Bayesian causal graph learning from execution logs provide a principled way to infer inter‑agent influence structures, yet these methods often require manual graph specification and lack robustness to adversarial manipulation [v9728]. Counterfactual reasoning frameworks such as CGRPA‑Plus introduce contextual weighting of counterfactual trajectories, but still rely on accurate surrogate policies and can suffer from high variance [v9175]. Adversarial attacks on explanation methods (SHAP, LIME, integrated gradients) can destabilize feature‑importance maps, compromising interpretability [v6912]. Finally, human‑AI teaming dashboards that surface blame manifolds have shown promise in reducing misattribution, yet they still lack mechanisms to surface uncertainty and enforce robustness [v17029].
Thus, there remains a technical problem of providing a causally grounded, counterfactual‑aware, and adversarial‑robust blame attribution mechanism that can be visualized in real time for high‑stakes MAS deployments.
SUMMARY OF THE INVENTION
The invention discloses a Causal‑Robust Attribution Network (CRAN) that integrates a Bayesian causal discovery layer, a contextual counterfactual generation module (CGRPA‑Plus), and an adversarial‑robust explanation engine. The CRAN learns a causal graph from execution logs, generates a distribution of counterfactual policy trajectories weighted by causal likelihood, and ensembles SHAP, LIME, and integrated gradients while penalizing explanations that diverge under adversarial perturbations. The output is a blame manifold—a multi‑dimensional vector of responsibility, confidence, and robustness—for each agent, which can be visualized as a dynamic blame graph in real time. This architecture overcomes the limitations of prior art by providing causally faithful blame attribution, reducing variance through contextual weighting, and ensuring robustness against adversarial manipulation, thereby enhancing coordination, trust, and safety in cooperative MAS.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Embodiment 1 – Causal Discovery Layer
The causal discovery layer learns a Bayesian causal graph G from execution logs L of the MAS. The graph captures temporal dependencies and filters out spurious correlations by embedding domain knowledge such as communication constraints and action observability. The learning procedure may employ a PC algorithm with temporal constraints or a NOTEARS‑style optimization, ensuring that G is a directed acyclic graph that reflects the causal fabric of the system [v9728].
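The temporal-constraint idea above can be sketched in a few lines. The following is a minimal illustration, not the patented discovery layer: it assumes a hypothetical log encoding of one scalar feature per agent per time step and estimates influence edges from lagged correlations. Because edges only point forward in time, the resulting graph is acyclic by construction, mirroring the temporal constraint a PC-style procedure would impose.

```python
import numpy as np

def lagged_influence_graph(logs, threshold=0.3):
    """Estimate a directed influence graph among agents from execution logs.

    logs: array of shape (T, n_agents), one scalar feature per agent per step
    (an illustrative log encoding; a real system would use richer features).
    An edge j -> i is added when agent j's value at step t correlates with
    agent i's value at step t+1 above `threshold`.  Restricting edges to
    forward time makes the graph a DAG by construction.
    """
    T, n = logs.shape
    past, future = logs[:-1], logs[1:]

    def z(x):
        # column-standardize so the product below approximates correlations
        return (x - x.mean(0)) / (x.std(0) + 1e-12)

    corr = z(past).T @ z(future) / (T - 1)   # corr[j, i]: influence of j on i
    G = (np.abs(corr) > threshold).astype(int)
    np.fill_diagonal(G, 0)                   # drop self-loops for readability
    return G, corr
```

On synthetic logs where agent 0 drives agent 1 with a one-step lag, the estimator recovers the 0 → 1 edge and no reverse edge.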
Embodiment 2 – Counterfactual Group Relative Policy Advantage Plus (CGRPA‑Plus)
CGRPA‑Plus extends the standard inverse‑propensity‑weighting framework by incorporating contextual features into the counterfactual distribution. A surrogate policy π̂, learned from the logged data, generates proposal actions that approximate the optimal logging policy, thereby reducing variance. The counterfactual trajectories are re‑weighted by their likelihood under the causal graph G, producing a probabilistic blame score that reflects both contribution and responsibility [v9175].
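A minimal numeric sketch of the weighting step, under simplifying assumptions (this is not the full CGRPA‑Plus): per-step logged rewards for one agent, the logging policy's probability of each taken action, the surrogate policy π̂'s probability of the same action, and a per-step scalar standing in for the trajectory's likelihood under the causal graph G. The importance ratio is truncated, a common variance-reduction device in off-policy evaluation.

```python
import numpy as np

def ipw_blame_scores(rewards, behavior_probs, surrogate_probs,
                     causal_weights, clip=10.0):
    """Illustrative probabilistic blame score per logged step.

    rewards:         per-step rewards attributed to the agent
    behavior_probs:  logging policy's probability of the taken action
    surrogate_probs: surrogate policy pi_hat's probability of the same action
    causal_weights:  likelihood of each step under the causal graph G
                     (here a hypothetical per-step scalar in [0, 1])
    """
    # truncated importance ratio pi_hat / pi_behavior to tame variance
    w = np.clip(surrogate_probs / np.clip(behavior_probs, 1e-8, None), 0.0, clip)
    advantage = rewards - rewards.mean()          # crude mean baseline
    raw = causal_weights * w * advantage
    return raw / (np.abs(raw).sum() + 1e-12)      # normalize to a blame share
```

Steps whose actions the surrogate policy favors, and which precede above-baseline reward, receive positive blame shares; the scores are normalized so their absolute values sum to one.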
Embodiment 3 – Adversarial‑Robust Explanation Engine
The explanation engine ensembles SHAP, LIME, and integrated gradients. A learned weighting scheme, trained on logs perturbed by adversarial attacks, penalizes explanations that diverge under those perturbations. This hardens the explanations and yields a robustness score for each attribution [v6912].
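The penalization idea can be sketched as follows. This is an illustrative stand-in, not the learned scheme itself: each row is an attribution map from one explainer (e.g. SHAP-, LIME-, or IG-style outputs) on clean versus adversarially perturbed logs, and an explainer's weight decays exponentially with how far its attributions shift under perturbation.

```python
import numpy as np

def robust_ensemble_attribution(expl_clean, expl_perturbed, temperature=1.0):
    """Combine attribution maps, down-weighting explainers that shift under attack.

    expl_clean, expl_perturbed: arrays of shape (n_explainers, n_features)
    holding each explainer's feature attributions on clean and perturbed logs.
    Returns the weighted ensemble attribution, the per-explainer weights,
    and each explainer's divergence (usable as a robustness penalty).
    """
    divergence = np.linalg.norm(expl_clean - expl_perturbed, axis=1)
    weights = np.exp(-divergence / temperature)   # stable explainers dominate
    weights = weights / weights.sum()
    ensemble = weights @ expl_clean
    return ensemble, weights, divergence
```

An explainer whose attributions flip entirely under perturbation thus contributes far less to the ensemble than one whose attributions barely move.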
Embodiment 4 – Blame Manifold Output
The CRAN outputs a blame manifold M, a vector m = (responsibility, confidence, robustness) for each agent. The manifold can be visualized as a dynamic blame graph that updates in real time, allowing operators to intervene when blame attribution diverges from expected norms [v17029].
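One possible data shape for the manifold entry, together with the operator-intervention trigger mentioned above, is sketched below. The field names, thresholds, and the `flag_for_review` helper are illustrative choices, not part of the claimed system.

```python
from dataclasses import dataclass

@dataclass
class BlameManifoldEntry:
    """One agent's entry m = (responsibility, confidence, robustness)."""
    agent_id: str
    responsibility: float  # causal share of the outcome, in [0, 1]
    confidence: float      # certainty of the attribution
    robustness: float      # stability of the explanation under perturbation

def flag_for_review(manifold, min_confidence=0.7, min_robustness=0.5):
    """Return agents whose attribution is too uncertain or fragile to act on."""
    return [m.agent_id for m in manifold
            if m.confidence < min_confidence or m.robustness < min_robustness]
```

A dashboard consuming the manifold could surface exactly the flagged agents, so operators intervene only where the attribution itself is suspect.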
Embodiment 5 – Real‑Time Dashboard
A human‑AI teaming dashboard receives the blame manifold and renders it as an interactive visualization. The dashboard layers confidence and robustness metrics, enabling operators to assess the reliability of each attribution and to trigger corrective actions. The interface supports real‑time updates as new logs are ingested, ensuring that blame attribution remains current in dynamic environments [v17029].
Embodiment 6 – Deployment in Partially Observable Environments
The CRAN is configured to operate under partial observability and communication constraints. The causal discovery layer incorporates known protocol constraints, and the counterfactual module accounts for unobserved confounders, thereby maintaining robustness in open MAS settings [v14411].
Embodiment 7 – High‑Stakes Domain Deployment
The CRAN is suitable for autonomous defense, supply‑chain logistics, and disaster response, where misattribution can lead to catastrophic outcomes. The system’s causal grounding, counterfactual reasoning, and adversarial robustness provide the reliability required in such domains [v16027].
CLAIMS
1. A method for resilient blame attribution in a cooperative multi‑agent system comprising: acquiring execution logs from each agent; learning a Bayesian causal graph from the logs; generating counterfactual policy trajectories using a surrogate policy conditioned on the causal graph; computing a probabilistic blame score for each agent by weighting counterfactual outcomes by their likelihood under the causal graph; training an ensemble of explanation models on adversarially perturbed logs; selecting explanations that are robust to perturbations; and outputting a blame manifold comprising responsibility, confidence, and robustness metrics.
2. The method of claim 1, wherein the Bayesian causal graph is learned using a PC algorithm with temporal constraints.
3. The method of claim 1, wherein the counterfactual policy trajectories are generated using CGRPA‑Plus.
4. The method of claim 1, wherein the ensemble of explanation models includes SHAP, LIME, and integrated gradients.
5. The method of claim 1, wherein the ensemble is weighted by a learned penalty function that reduces weight for explanations that diverge under adversarial perturbations.
6. The method of claim 1, wherein the blame manifold is visualized as a dynamic blame graph updated in real time.
7. The method of claim 1, wherein the system operates in a partially observable environment with communication constraints.
8. The method of claim 1, wherein the system is deployed in high‑stakes domains such as autonomous defense or disaster response.
9. A system for resilient blame attribution in a cooperative multi‑agent system comprising: a causal discovery module that learns a Bayesian causal graph from execution logs; a counterfactual generation module that produces a distribution of policy trajectories conditioned on the causal graph; an adversarial‑robust explanation engine that ensembles SHAP, LIME, and integrated gradients and penalizes explanations that change under adversarial perturbations; and a blame manifold module that outputs a multi‑dimensional vector of responsibility, confidence, and robustness for each agent, and visualizes the manifold as a dynamic blame graph.
10. The system of claim 9, wherein the causal discovery module uses a PC algorithm with temporal constraints.
11. The system of claim 9, wherein the counterfactual generation module implements CGRPA‑Plus.
12. The system of claim 9, wherein the adversarial‑robust explanation engine is trained on logs perturbed by adversarial attacks.
13. The system of claim 9, wherein the blame manifold module includes a real‑time dashboard that updates as new logs are processed.
14. The system of claim 9, wherein the system operates in a partially observable environment with communication constraints.
15. The system of claim 9, wherein the system is used in high‑stakes domains such as autonomous defense.
ABSTRACT
A Causal‑Robust Attribution Network (CRAN) for cooperative multi‑agent systems integrates a Bayesian causal discovery layer, a contextual counterfactual generation module (CGRPA‑Plus), and an adversarial‑robust explanation engine. The network learns a causal graph from execution logs, generates counterfactual policy trajectories weighted by causal likelihood, and ensembles SHAP, LIME, and integrated gradients while penalizing explanations that diverge under adversarial perturbations. The output is a blame manifold—a multi‑dimensional vector of responsibility, confidence, and robustness for each agent—visualized as a dynamic blame graph in real time. This architecture provides causally faithful, counterfactual‑aware, and adversarial‑robust blame attribution, thereby enhancing coordination, trust, and safety in high‑stakes multi‑agent deployments.