Draft Patent Application 8 — For Review

Misattribution of Blame in Cooperative Multi‑Agent Systems

TITLE OF THE INVENTION

Causal‑Robust Attribution Network for Resilient Blame Attribution in Cooperative Multi‑Agent Systems

FIELD OF THE INVENTION

The present invention relates to artificial intelligence and, more specifically, to methods and systems for resilient blame attribution in cooperative multi‑agent systems (MAS) operating in adversarial or partially observable environments.

BACKGROUND AND PRIOR ART

Blame misattribution undermines coordination, trust, and safety in MAS. When agents share a common reward signal, credit‑assignment errors arise: an agent may incorrectly attribute a teammate's successful outcome to its own action, leading to sub‑optimal policy updates and degraded coordination performance [v16027]. This misattribution is amplified in open environments where agents encounter non‑stationary dynamics; openness can violate the stationarity and compositional assumptions on which many coordination algorithms rely, further complicating learning and increasing the likelihood of erroneous blame [v14411].

Prior approaches address the problem only partially. Conventional reinforcement‑learning practices, such as deterministic sampling and flat reward signals, fail to provide the fine‑grained attribution needed for reliable blame inference [v12421][v11995]. Human organizational conventions also fall short, as naming schemes and documentation still rely on subjective interpretation [v903][v5150]. Recent advances in Bayesian causal graph learning from execution logs provide a principled way to infer inter‑agent influence structures, yet these methods often require manual graph specification and lack robustness to adversarial manipulation [v9728]. Counterfactual reasoning frameworks such as CGRPA‑Plus introduce contextual weighting of counterfactual trajectories, but still rely on accurate surrogate policies and can suffer from high variance [v9175]. Adversarial attacks on explanation methods (SHAP, LIME, integrated gradients) can destabilize feature‑importance maps, compromising interpretability [v6912]. Human‑AI teaming dashboards that surface blame manifolds have shown promise in reducing misattribution, yet they still lack mechanisms to surface uncertainty and enforce robustness [v17029].

There thus remains a technical problem: providing a causally grounded, counterfactual‑aware, and adversarially robust blame‑attribution mechanism that can be visualized in real time for high‑stakes MAS deployments.

SUMMARY OF THE INVENTION

The present invention provides a Causal‑Robust Attribution Network (CRAN) that integrates a Bayesian causal discovery layer, a contextual counterfactual generation module (CGRPA‑Plus), and an adversarial‑robust explanation engine. The CRAN learns a causal graph from execution logs, generates a distribution of counterfactual policy trajectories weighted by causal likelihood, and ensembles SHAP, LIME, and integrated gradients while penalizing explanations that diverge under adversarial perturbations. The output is a blame manifold, a multi‑dimensional vector of responsibility, confidence, and robustness for each agent, which can be visualized as a dynamic blame graph in real time. This architecture overcomes the limitations of the prior art by providing causally faithful blame attribution, reducing variance through contextual weighting, and ensuring robustness against adversarial manipulation, thereby enhancing coordination, trust, and safety in cooperative MAS.

DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

Embodiment 1 – Causal Discovery Layer
The causal discovery layer learns a Bayesian causal graph G from execution logs L of the MAS. The graph captures temporal dependencies and filters out spurious correlations by embedding domain knowledge such as communication constraints and action observability. The learning procedure may employ a PC algorithm with temporal constraints or a NOTEARS‑style optimization, ensuring that G is a directed acyclic graph that reflects the causal fabric of the system [6][v9728].
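Many realizations of this layer are possible; the following is a minimal sketch in Python, assuming the execution logs have already been flattened into a numeric (T x d) array of per‑agent features at consecutive timesteps. It enforces temporal precedence (edges run only from timestep t-1 to timestep t, which guarantees acyclicity) and uses L1‑regularized regression as a lightweight stand‑in for a full PC or NOTEARS procedure; the function name and the lag‑one construction are illustrative assumptions, not part of the claimed method.

```python
import numpy as np
from sklearn.linear_model import Lasso

def discover_temporal_dag(logs: np.ndarray, alpha: float = 0.05) -> np.ndarray:
    """Learn a lagged adjacency matrix W from execution logs.

    logs: (T, d) array; row t holds the d per-agent features at timestep t.
    W[i, j] != 0 means feature i at time t-1 influences feature j at time t;
    restricting edges to the t-1 -> t direction makes the graph acyclic by
    construction.
    """
    past, present = logs[:-1], logs[1:]           # lagged design matrix / targets
    d = logs.shape[1]
    W = np.zeros((d, d))
    for j in range(d):                            # one sparse regression per target
        model = Lasso(alpha=alpha).fit(past, present[:, j])
        W[:, j] = model.coef_                     # nonzero coefficients = parents
    return W
```

In a full implementation, the regression step would be replaced by conditional‑independence tests under the PC algorithm or by the acyclicity‑constrained continuous optimization of NOTEARS, and domain knowledge (communication constraints, action observability) would be injected as a forbidden‑edge mask, as sketched under Embodiment 6 below.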

Embodiment 2 – Counterfactual Group Relative Policy Advantage Plus (CGRPA‑Plus)
CGRPA‑Plus extends the standard inverse‑propensity‑weighting framework by incorporating contextual features into the counterfactual distribution. A surrogate policy π̂, learned from the logged data, approximates the behavior (logging) policy and generates proposal actions, thereby reducing the variance of the importance weights. The counterfactual trajectories are then re‑weighted by their likelihood under the causal graph G, producing a probabilistic blame score that reflects both contribution and responsibility [2][v9175].
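The re‑weighting step reduces to a short numerical routine. The sketch below is illustrative only: it assumes per‑step logged rewards, behavior‑policy propensities, surrogate‑policy probabilities from π̂, and per‑step transition likelihoods under G; all parameter names are placeholders rather than a fixed API.

```python
import numpy as np

def blame_score(rewards: np.ndarray, propensity: np.ndarray,
                surrogate_prob: np.ndarray, causal_lik: np.ndarray,
                clip: float = 10.0) -> float:
    """Probabilistic blame score for one agent via causally weighted IPW.

    rewards:        (T,) team rewards along the logged trajectory
    propensity:     (T,) behavior-policy probability of each logged action
    surrogate_prob: (T,) probability of the same action under pi-hat
    causal_lik:     (T,) likelihood of each transition under the causal graph G
    """
    iw = np.clip(surrogate_prob / propensity, 0.0, clip)  # clipped importance weights
    w = iw * causal_lik                                   # causal re-weighting
    w = w / w.sum()                                       # normalize to a distribution
    baseline = rewards.mean()                             # counterfactual baseline
    return float(np.sum(w * (rewards - baseline)))        # signed contribution
```

A positive score credits the agent's logged actions relative to the counterfactual baseline; a negative score assigns blame. Clipping the importance weights is a common variance‑reduction choice and is an assumption of this sketch.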

Embodiment 3 – Adversarial‑Robust Explanation Engine
The explanation engine ensembles SHAP, LIME, and integrated gradients. A weighting scheme, trained on logs perturbed by adversarial attacks, penalizes explanations that diverge under such perturbations. This hardens the ensemble and yields a robustness score for each attribution [7][8][1].
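The ensemble‑and‑penalize logic is agnostic to the specific explainers, so the sketch below treats SHAP, LIME, and integrated gradients as interchangeable callables and uses attribution drift under random input perturbations as a cheap proxy for adversarial divergence; the Gaussian noise model and the softmax weighting are assumptions for illustration.

```python
import numpy as np
from typing import Callable, Dict, Tuple

Explainer = Callable[[np.ndarray], np.ndarray]  # input -> per-feature attribution

def robust_ensemble(x: np.ndarray, explainers: Dict[str, Explainer],
                    noise: float = 0.01, n_perturb: int = 20,
                    temp: float = 1.0) -> Tuple[np.ndarray, Dict[str, float]]:
    """Combine attributions, down-weighting explainers that drift under perturbation."""
    rng = np.random.default_rng(0)
    base = {name: ex(x) for name, ex in explainers.items()}
    drift = {name: 0.0 for name in explainers}
    for _ in range(n_perturb):
        x_p = x + rng.normal(scale=noise, size=x.shape)   # perturbation proxy
        for name, ex in explainers.items():
            drift[name] += float(np.linalg.norm(ex(x_p) - base[name]))
    names = list(explainers)
    d = np.array([drift[n] / n_perturb for n in names])
    w = np.exp(-d / temp)
    w /= w.sum()                                          # divergence penalty as soft weighting
    attribution = sum(wi * base[n] for wi, n in zip(w, names))
    return attribution, dict(zip(names, w))               # weights double as robustness scores
```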

Embodiment 4 – Blame Manifold Output
The CRAN outputs a blame manifold M, a vector m = (responsibility, confidence, robustness) for each agent. The manifold can be visualized as a dynamic blame graph that updates in real time, allowing operators to intervene when blame attribution diverges from expected norms [v17029].
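As a data structure, the manifold reduces to one triple per agent. A minimal sketch follows; the field names and the example values are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass
class BlameVector:
    responsibility: float  # signed contribution from the counterfactual module
    confidence: float      # posterior confidence derived from the causal graph
    robustness: float      # stability score from the explanation engine

# The blame manifold M maps each agent to its vector m, e.g.:
manifold: Dict[str, BlameVector] = {
    "agent_0": BlameVector(responsibility=-0.42, confidence=0.91, robustness=0.78),
    "agent_1": BlameVector(responsibility=0.17, confidence=0.88, robustness=0.81),
}
```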

Embodiment 5 – Real‑Time Dashboard
A human‑AI teaming dashboard receives the blame manifold and renders it as an interactive visualization. The dashboard layers confidence and robustness metrics, enabling operators to assess the reliability of each attribution and to trigger corrective actions. The interface supports real‑time updates as new logs are ingested, ensuring that blame attribution remains current in dynamic environments [v17029].
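The ingestion path behind such a dashboard can be expressed as a simple refresh loop; the callables below (`update_graph`, `compute_manifold`, `render`) stand in for the modules of the preceding embodiments and are hypothetical names.

```python
from typing import Any, Callable, Iterable

def dashboard_loop(log_batches: Iterable[Any],
                   update_graph: Callable[[Any, Any], Any],
                   compute_manifold: Callable[[Any, Any], Any],
                   render: Callable[[Any], None]) -> None:
    """Incrementally refresh the blame graph as new execution logs arrive."""
    graph = None
    for batch in log_batches:
        graph = update_graph(graph, batch)         # refine causal graph G online
        manifold = compute_manifold(graph, batch)  # per-agent (resp., conf., robust.)
        render(manifold)                           # push the update to the operator view
```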

Embodiment 6 – Deployment in Partially Observable Environments
The CRAN is configured to operate under partial observability and communication constraints. The causal discovery layer incorporates known protocol constraints, and the counterfactual module accounts for unobserved confounders, thereby maintaining robustness in open MAS settings [v14411].
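One concrete way to encode protocol constraints, under the same illustrative assumptions as the Embodiment 1 sketch, is a forbidden‑edge mask derived from the communication topology: if agent i can neither communicate with nor be observed by agent j, no causal edge from i's features to j's features is admitted.

```python
import numpy as np

def forbidden_mask(comm: np.ndarray, feats_per_agent: int) -> np.ndarray:
    """Expand an agent-level influence matrix into a feature-level edge mask.

    comm: (n, n) boolean matrix; comm[i, j] is True if agent i can influence
    agent j under the communication/observability protocol.
    Returns a boolean matrix where True marks a forbidden feature-to-feature edge.
    """
    n = comm.shape[0]
    base = (comm | np.eye(n, dtype=bool)).astype(int)   # self-influence stays allowed
    block = np.ones((feats_per_agent, feats_per_agent), dtype=int)
    allowed = np.kron(base, block).astype(bool)         # lift agents to feature blocks
    return ~allowed

# Zero out forbidden edges in the learned adjacency matrix W:
# W = W * ~forbidden_mask(comm, feats_per_agent)
```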

Embodiment 7 – High‑Stakes Domain Deployment
The CRAN is suitable for autonomous defense, supply‑chain logistics, and disaster response, where misattribution can lead to catastrophic outcomes. The system’s causal grounding, counterfactual reasoning, and adversarial robustness provide the reliability required in such domains [v16027].

CLAIMS

1. A method for resilient blame attribution in a cooperative multi‑agent system comprising: acquiring execution logs from each agent; learning a Bayesian causal graph from the logs; generating counterfactual policy trajectories using a surrogate policy conditioned on the causal graph; computing a probabilistic blame score for each agent by weighting counterfactual outcomes by their likelihood under the causal graph; training an ensemble of explanation models on adversarially perturbed logs; selecting explanations that are robust to perturbations; and outputting a blame manifold comprising responsibility, confidence, and robustness metrics.

2. The method of claim 1, wherein the Bayesian causal graph is learned using a PC algorithm with temporal constraints.

3. The method of claim 1, wherein the counterfactual policy trajectories are generated using CGRPA‑Plus.

4. The method of claim 1, wherein the ensemble of explanation models includes SHAP, LIME, and integrated gradients.

5. The method of claim 1, wherein the ensemble is weighted by a learned penalty function that reduces weight for explanations that diverge under adversarial perturbations.

6. The method of claim 1, wherein the blame manifold is visualized as a dynamic blame graph updated in real time.

7. The method of claim 1, wherein the method is performed in a partially observable environment with communication constraints.

8. The method of claim 1, wherein the method is performed in a high‑stakes domain such as autonomous defense or disaster response.

9. A system for resilient blame attribution in a cooperative multi‑agent system comprising: a causal discovery module that learns a Bayesian causal graph from execution logs; a counterfactual generation module that produces a distribution of policy trajectories conditioned on the causal graph; an adversarial‑robust explanation engine that ensembles SHAP, LIME, and integrated gradients and penalizes explanations that change under adversarial perturbations; and a blame manifold module that outputs a multi‑dimensional vector of responsibility, confidence, and robustness for each agent, and visualizes the manifold as a dynamic blame graph.

10. The system of claim 9, wherein the causal discovery module uses a PC algorithm with temporal constraints.

11. The system of claim 9, wherein the counterfactual generation module implements CGRPA‑Plus.

12. The system of claim 9, wherein the adversarial‑robust explanation engine is trained on logs perturbed by adversarial attacks.

13. The system of claim 9, wherein the blame manifold module includes a real‑time dashboard that updates as new logs are processed.

14. The system of claim 9, wherein the system operates in a partially observable environment with communication constraints.

15. The system of claim 9, wherein the system is used in high‑stakes domains such as autonomous defense.

ABSTRACT

A Causal‑Robust Attribution Network (CRAN) for cooperative multi‑agent systems integrates a Bayesian causal discovery layer, a contextual counterfactual generation module (CGRPA‑Plus), and an adversarial‑robust explanation engine. The network learns a causal graph from execution logs, generates counterfactual policy trajectories weighted by causal likelihood, and ensembles SHAP, LIME, and integrated gradients while penalizing explanations that diverge under adversarial perturbations. The output is a blame manifold—a multi‑dimensional vector of responsibility, confidence, and robustness for each agent—visualized as a dynamic blame graph in real time. This architecture provides causally faithful, counterfactual‑aware, and adversarial‑robust blame attribution, thereby enhancing coordination, trust, and safety in high‑stakes multi‑agent deployments.

REFERENCES

1. Adversarial attacks on cooperative multi-agent deep reinforcement learning: a dynamic group-based adversarial example transferability method (2023-07-02).
2. Curriculum Learning With Counterfactual Group Relative Policy Advantage For Multi-Agent Reinforcement Learning (2025-06-08).
3. You know the saying: it takes all sorts? (2026-03-15).
4. Goodhart's Law Applies to NLP's Explanation Benchmarks (2026-01-30).
5. It's Wednesday, February 25, 2026, and here are the top tech stories making waves today (2026-03-09).
6. Lost in Context: The Influence of Context on Feature Attribution Methods for Object Recognition (2024-12-12).
7. Revealing Vulnerabilities of Neural Networks in Parameter Learning and Defense Against Explanation-Aware Backdoors (2025-12-31).
8. Global Prediction of Dengue Incidence Using an Explainable Artificial Intelligence-Driven ConvLSTM Integrating Environmental, Health, and Socio-Economic Determinants (2026-04-05).
9. Towards Norms for State Responsibilities regarding Online Disinformation and Influence Operations (2023-06-18).