Theory of Mind Defenses Against Communication Sabotage
TITLE OF THE INVENTION
Hybrid Theory‑of‑Mind Adversarial Defense Framework for Real‑Time Multi‑Agent Communication Sabotage Mitigation
FIELD OF THE INVENTION
The present invention relates to artificial intelligence, specifically to multi‑agent reinforcement learning systems that employ theory‑of‑mind (ToM) reasoning for the detection and mitigation of adversarial communication sabotage in partially observable environments.
BACKGROUND AND PRIOR ART
Cooperative multi‑agent systems routinely exchange messages to coordinate actions. Adversarial actors can inject deceptive messages, corrupt shared beliefs, or hijack coordination protocols, thereby degrading performance or causing catastrophic failures. Existing defenses are limited. Real‑time adversarial communication detection has been demonstrated in IoT settings using adaptive curricula and dynamic anomaly scoring [v1040], yet these approaches lack explicit theory‑of‑mind reasoning and are not designed for high‑noise, high‑latency multi‑agent coordination. Robust reinforcement learning has been framed as a Stackelberg game to guarantee safety [v2655], but does not address real‑time message verification. Graph‑based belief regularization has been proposed to constrain belief updates [2], yet it is not integrated with an adaptive curriculum or a lightweight verification layer. Test‑time mitigation modules such as CLL [3] and simplified action decoders (SAD) [4] provide post‑hoc filtering but lack a principled adversarial training backbone. Consequently, there remains an unmet need for a unified, theory‑of‑mind driven defense architecture that (1) learns to anticipate deceptive messages, (2) regularizes belief updates to limit malicious influence, and (3) verifies incoming messages at execution time while preserving interpretability.
SUMMARY OF THE INVENTION
The invention discloses a Hybrid Theory‑of‑Mind Adversarial Defense (HTMAD) framework that integrates an adversarial curriculum‑driven ToM module (AC‑ToM), dynamic belief‑graph regularization (DBGR), and a test‑time verification layer (TTVL). AC‑ToM employs a large language model (LLM) as a semantic oracle to generate diverse adversarial communication scenarios during training, forming a bi‑level Stackelberg game that yields a policy provably robust to evolving sabotage tactics [1]. DBGR augments the ToM module with a graph‑based regularizer that penalizes high‑confidence belief updates inconsistent with an ensemble of inferred mental states, thereby limiting the impact of any single malicious utterance [2]. TTVL evaluates incoming messages against a learned canonical interaction manifold; messages that deviate are flagged and may be ignored or answered with a clarification request, enabling real‑time mitigation and auditability [3], [4]. The HTMAD pipeline operates in real time, preserving cooperative performance under high noise or latency while remaining interpretable for human operators.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Embodiment 1 – Adversarial Curriculum‑Driven ToM (AC‑ToM)
AC‑ToM constructs a bi‑level Stackelberg game. The inner loop trains a multi‑agent reinforcement learning (MARL) agent to minimize regret against a fixed population of adversarial policies. The outer loop uses an LLM as a semantic oracle that generates executable adversarial or cooperative strategies in a Turing‑complete code space, thereby exposing the agent to a wide spectrum of deceptive tactics [1]. The curriculum is adaptive: the LLM samples messages conditioned on the agent’s current belief distribution, ensuring that the training distribution evolves with the agent’s learning progress.
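A minimal sketch of this bi‑level curriculum loop follows, with the inner regret‑minimization step elided to a stub. The names AdversaryPool, llm_oracle, and train_inner are hypothetical illustrations introduced here for exposition, not components recited in the claims.

```python
# Sketch of the AC-ToM bi-level curriculum (Embodiment 1). The inner loop
# (follower) trains against a fixed adversary population; the outer loop
# (leader) asks an LLM oracle for a new adversary conditioned on the
# agent's current belief state, making the curriculum adaptive.
import random
from dataclasses import dataclass, field

@dataclass
class AdversaryPool:
    """Population of adversarial message policies for the inner loop."""
    policies: list = field(default_factory=list)

    def add(self, policy):
        self.policies.append(policy)

    def sample(self):
        return random.choice(self.policies)

def llm_oracle(belief_summary: str) -> str:
    """Placeholder for the LLM semantic oracle: given a summary of the
    agent's belief distribution, return executable adversary code
    (represented here as a descriptive string)."""
    return f"def adversary(obs): return spoof_message(targeting={belief_summary!r})"

def train_inner(agent, pool: AdversaryPool, episodes: int = 100):
    """Inner Stackelberg loop: the agent minimizes regret against the
    fixed adversary population (rollout and gradient step elided)."""
    for _ in range(episodes):
        adversary = pool.sample()
        agent["updates"] = agent.get("updates", 0) + 1  # stand-in for a policy update

def ac_tom_curriculum(agent, rounds: int = 10):
    pool = AdversaryPool()
    pool.add("naive_spoofer")                     # seed adversary
    for r in range(rounds):
        train_inner(agent, pool)                  # inner loop (follower)
        belief_summary = f"round-{r}-beliefs"     # condition oracle on agent state
        pool.add(llm_oracle(belief_summary))      # outer loop (leader) grows the pool
    return agent

if __name__ == "__main__":
    print(ac_tom_curriculum({"updates": 0}))
```

Because the oracle is queried after every inner‑loop phase, the adversary population grows with the agent's competence, which is the property the adaptive curriculum relies on.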
Embodiment 2 – Dynamic Belief‑Graph Regularization (DBGR)
DBGR represents the agent’s internal epistemic state as a directed graph. Nodes encode natural‑language true/false statements; edges capture support, contradiction, or qualification relations. Each node carries a credibility attribute (external source reliability) and a confidence attribute (structural support) [v14955]. A static regularization term penalizes deviations from the graph’s constraint manifold, aligning self‑querying beliefs with encoded rules [v12791]. The regularizer is integrated into a Generalised Multi‑relational Graph Convolutional Network (GEM‑GCN) [v6901], enabling scalable inference over hundreds of belief nodes.
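The following sketch illustrates one plausible realization of the belief graph and its regularization penalty. The quadratic contradiction cost is an assumption made for illustration, and the GEM‑GCN integration is omitted.

```python
# Sketch of the DBGR belief graph (Embodiment 2): nodes carry a
# natural-language statement plus credibility and confidence attributes;
# typed edges capture support / contradiction / qualification.
from dataclasses import dataclass, field
from enum import Enum

class Relation(Enum):
    SUPPORT = 1
    CONTRADICT = 2
    QUALIFY = 3

@dataclass
class BeliefNode:
    statement: str        # natural-language true/false statement
    credibility: float    # external source reliability in [0, 1]
    confidence: float     # structural support in [0, 1]

@dataclass
class BeliefGraph:
    nodes: dict = field(default_factory=dict)   # name -> BeliefNode
    edges: list = field(default_factory=list)   # (src, dst, Relation)

    def regularization_penalty(self, proposed: dict) -> float:
        """Penalize proposed confidence updates that conflict with the
        graph: an upward update on a node pays a quadratic cost scaled
        by the confidence of each CONTRADICT-neighbor (illustrative
        form of the soft regularizer)."""
        penalty = 0.0
        for name, new_conf in proposed.items():
            delta = new_conf - self.nodes[name].confidence
            for src, dst, rel in self.edges:
                if rel is Relation.CONTRADICT and dst == name:
                    penalty += self.nodes[src].confidence * max(delta, 0.0) ** 2
        return penalty

g = BeliefGraph()
g.nodes["ally_at_bridge"] = BeliefNode("Ally holds the bridge", 0.9, 0.8)
g.nodes["bridge_clear"] = BeliefNode("The bridge is unoccupied", 0.4, 0.3)
g.edges.append(("ally_at_bridge", "bridge_clear", Relation.CONTRADICT))
print(g.regularization_penalty({"bridge_clear": 0.95}))  # large jump -> high cost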
Embodiment 3 – Test‑Time Verification Layer (TTVL)
TTVL projects incoming messages onto a learned canonical interaction manifold. The manifold is constructed offline by aggregating latent representations of successful cooperative exchanges and computing their mean difference vector (amortised latent steering) [v5547]. At execution time, TTVL evaluates the Euclidean distance between the message’s latent embedding and the manifold; if the distance exceeds a threshold, the message is flagged as adversarial. The agent may ignore the message, request clarification, or invoke a fallback policy. This lightweight verification incurs sub‑50 ms latency, meeting real‑time constraints [v1040].
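A minimal sketch of the manifold check follows, assuming message embeddings are produced by an upstream encoder (not shown) and approximating the manifold by the mean cooperative latent with a fixed distance threshold.

```python
# Sketch of the TTVL manifold check (Embodiment 3). Offline, latents of
# successful cooperative exchanges are aggregated into a canonical center;
# at execution time, a message whose embedding lies too far from the
# center is flagged as adversarial.
import numpy as np

class TestTimeVerifier:
    def __init__(self, cooperative_latents: np.ndarray, threshold: float):
        self.center = cooperative_latents.mean(axis=0)   # canonical manifold center
        self.threshold = threshold

    def check(self, message_latent: np.ndarray):
        """Return (deviation_score, verdict) for an incoming message."""
        score = float(np.linalg.norm(message_latent - self.center))
        verdict = "pass" if score <= self.threshold else "flag"
        return score, verdict

rng = np.random.default_rng(0)
coop = rng.normal(0.0, 1.0, size=(500, 64))      # trusted training exchanges
ttvl = TestTimeVerifier(coop, threshold=10.0)

print(ttvl.check(rng.normal(0.0, 1.0, size=64))) # in-manifold -> pass
print(ttvl.check(rng.normal(6.0, 1.0, size=64))) # far off-manifold -> flag
```

The check reduces to one vector subtraction and one norm per message, which is consistent with the sub‑50 ms latency budget stated above.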
Embodiment 4 – Integrated HTMAD Pipeline
During training, the agent interacts in a partially observable environment while AC‑ToM injects adversarial messages. DBGR regularizes belief updates in real time, and TTVL learns to recognize manifold deviations. At execution, the agent processes messages through TTVL, applies DBGR‑regularized belief updates, and selects actions according to the robust policy derived from AC‑ToM. The pipeline is fully decentralized; each agent maintains its own ToM module and verification layer, enabling scalability to large teams while limiting bandwidth to essential control messages [v12013], [v3495].
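One decentralized execution step might be composed as below, reusing the TestTimeVerifier and BeliefGraph sketches above; propose_update and robust_policy are hypothetical stubs standing in for the trained ToM inference and the AC‑ToM policy.

```python
# Sketch of one HTMAD execution step (Embodiment 4): verify the message,
# apply a DBGR-bounded belief update, then act under the robust policy.

def propose_update(belief_graph, message_text):
    """Hypothetical ToM inference stub: a trained module would map the
    message to proposed confidence updates {node_name: new_confidence}."""
    return {}

def robust_policy(agent, belief_graph):
    """Hypothetical stub for the policy head trained under AC-ToM."""
    return "hold_position"

def htmad_step(agent, msg_text, msg_latent, ttvl, belief_graph, budget=1.0):
    score, verdict = ttvl.check(msg_latent)            # TTVL manifold check
    if verdict == "flag":
        agent["audit_log"].append((msg_text, score))   # audit trail for operators
        return "request_clarification"                 # or ignore / fallback policy
    proposed = propose_update(belief_graph, msg_text)  # ToM inference
    if belief_graph.regularization_penalty(proposed) <= budget:
        for name, conf in proposed.items():            # accept bounded update
            belief_graph.nodes[name].confidence = conf
    return robust_policy(agent, belief_graph)          # act under robust policy

agent = {"audit_log": []}
# e.g. htmad_step(agent, "enemy at gate", msg_latent, ttvl, g)
# using the ttvl and g objects from the sketches above
```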
Embodiment 5 – Scalability and Bandwidth Efficiency
HTMAD employs symbolic message generation via LLM‑Communicator and LLM‑Memory modules, reducing transmitted data to concise natural‑language tokens (e.g., “cover me”, “focus fire”) [v11003]. ActionCoordination frameworks restrict communication to one‑hop exchanges, yielding near‑optimal neighborhood structures with minimal overhead [v2941]. Lightweight protocols such as MAGIC‑MASK further demonstrate scalability to dozens of agents with sub‑millisecond latency [v2879].
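An illustrative sketch of the symbolic, one‑hop protocol follows; the token vocabulary and neighbor table are hypothetical examples rather than a normative wire format.

```python
# Sketch of bandwidth-limited symbolic messaging (Embodiment 5): only
# vocabulary-checked natural-language tokens are sent, and only to
# one-hop neighbors.
VOCAB = {"cover me", "focus fire", "fall back", "clear"}   # concise NL tokens

def broadcast_one_hop(sender: str, token: str, neighbors: dict) -> list:
    """Send a vocabulary-checked token to the sender's one-hop neighbors
    only, keeping per-step bandwidth to a few bytes per edge."""
    if token not in VOCAB:
        raise ValueError(f"token {token!r} outside the symbolic vocabulary")
    return [(sender, peer, token) for peer in neighbors.get(sender, [])]

neighbors = {"a1": ["a2", "a3"], "a2": ["a1"]}
print(broadcast_one_hop("a1", "cover me", neighbors))
# -> [('a1', 'a2', 'cover me'), ('a1', 'a3', 'cover me')]
```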
Embodiment 6 – Hardware Acceleration for Low‑Latency Operation
The TSLink architecture removes high‑latency DSP paths, achieving sub‑millisecond end‑to‑end delay for real‑time control loops [v9344], [v8447]. Coupling TSLink with the HTMAD pipeline ensures that belief updates and verification complete within the stringent timing budgets required in high‑noise, high‑latency environments.
CLAIMS
1. An autonomous multi‑agent system comprising: a theory‑of‑mind module that receives messages from other agents; an adversarial curriculum‑driven training component that employs a large language model to generate diverse adversarial communication scenarios; a dynamic belief‑graph regularizer that constrains belief updates based on a graph of natural‑language statements; and a test‑time verification layer that evaluates incoming messages against a learned canonical interaction manifold, wherein the system selects actions according to a policy trained under the adversarial curriculum and regularized belief updates, and wherein the system operates in real time with sub‑50 ms latency.
2. The system of claim 1, wherein the large language model is configured to generate executable adversarial strategies in a Turing‑complete code space.
3. The system of claim 1, wherein the dynamic belief‑graph regularizer penalizes high‑confidence belief updates that deviate from an ensemble of inferred mental states.
4. The system of claim 1, wherein the test‑time verification layer projects incoming messages onto a canonical manifold constructed from latent representations of successful cooperative exchanges.
5. The system of claim 1, further comprising a symbolic message generator that reduces bandwidth by transmitting concise natural‑language tokens.
6. The system of claim 1, wherein the system incorporates TSLink hardware to achieve sub‑millisecond end‑to‑end latency.
7. The system of claim 1, wherein the system is deployed in a partially observable environment and the policy is trained via a bi‑level Stackelberg game between the agent and the adversarial curriculum.
8. The system of claim 1, wherein communication is restricted to one‑hop exchanges and execution is decentralized, enabling scalability to large teams.
9. The system of claim 1, further comprising an audit trail that records deviation scores of flagged messages for human review.
10. The system of claim 1, wherein the system achieves cooperative performance under high noise or latency comparable to or exceeding that of baseline MARL agents.
11. The system of claim 1, wherein the system detects and mitigates adversarial messages in real time while preserving cooperative performance.
12. The system of claim 1, wherein the policy is robust to evolving threat signatures by virtue of training under the adversarial curriculum.
13. The system of claim 1, wherein belief updates are constrained by a soft regularizer that limits the influence of any single message.
14. The system of claim 1, wherein the test‑time verification layer flags anomalous messages and records their deviation scores.
15. The system of claim 1, wherein the architecture is fully decentralized, enabling deployment in large‑scale multi‑agent networks.
ABSTRACT
The present invention discloses a Hybrid Theory‑of‑Mind Adversarial Defense (HTMAD) framework that protects multi‑agent systems from communication sabotage. HTMAD integrates an adversarial curriculum‑driven ToM module (AC‑ToM) that uses a large language model to generate diverse deceptive scenarios, a dynamic belief‑graph regularizer (DBGR) that constrains belief updates, and a test‑time verification layer (TTVL) that evaluates incoming messages against a learned canonical manifold. The system operates in real time with sub‑50 ms latency, preserves cooperative performance under high noise or latency, and provides an audit trail for human operators. The invention offers a scalable, interpretable, and provably robust defense architecture for real‑time multi‑agent coordination in adversarial environments.