Validation: Adversarial Observation Perturbations and Policy Inference

The core challenge in multi‑agent coordination under hostile environments is to derive policy inference mechanisms that remain reliable when agents’ observations are subtly perturbed by adversaries. Adversarial observation perturbations (AOPs) can stem from noisy telemetry, malicious sensor spoofing, or targeted semantic manipulation (e.g., prompt injection in LLM‑driven agents). The objective is therefore to construct inference frameworks that can (i) detect, (ii) adapt to, and (iii) recover from AOPs while preserving cooperative performance. This objective is crucial for trustworthy autonomous fleets, cyber‑security defenders, and any distributed AI that must maintain compositional integrity in the presence of unseen threats.

1.3 Ideate/Innovate

To transcend the limitations above, we propose a frontier methodology called *Adversarial Observation Inference via Generative Bayesian Ensembles (AOI‑GBE). The key components are:

Collectively, AOI‑GBE constitutes a probabilistic, generative, curriculum‑aware, and explainable framework that moves beyond static worst‑case bounds toward adaptive, data‑driven inference under adversarial observation perturbations.

Independent Validation

Detection, adaptation, and recovery of adversarial observation perturbations while preserving cooperative performance

adversarial observation perturbation detection cooperative multi-agent performanceadaptive recovery from sensor spoofing multi-agent coordinationrobust policy inference under observation noise multi-agent systemspreserving cooperation under adversarial telemetry perturbations

UAV swarms must detect, adapt to, and recover from observation‑based attacks while still executing mission objectives. Recent work demonstrates that rapid re‑configuration and cooperative fault‑tolerance can be achieved even under degraded sensory conditions, enabling safe large‑scale operations in contested environments ^[v16222]. The key insight is that detection must be distributed across the swarm, allowing individual agents to flag anomalous inputs and trigger local recovery protocols without central bottlenecks.Adversarial perturbations that target perception modules can be mitigated by embedding the sensor data into a quantum‑enhanced digital twin. By mapping telemetry onto entangled registers and monitoring for bit‑flip, phase‑flip, or amplitude‑damping signatures, the system can detect and isolate corrupted observations before they propagate through the control loop ^[v7024]. This approach preserves the fidelity of cooperative decision‑making while providing a cryptographic audit trail of any detected tampering.When multiple drones share learning resources, privacy‑preserving federated training becomes essential. Secure aggregation and differential privacy mechanisms allow each agent to contribute gradients derived from local telemetry without exposing raw sensor streams, thereby reducing the risk of model extraction or inference attacks ^[v7273]. Coupling this with on‑board anomaly detectors ensures that compromised updates are rejected before they influence the swarm’s policy.Decentralized motion planning can further enhance robustness by integrating adaptive denoising into the trajectory prediction pipeline. A reinforcement‑learning‑based planner that learns to filter out adversarial noise while maintaining high‑fidelity motion estimates has been shown to improve both safety and performance in multi‑robot scenarios ^[v7414]^[v7032]. The combination of local denoising and global consensus on motion plans allows the swarm to re‑route around compromised agents or corrupted observations in real time.Future research should focus on harmonizing these layers—distributed detection, quantum‑based verification, privacy‑preserving learning, and adaptive planning—into a unified framework. Such an architecture would enable UAV swarms to maintain cooperative performance even when faced with sophisticated observation‑based attacks, thereby extending operational envelopes in both civil and defense contexts.

Generative Observation Modeling (CC‑GAN) for reconstructing missing or corrupted sensor streams

conditional GAN sensor data reconstruction multi-agentgenerative adversarial network missing sensor stream recoveryCC-GAN joint distribution clean perturbed observationsoffline training nominal adversarial data generative model

Generative observation modeling with conditional GANs (CC‑GAN) has shown promise for reconstructing missing or corrupted sensor streams. In a lightweight GAN framework, a generator learns to impute missing heart‑rate samples while a discriminator enforces realism, and the combined model is coupled with a rule‑based anomaly detector to flag early infection signs in wearable data ^[v7842]. Extending this idea, a hybrid architecture that integrates a bidirectional GRU for temporal feature extraction with a GAN for data completion has achieved higher reconstruction accuracy than pure autoregressive or diffusion models, especially when the missing‑data ratio is high ^[v84]. These studies demonstrate that conditioning on the available sensor context allows the generator to capture complex temporal dependencies that simple interpolation or AR models miss.The core of CC‑GAN is the conditional generator, which receives both a latent vector and a conditioning vector derived from the observed sensor streams. Recent work on conditional GANs for medical imaging (e.g., time‑to‑peak MRI reconstruction) illustrates how a carefully designed conditioning augmentation and auxiliary classifier can improve sample fidelity and preserve clinically relevant features ^[v16556]. Similar conditioning strategies can be adapted to multimodal sensor data, where auxiliary heads encode modality‑specific statistics or missing‑data masks, thereby guiding the generator toward plausible completions.Despite these successes, several challenges remain. First, GAN training is notoriously unstable, and the high dimensionality of multivariate sensor streams can exacerbate mode collapse, leading to overly smooth or unrealistic imputations. Second, the lack of ground‑truth for missing segments in real deployments makes it difficult to evaluate reconstruction quality objectively; proxy metrics such as downstream task performance or consistency with physical sensor models are often required. Finally, privacy and security concerns arise when generative models are deployed on edge devices or in federated settings, as the generator may inadvertently leak sensitive patterns unless differential‑privacy or secure‑aggregation techniques are incorporated.Future research should therefore focus on robust training objectives that combine adversarial loss with physics‑based or domain‑specific regularizers, on developing benchmark datasets with realistic missing‑data patterns, and on integrating privacy‑preserving mechanisms into CC‑GAN pipelines. When these issues are addressed, conditional generative modeling stands to become a powerful tool for real‑time sensor fault tolerance and data‑driven decision support in IoT and health‑monitoring systems.

Bayesian Policy Inference marginalizing over generative observation model for robust policy posterior

hierarchical Bayesian policy inference adversarial observationpolicy posterior marginalization generative observation modelrobust MARL Bayesian inference against unseen attackslatent policy Bayesian model observation likelihood

Bayesian policy inference that integrates a generative observation model offers a principled way to capture both the dynamics of the agent and the stochasticity of the environment. By treating the observation process as a latent variable, the posterior over policies can be expressed as an integral over all possible observation realizations, which automatically propagates epistemic uncertainty into the decision‑making process. This hierarchical formulation has been successfully applied to UAV trajectory planning under adversarial jamming, where expert demonstrations, symbolic planning, and wireless signal feedback are encoded in a joint generative model that is then queried for policy updates via Bayesian active inference. ^[v16569]Marginalizing the observation model is computationally challenging, but amortized variational inference provides a scalable solution. Recent work on adversarial robustness of amortized Bayesian inference demonstrates that, when the likelihood is learned jointly with a variational posterior, the resulting policy posterior remains stable even under perturbations of the observation distribution. The approach leverages a learned density estimator to approximate the marginal likelihood, enabling efficient Monte‑Carlo integration over the observation space while preserving the Bayesian update rule. ^[v7329]Combining generative adversarial networks (GANs) with Bayesian inference further enhances the fidelity of the observation model. A GAN can learn a high‑dimensional, multimodal distribution of sensor data, while a Bayesian layer maps these samples to latent policy parameters. This hybrid architecture allows the policy posterior to be conditioned on realistic synthetic observations, improving generalization to unseen environments and reducing the need for exhaustive real‑world data collection. ^[v3192]Domain shift and adversarial attacks are mitigated by adversarial variational Bayesian inference, which jointly learns domain indices and a robust posterior over policies. By treating the domain index as a latent variable and enforcing an adversarial loss that encourages indistinguishable latent representations across domains, the method achieves near‑optimal domain adaptation while maintaining a coherent Bayesian uncertainty estimate for the policy. This framework is particularly effective in multi‑domain settings such as autonomous driving or robotic manipulation where the observation statistics can vary dramatically. ^[v7040]Finally, the practical impact of these techniques is evident in signal‑change detection for biomedical applications. A hierarchical generative model that captures subtle variations in physiological signals, combined with Bayesian policy inference, yields robust detection of anomalies even under noisy or incomplete observations. The marginalization over the generative observation model ensures that the policy posterior remains calibrated, enabling reliable decision‑making in safety‑critical contexts. ^[v9541]

LLM‑Driven Adversarial Curriculum generating semantic adversarial scenarios for policy brittleness

LLM generated semantic adversarial scenarios multi-agentprompt injection attack curriculum reinforcement learningLLM adversarial curriculum maximizing regret MARLsemantic manipulation map tiles reinforcement learning

Large language models (LLMs) can now produce richly detailed, semantically coherent prompts that expose hidden weaknesses in downstream policies, yet the same sensitivity to prompt design and inductive biases that enables such creativity also makes policies brittle under semantic perturbations. Empirical studies show that minor rubric changes or context variations can drastically alter LLM judgments, underscoring the need for value‑aligned, debate‑based multi‑agent frameworks that surface divergent perspectives before deployment ^[v3604].A practical way to generate adversarial scenarios is to embed the LLM within a multi‑agent system (MAS) where an attacker agent crafts jailbreak or policy‑shifting prompts, a target agent executes the policy, and a judge agent evaluates malicious intent and success. This iterative attacker–target–judge loop has proven effective for automated red‑teaming and for exposing policy brittleness in a controlled, reproducible manner ^[v4009].However, the generation of realistic scenarios often relies on retrieval‑augmented generation (RAG) pipelines that combine semantic search with contextual grounding. While RAG can surface relevant knowledge, inconsistencies in retrieval or mis‑aligned embeddings can introduce noise that masks true policy weaknesses, necessitating careful validation of retrieved content ^[v5041].Policy performance also degrades sharply when faced with ambiguous or underspecified inputs, a phenomenon that has been quantified as a >30 % drop in state‑of‑the‑art models like GPT‑4. This highlights the importance of grounding LLM outputs in concrete, verifiable specifications to avoid semantic drift and maintain robustness ^[v5245].Finally, unified adversarial frameworks such as PDJA that jointly perturb perception and action spaces provide a more comprehensive stress test for policies. Integrating LLM‑driven curriculum generation with such frameworks can systematically expose and mitigate brittleness, guiding the design of more resilient policy architectures ^[v4152].

Cooperative Resilience Layer monitoring observation entropy and triggering local recovery policies

cooperative resilience observation entropy threshold recovery policyentropy based anomaly detection multi-agent coordinationlocal recovery policy graceful degradation multi-agentanticipation resistance transformation cooperative resilience

Cooperative resilience layers aim to keep multi‑agent systems functioning when local observations become unreliable or the environment shifts abruptly. Centralized‑training, decentralized‑execution (CTDE) methods such as MAPPO provide a principled way to learn joint policies while each agent acts on its own observation, and the centralized critic supplies a stable learning signal that can detect when the joint state distribution drifts from the training manifold ^[v9672].A practical trigger for local recovery is the entropy of the observation stream. In neuromorphic networks, entropy analysis revealed that when the network entropy rises above a threshold the system enters a “winner‑take‑all” regime that is fragile to perturbations ^[v6331]. Monitoring this entropy in real time allows an agent to flag a potential failure mode and invoke a pre‑defined local recovery policy before the system collapses.Entropy‑augmented reinforcement learning further supports this approach. Soft Actor‑Critic (SAC) maximizes a reward‑entropy trade‑off, and the entropy bonus can be interpreted as a safety margin: when the policy’s entropy falls below a critical value, the agent is likely over‑confident and may be stuck in a suboptimal regime ^[v16468]. Detecting such a drop can automatically trigger a local policy reset or a switch to a more exploratory mode.Biological systems provide an additional illustration. In the cyclic‑AMP binding protein CAP, a sharp entropic penalty accompanies the second ligand binding event, signaling a cooperative allosteric transition ^[v16401]. Analogously, a sudden change in observation entropy can be interpreted as a cooperative transition in the agent ensemble, prompting a coordinated local recovery action.By integrating CTDE learning, continuous entropy monitoring, and entropy‑driven recovery triggers, cooperative systems can maintain resilience in dynamic, partially observable environments while keeping local policies adaptive and robust.

Meta‑Learning inference‑time adaptation of generative observation model to evolving adversarial tactics

meta learning generative model online adaptation adversarial tacticsMAML style inference time adaptation generative observation modelonline drift detection generative adversarial network adaptationadaptive generative model to evolving attacks multi-agent

Meta‑learning has emerged as a principled way to endow generative observation models with rapid inference‑time adaptation, especially when adversarial tactics evolve on a sub‑second timescale. Gradient‑based schemes such as MAML, FOMAML, REPTILE and CAVIA learn a shared initialization that can be fine‑tuned with only a few gradient steps, enabling IoT‑edge devices to update their generative models on‑line without full retraining cycles ^[v8965].Dynamic adaptation builds on this by integrating online learning and transfer‑learning pipelines that ingest fresh data streams in real time. Fine‑tuning the final network layer or a small subset of parameters while keeping the bulk of the model frozen preserves stability and reduces computational load, a strategy that has proven effective in continuous‑learning scenarios ^[v9514].When adversarial tactics shift—such as a fraudster changing transaction patterns or a malware author altering payloads—continuous monitoring and periodic re‑training become essential. Meta‑learning frameworks can detect distributional drift and trigger rapid adaptation, allowing the model to “remember” prior regimes while quickly learning new ones, thereby mitigating catastrophic forgetting ^[v1365].An adaptive detection architecture that couples a Conditional Wasserstein GAN with continual learning further enhances robustness. By generating drifted traffic samples and clustering latent features, the system updates detection thresholds on the fly, maintaining high precision even as attack signatures evolve ^[v12298].Finally, a meta‑auxiliary learning strategy based on MAML aligns auxiliary losses with the primary generative objective during inference. The shared encoder is optimized on‑the‑fly using auxiliary signals while the decoder remains fixed, ensuring that the model’s internal representations stay relevant to the current adversarial context ^[v11819].

Explainable Inference Traces producing saliency maps over latent space to trace perturbation influence

explainable inference traces saliency latent space generative modelpost hoc saliency maps policy posterior multi-agenthuman interpretability perturbation influence inference pipelineexplainable AI policy inference adversarial observation

Explainable inference traces that map perturbation influence onto latent‑space saliency maps combine two complementary XAI paradigms: gradient‑based attribution and counterfactual reasoning. In the CNN–GAN framework of Ref ^[v6719], saliency maps are generated by back‑propagating gradients through the generator and discriminator, revealing which latent dimensions drive specific visual features. This approach not only exposes model‑level decisions but also allows practitioners to edit latent codes and observe the resulting changes, thereby providing a transparent “what‑if” analysis that is difficult to achieve with black‑box methods.For medical imaging, Ref ^[v16647] demonstrates that voxel‑wise saliency maps derived from a U‑Net brain‑age predictor can be interpreted as local age contributions. However, the authors note that saliency explanations vary across methods, underscoring the need for consistent, perturbation‑aware attribution. By integrating latent‑space perturbations—such as shifting a latent vector along a principal component—researchers can quantify how specific latent factors influence the age estimate, offering a more robust explanation than pixel‑level heatmaps alone.Latent‑space regularization, as proposed in Ref ^[v2147], smooths the manifold so that small latent perturbations produce predictable, semantically coherent outputs. This property is essential for traceability: when a perturbation alters a latent dimension, the resulting change in the generated image can be directly linked to the underlying semantic concept, enabling clinicians or designers to verify that the model’s internal representations align with domain knowledge.Counterfactual explanations, explored in Ref ^[v10170], complement saliency by identifying minimal latent edits that flip a model’s prediction. By generating counterfactual latent codes and visualizing the corresponding saliency maps, one can trace the causal chain from latent perturbation to output change, thereby validating the model’s reasoning process and exposing potential biases or spurious correlations.Finally, concept‑based explanations in GANs, as illustrated in Ref ^[v3394], map latent directions to high‑level semantic concepts (e.g., “smile” or “age”). Saliency maps over these concept vectors provide an interpretable bridge between low‑level gradients and human‑understandable attributes, making it possible to audit how perturbations in latent space influence both the generated content and the model’s internal decision logic. Together, these techniques establish a rigorous framework for tracing perturbation influence through latent spaces, yielding saliency maps that are both faithful to the model and actionable for users.

Reduced pessimism and enhanced exploration compared to conventional robust MARL

reduced pessimism exploration robust MARL comparisongenerative noise model reduces worst-case assumption multi-agentpolicy exploration improved generative observation modelingrobust MARL pessimism mitigation generative approach

Conventional robust multi‑agent reinforcement learning (MARL) typically relies on pessimistic value estimates to guard against model misspecification, which often leads to overly conservative policies that under‑explore the state space. This pessimism can be especially pronounced in offline settings where the agent has no opportunity to collect new data, causing a “freezing” effect that limits discovery of high‑reward trajectories. Recent work has shown that explicitly incorporating pessimism into the learning objective—by penalizing out‑of‑distribution (OOD) state‑action pairs—can mitigate over‑estimation while still encouraging exploration of informative regions of the environment. ^[v7128]Offline MARL frameworks that adopt a pessimistic bias, such as the Off‑MMD algorithm, demonstrate that a carefully calibrated pessimism term can reduce variance in Q‑value estimates without sacrificing sample efficiency. These methods use a conservative Bellman backup that down‑weights uncertain transitions, thereby allowing the policy to focus exploration on states that are both reachable and informative. The result is a more robust policy that still achieves competitive performance on benchmark multi‑agent tasks. ^[v11265]Model‑based MARL approaches that explicitly hallucinate future trajectories, exemplified by H‑MARL, further reduce pessimism by learning a generative model of the environment. By planning over imagined rollouts, agents can evaluate the potential benefits of exploratory actions before committing real interactions, which lowers the risk of catastrophic failures while still encouraging exploration of novel states. This strategy has been shown to achieve near‑optimal sample complexity in zero‑sum Markov games, outperforming purely model‑free baselines that rely on conservative value estimates. ^[v10619]Distributionally robust Markov games (RMGs) introduce a worst‑case optimization criterion that can be combined with exploration bonuses to balance safety and discovery. Recent studies demonstrate that augmenting RMGs with an exploration term—derived from uncertainty estimates in the transition model—allows agents to systematically probe the boundaries of the uncertainty set, thereby reducing pessimism while maintaining robustness guarantees. This hybrid approach yields policies that perform well under model perturbations and still discover high‑reward strategies that would otherwise be missed by overly conservative algorithms. ^[v10345]In summary, reducing pessimism in robust MARL can be achieved through a combination of pessimistic value regularization, offline conservative learning, model‑based hallucination, and distributionally robust planning with exploration bonuses. These techniques collectively enable agents to explore more effectively while preserving safety and robustness, thereby outperforming conventional robust MARL methods that rely solely on pessimistic value estimates. ^[v15059]

1.4 Justification

The proposed AOI‑GBE methodology offers several decisive advantages over conventional robust MARL:

By fusing generative modeling, Bayesian inference, LLM‑driven curricula, cooperative resilience, and meta‑learning, AOI‑GBE transcends the conventional robustness paradigm, delivering a frontier solution that is both theoretically grounded and practically deployable in high‑stakes multi‑agent domains.

Appendix A: Validation References

Appendix: Cited Sources

1	Robust Multi-Agent Reinforcement Learning by Mutual Information Regularization 2023-10-14 https://doi.org/10.1109/TNNLS.2025.3577259 The work most similar to ours is ERNIE , which minimize the Lipshitz constant of value function under worst-case perturbations in MARL. However, the method considers all agents as potential adversaries, thus inherits the drawback of M3DDPG, learning policy that can either be pessimistic or insufficiently robust. Method Unlike current robust MARL approaches that prepares against every conceivable threat, human learns in routine scenarios, but can reliably reflect to all types of threats encounter...
2	The integration of autonomous decision-making frameworks within Web3 ecosystems represents a profound and transformative advancement in decentralized technologies. 2026-02-08 https://digitalfinancenews.com/research-reports/infrastructure-development-for-autonomous-decision-making-frameworks-in-web3-deagentais-role-and-implications/ As the number of agents and the complexity of their tasks increase, ensuring efficient computation for AI models (especially on-chain inference), secure decentralized off-chain computation, and effective coordination mechanisms becomes paramount. Solutions may involve specialized Layer 2 scaling solutions designed for agent-centric computation, parallel processing architectures, and advanced multi-agent reinforcement learning (MARL) techniques to optimize cooperative behaviors. Security and Robu...
3	Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning 2025-12-31 https://doi.org/10.48550/arxiv.2508.09275 In this paper, we investigate new vulnerabilities under more realistic and constrained conditions, assuming an adversary can only collect and perturb the observations of deployed agents.We also consider scenarios where the adversary has no access at all.We propose simple yet highly effective algorithms for generating adversarial perturbations designed to misalign how victim agents perceive their environment....
4	A Regularized Opponent Model with Maximum Entropy Objective 2019-07-31 https://doi.org/10.24963/ijcai.2019/85 In this work, we use the word "opponent" when referring to another agent in the environment irrespective of the environment's cooperative or adversarial nature. In our work, we reformulate the MARL problem into Bayesian inference and derive a multi-agent version of MEO, which we call the regularized opponent model with maximum entropy objective (ROMMEO). (2019)...
5	Image Compression And Decoding, Video Compression And Decoding: Methods And Systems 2026-03-25 https://ppubs.uspto.gov/pubwebapp/external.html?q=(20260089329).pn Note, during training the quantisation operation Q is not used, but we have to use it at inference time to obtain a strictly discrete latent. FIG. shows an example model architecture with side-information. The encoder network generates moments p and a together with the latent space y: the latent space is then normalised by these moments and trained against a normal prior distribution with mean zero and variance 1. When decoded, the latent space is denormalised using the same mean and variance. N...
6	MAESTRO: Multi-Agent Environment Shaping through Task and Reward Optimization 2025-12-31 https://doi.org/10.48550/arxiv.2511.19253 Adversarial and co-evolutionary approaches such as PAIRED and POET construct challenging environments that drive robust skill acquisition. In cooperative MARL, difficulty-aware curricula (e.g., cMALC-D ) adjust task parameters based on performance.In TSC, curricula typically perturb numeric parameters such as arrival rates or demand scales , which improves learning but captures only a narrow slice of real-world structure (e.g., complex rush-hour patterns or localized bottlenecks). MAESTRO extend...
7	Hierarchical Refinement of Universal Multimodal Attacks on Vision-Language Models 2026-01-14 https://doi.org/10.48550/arXiv.2601.10313 In the context of universal adversarial perturbation learning, where gradients are aggregated across the entire dataset, historical gradients may become misaligned with the current optimization direction, limiting attack effectiveness....
8	by Esben Kran, HaydnBelfield, Apart Research 2026-04-22 https://forum.effectivealtruism.org/posts/5h8bNTFHkrNNzrrJf/results-from-the-ai-testing-hackathon Curious to see more generality testing for the inverse scaling. See the dataset generation code, the graph plotting code, and the report. By Clement Dumas, Charbel-Raphael Segerie, Liam Imadache Abstract: Neural Trojans are one of the most common adversarial attacks out there. Even though they have been extensively studied in computer vision, they can also easily target LLMs and transformer based architecture. Researchers have designed multiple ways of poisoning datasets in order to create a bac...
9	Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection 2025-08-25 https://doi.org/10.48550/arXiv.2508.19072 Adversarial reinforcement learning introduces a perturbation-generating agent that seeks to fool the defender agent. This setting is often modeled as a minimax game: , where π D is the defender's policy and π A is the attacker's. Multi-Agent and Ensemble RL Multi-agent reinforcement learning (MARL) extends single-agent RL to environments with multiple agents, which may be cooperative, competitive, or mixed....
10	Decentralized Multi-Agent Actor-Critic with Generative Inference 2019-10-06 https://arxiv.org/abs/1910.03058 Specifically, we use a modified context conditional generative adversarial network (CC-GAN) to infer missing joint observations given partial observations. The task of filling in partial observations by generative inference is similar to the image inpainting problem for a missing patch of pixels: with an arbitrary number of missing observations, we would like to infer the most likely observation of the other agents. We extend the popular MADDPG method as it appears most amenable to full decentra...
11	This paper demonstrates how reinforcement learning can explain two puzzling empirical patterns in household consumption behavior during economic downturns. 2026-04-21 https://www.bkaplowitz.com/publications As a first step towards model-free Bayes optimality, we introduce the Bayesian exploration network (BEN) which uses normalising flows to model both the aleatoric uncertainty (via density estimation) and epistemic uncertainty (via variational inference) in the Bellman operator. In the limit of complete optimisation, BEN learns true Bayes-optimal policies, but like in variational expectation-maximisation, partial optimisation renders our approach tractable. Empirical results demonstrate that BEN c...
12	LLM-TOC: LLM-Driven Theory-of-Mind Adversarial Curriculum for Multi-Agent Generalization 2026-03-07 https://doi.org/10.3390/math14050915 To address these limitations, we propose LLM-TOC (LLM-Driven Theory-of-Mind Adversarial Curriculum), which casts generalization as a bi-level Stackelberg game: in the inner loop, a MARL agent (the follower) minimizes regret against a fixed population, while in the outer loop, an LLM serves as a semantic oracle that generates executable adversarial or cooperative strategies in a Turing-complete code space to maximize the agent's regret. To cope with the absence of gradients in discrete code gener...
13	Learning Reward Functions for Cooperative Resilience in Multi-Agent Systems 2025-12-31 https://doi.org/10.48550/arxiv.2601.22292 In particular, in mixed-motive multi-agent systems, agents must do more than simply optimize individual performance, they must collectively adapt and recover from disruptions to preserve system-level well-being.Disruptions, whether internal (e.g., system failures), external (e.g., environmental shocks), or adversarial (e.g., targeted attacks), can compromise system performance, underscoring the need for adaptive recovery mechanisms .This motivates recent studies of resilience in multi-agent syst...
14	GH Research PLC: EXHIBIT 99.2 (EX-99.2) 2026-05-13 https://www.sec.gov/Archives/edgar/data/0001140361/0001140361-26-021079-index.htm In November 2025, we submitted a complete response to the clinical hold and in December 2025, the hold was lifted by the FDA. In parallel, we are conducting the Phase 1 healthy volunteer clinical pharmacology trial (GH001-HV-106) using our proprietary device in the United Kingdom. GH002 is our second mebufotenin product candidate, formulated for administration via a proprietary intravenous injection approach. We have completed a randomized, double-blind, placebo-controlled, dose-ranging clinical...