10. Multi‑Turn Contextual Memory Attacks

10.1 Identify the Objective

This chapter must provide a systematic synthesis of the state‑of‑the‑art on adversarial techniques that target the contextual memory of multi‑agent AI systems, focusing on how such attacks induce misaligned policy inference, erode trust in the system, and trigger cascading failures across interacting agents. The review should map existing attack and defense mechanisms to these three threat dimensions, critically assess coverage gaps, and conclude whether the objective can be achieved with current, publicly documented methods.

10.2 Survey of Existing Prior Art

#	Source	Core Contribution	Relevance to Objective
^[1]	DeepContext: Stateful Real‑Time Detection of Multi‑Turn Adversarial Intent Drift in LLMs	Recurrent intent tracking using lightweight turn‑level embeddings and an RNN to detect intent drift over turns	Detects the intent shift that underlies misaligned policy inference in multi‑turn dialogues
^[2]	DeepTrap: Automated Discovery of Contextual Vulnerabilities in OpenClaw	Optimises a black‑box trajectory‑level search to identify memory poisoning, RAG poisoning, and other contextual attacks	Provides a methodology for discovering memory‑based attacks that can mislead policy inference
^[3]	MINJA (Memory Injection Attack)	Demonstrates high‑success query‑only memory poisoning by bridging steps and progressive shortening techniques	Exemplifies persistent memory poisoning that can alter agent goals and trigger cascading failures
^[4]	AgentTrust: A Firewall for Agent Tool Calls	Wraps every tool call with a safety evaluation layer to classify actions before execution	Addresses trust degradation by preventing malicious tool invocations driven by poisoned memory
^[5]	Memory Poisoning Attack and Defense on Memory Based LLM‑Agents (various sub‑papers)	Introduces MINJA, AgentPoison, and systematic evaluation of memory‑poisoning attacks and defenses	Provides both attack (MINJA) and defense (AgentPoison) perspectives
^[6]	Memory Poisoning Attack and Defense on Memory Based LLM‑Agents (duplicate)	Same as above, with additional empirical results	Reinforces the feasibility of persistent memory attacks
^[7]	Memory Poisoning Attack and Defense on Memory Based LLM‑Agents	Discusses MINJA, AgentPoison, and cascading effects across multi‑agent systems	Highlights the cascade dimension of memory attacks
^[8]	Agent Traps (DeepMind study)	Characterises categories of memory‑based attacks (RAG poisoning, behaviour control, exfiltration)	Provides a taxonomy that maps to misaligned policy inference and cascading failures
^[9]	Into the Gray Zone: Domain Contexts Can Blur LLM Safety Boundaries	Presents JARGON, a multi‑turn strategy to inject hidden instructions via academic framing	Illustrates how contextual memory can be leveraged over multiple turns to subvert safety
^[10]	Every Picture Tells a Dangerous Story: Memory‑Augmented Multi‑Agent Jailbreak Attacks on VLMs	Extends memory‑poisoning to vision‑language models, showing cascading chain reactions	Demonstrates cross‑modal cascading failures
^[11]	Not a very smart home: crims could hijack smart‑home boiler…	Reports a real‑world memory‑poisoning attack that caused device takeover via calendar invites	Practical case of cascading failures in an IoT context
^[12]	Memory Poisoning Attack and Defense on Memory Based LLM‑Agents	Systematic empirical evaluation of memory poisoning attacks and defenses in EHR agents	Provides evidence of cascading failures in health‑care multi‑agent scenarios
^[13]	How May Explainable Artificial Intelligence Improve IT Security of Object Detection?	Discusses memory poisoning in agents that rely on RAG	Indicates cascading impact on downstream vision tasks
^[14]	On February 15, 2025, the UC Berkeley Center for Long‑Term Cybersecurity…	Outlines risk‑management playbook for autonomous agents, including cascading failure mitigation	Offers high‑level guidance for cascading failure scenarios
^[15]	Artificial intelligence systems are rapidly evolving…	Introduces Memory Ghost Attacks, a class of persistent contextual manipulation	Directly relevant to misaligned policy inference over extended interactions

The above works collectively cover: (i) attack techniques that poison contextual memory, (ii) detection frameworks that monitor intent drift, (iii) defense mechanisms that gate tool calls, and (iv) case studies illustrating cascading failures.

10.3 Best‑Fit Match

MINJA (Memory Injection Attack) – ^[3]

Capabilities and Mapping to Objective
| Objective Aspect | MINJA Feature | Source | |-------------------|---------------|--------| | Persistent memory poisoning across turns | Uses bridging steps and progressive shortening to inject malicious instructions that are retained in long‑term memory | ^[3] | | Misaligned policy inference | Poisoned memory causes the agent to adopt attacker‑defined goals, overriding the system prompt | ^[3] | | Trust degradation | By changing the agent’s internal policy, users misattribute errors to system failure rather than malicious manipulation | ^[3] | | Cascading failures | A single poisoned memory entry can propagate through multiple agents sharing the same memory store, leading to widespread unintended actions | ^[7]^[8] |

MINJA thus provides the most complete end‑to‑end illustration of how a multi‑turn contextual memory attack can lead to all three dimensions of threat.

10.4 Gap Analysis

Gap	Classification	Mitigation via Existing Prior Art?
1. *Detection of memory poisoning in interactive* multi‑agent workflows**	(i) closeable by composition – combine DeepContext ^[1] with AgentTrust ^[4] to monitor intent drift and tool calls simultaneously	Yes, but requires integration
2. Preventing cross‑agent memory contamination	(ii) requires new R&D – current defenses (AgentPoison, AgentTrust) assume isolated memory or shared memory with explicit boundaries	No, current tools do not enforce isolation across agents
3. Quantifying cascading failure impact across heterogeneous agents	(ii) not currently solved – existing case studies (e.g., smart‑home takeover, health‑care agent) are isolated; no systematic metrics	No
4. *Robustness against indirect* memory poisoning via RAG or external knowledge bases**	(i) closeable by integrating AgentPoison ^[3] with hybrid retrieval systems (e.g., Athena hybrid search)	Yes, with configuration
5. Dynamic runtime enforcement of policy consistency across turns	(ii) requires novel runtime enforcement layers	No

Thus, while attacks and some defenses exist, the full end‑to‑end mitigation path from memory poisoning to cascading failure analysis remains incomplete.

10.5 Verdict

Not Currently Possible

Closest Existing Fit	Coverage	Residual Gap
MINJA^[3]	Demonstrates persistent memory poisoning, misaligned policy inference, and cascading potential	Lacks automated detection and cross‑agent isolation
AgentPoison / AgentTrust^[4]^[3]	Provides memory‑attack detection and tool‑call gating	Do not address multi‑agent memory contamination or systematic cascade metrics
DeepContext^[1]	Detects intent drift over turns	Only monitors single‑agent intent; no mechanism to capture memory‑driven policy changes or inter‑agent propagation

The objective of comprehensively analyzing and mitigating multi‑turn contextual memory attacks that simultaneously misalign agent policy, erode trust, and cause cascading failures across a multi‑agent ecosystem cannot yet be fully satisfied with existing, published solutions.

Chapter Appendix: References

1	DeepContext: Stateful Real-Time Detection of Multi-Turn Adversarial Intent Drift in LLMs 2026-02-17 https://arxiv.org/abs/2602.16935 Modern LLMs may interleave global and sliding window attention to make LLMs scale more linearly , but this comes with a tradeoff in global awareness.Furthermore, these models still treat the concatenated block as a static snapshot, often failing to capture the temporal drift inherent in multi-turn grooming.Attempts to improve defenses include using a sliding window of conversation history , or employing lightweight embedding classifiers .However, both of these approaches come with limitations of...
2	Red-Teaming Agent Execution Contexts: Open-World Security Evaluation on OpenClaw 2026-05-13 https://arxiv.org/abs/2605.11047 Abstract: Agentic language-model systems increasingly rely on mutable execution contexts, including files, memory, tools, skills, and auxiliary artifacts, creating security risks beyond explicit user prompts. This paper presents DeepTrap, an automated framework for discovering contextual vulnerabilities in OpenClaw. DeepTrap formulates adversarial context manipulation as a black-box trajectory-level optimization problem that balances risk realization, benign-task preservation, and stealth. It co...
3	Memory Poisoning Attack and Defense on Memory Based LLM-Agents 2025-12-31 https://doi.org/10.48550/arxiv.2601.05504 This work addresses both gaps through a systematic empirical study of memory poisoning attacks and defenses in the context of EHR agents. Related work The security of LLM-based agents has become a critical research area, particularly regarding memory poisoning vulnerabilities. introduced MINJA (Memory Injection Attack), demonstrating that agents with persistent memory are vulnerable to query-only attacks achieving over 95% injection success rates through bridging steps, indication prompts, and p...
4	Runtime Safety, Alignment Gaps, and Elastic Context 2026-05-07 https://awesomeagents.ai/science/runtime-safety-alignment-gaps-elastic-context/ LongSeeker - Elastic context management for search agents achieves 61.5% on BrowseComp by teaching agents to actively reshape their working memory AgentTrust: A Firewall for Agent Tool Calls The scenario motivating AgentTrust is straightforward and slightly terrifying: your agent decides to run a destructive shell command, or gets tricked by a prompt injection into exfiltrating data, and by the time you notice, the action is already done. Chenglin Yang's paper proposes wrapping every agent tool ...
5	Home Blogs Runtime Attacks: Why Modern Mobile Pentesting Matters 2026-04-14 https://tmits.in/blog/mobile-pentesting-runtime-attacks/ Mobile application pentesting proven secrets will survive memory attacks against runtime attacks. Session Hijacking: Tokens Stolen During Checkout On-the-spot attacks seize OAuth tokens and biometric states as users hit the "Pay Now" button, while Man-in-the-Middle proxies capture HTTPS traffic after the SSL pinning breaks down. While many application security testing tools fail to detect the death of an active session. The TMITS run-time protection solution monitors session continuity, highligh...
6	Memory Poisoning Attack and Defense on Memory Based LLM-Agents 2026-01-08 https://doi.org/10.48550/arXiv.2601.05504 4] introduced MINJA (Memory Injection Attack), demonstrating that agents with persistent memory are vulnerable to query-only attacks achieving over 95% injection success rates through bridging steps, indication prompts, and progressive shortening techniques. Unlike traditional attacks, MINJA requires no elevated privileges, operating through regular user interactions. present AgentPoison, which targets RAG knowledge bases and memory stores but assumes stronger attacker capabilities with direct s...
7	Tipping the Dominos: Topology-Aware Multi-Hop Attacks on LLM-Based Multi-Agent Systems 2025-12-02 https://doi.org/10.48550/arXiv.2512.04129 A wide variety of attack methods have been developed to compromise LLM-based systems, including prompt injection , , , vision perturbation , , memory poisoning , knowledge base manipulation , and jailbreak attacks , . For example, Zhang et al. inject vision perturbation in OSWORLD agents to click on the adversarial pop-ups. Russinovich et al. use multi-turn jailbreak to attack various LLM-based systems....
8	Researchers at Google DeepMind have published a comprehensive study revealing that autonomous AI agents browsing the web are deeply vulnerable to a new class of attacks called "AI Agent Traps," whic 2026-04-17 https://www.cryptika.com/google-deepmind-researchers-warn-hackers-can-hijack-ai-agents-through-malicious-web-content/ These traps can also wrap malicious instructions inside "educational" or "red-teaming" framing to bypass safety filters, a tactic confirmed across multiple large-scale jailbreak datasets. Cognitive State Traps target an agent's long-term memory and knowledge bases. RAG Knowledge Poisoning, for instance, injects fabricated statements into retrieval corpora so that agents treat attacker-controlled content as verified fact. Research cited in the paper demonstrated that poisoning as few as a handful...
9	Into the Gray Zone: Domain Contexts Can Blur LLM Safety Boundaries 2026-04-16 https://arxiv.org/abs/2604.15717 Building on our observation of General Unlocking, we design JARGON to systematically exploit this vulnerability through adversarial multi-turn interactions. As illustrated in Figure 3, JARGON operates in three stages: (1) establishing a safety-research context to create a permissive environment, (2) building rapport through benign academic discussion, and (3) extracting harmful knowledge through contextually reframed queries. Control Layer The attacker in the control layer maintains awareness of...
10	Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs 2026-04-13 https://arxiv.org/abs/2604.12616 Every Picture Tells a Dangerous Story: Memory-Augmented Multi-Agent Jailbreak Attacks on VLMs 14 Apr 2026C4D0CE67718797FAEB556A07152571D6arXiv:2604.12616v1VLMAgentMemoryJailbreak Attack Memory Update & Guidance...
11	Not a very smart home: crims could hijack smart-home boiler, open and close powered windows and more. 2026-01-13 https://www.theregister.com/2025/08/08/infosec_hounds_spot_prompt_injection/ Black hat A trio of researchers has disclosed a major prompt injection vulnerability in Google's Gemini large language model-powered applications. This allows for attacks ranging from "permanent memory poisoning" to unwanted video streaming, email exfiltration, and even taking over the target's smart home systems to plunge them into darkness or open a powered window, all triggered by nothing more than a simple Google Calendar invitation or email. "You used to believe that adversarial attacks aga...
12	Memory Poisoning Attack and Defense on Memory Based LLM-Agents 2026-01-08 https://arxiv.org/abs/2601.05504 This work addresses these gaps through systematic empirical evaluation of memory poisoning attacks and defenses in Electronic Health Record (EHR) agents....
13	How May Explainable Artificial Intelligence Improve IT Security of Object Detction? 2026-04-15 https://net.cs.uni-bonn.de/teaching/student-theses/master-diploma-from-2009/ ... automatic target annotation to effectively guide fuzzing campaigns Kizilkaya, Bilal Kutzner, Joris Lohr, Marvin Romer, Leon Reinforcement learning in ad-hoc networks with authentication mechanisms based on key-insulated signatures Scheffczyk, Jan Introducing the component coreference resolution task for requirement specification Scherer, Paul Stavrou, Ioannis Suleman, Sherwan Funktionsweise und Verbreitung von TLS 1.3 An Investigation of 5G non-stand-alone Vulnerabilities Wilms, Leo How May ...
14	On February 15, 2025, the UC Berkeley Center for Long-Term Cybersecurity (CLTC) unveiled what it calls the first comprehensive risk-management profile for autonomous AI agents - a playbook designed f 2026-04-22 https://innovirtuoso.com/cybersecurity/uc-berkeleys-first-risk-management-framework-for-autonomous-ai-agents-the-2025-security-and-governance-blueprint/ Goal: Prepare for agent-specific failures and adversarial behavior. Playbook essentials: - Rapid containment: Kill switches, token revocation, session isolation, and circuit breakers for high-risk tools. - Forensic workflows: Preserve decision logs, prompts, outputs, and tool invocation metadata. - Root cause analysis: Was it hallucination, prompt injection, tool confusion, or credential misuse? - Recovery and notification: Data restoration plans, stakeholder communications, and (where applicabl...
15	Artificial intelligence systems are rapidly evolving from simple prompt-response tools into persistent cognitive environments that retain contextual memory across interactions. 2026-04-17 https://www.tdcommons.org/dpubs_series/9572/ Artificial intelligence systems are rapidly evolving from simple prompt-response tools into persistent cognitive environments that retain contextual memory across interactions. Modern AI assistants, enterprise copilots, cybersecurity analysis systems, and retrieval-augmented architectures increasingly store fragments of prior conversations, retrieved documents, and contextual reasoning signals to improve decision continuity. While this capability significantly improves usability and system intel...