Memp: A Task-Agnostic Framework that Elevates Procedural Memory to a Core Optimization Target in LLM-based Agent

August 19, 2025


LLM agents have become powerful enough to handle complex tasks, ranging from web research and report generation to data analysis and multi-step software workflows. However, their procedural memory is typically rigid, manually designed, or locked inside model weights. This makes them fragile: unexpected events like network failures or UI changes can force a complete restart. Unlike humans, who turn past experiences into reusable routines, current LLM agents lack a systematic way to build, refine, and reuse procedural skills. Existing frameworks offer abstractions but leave the optimization of the memory life cycle largely unresolved.

Memory plays a crucial role in language agents, allowing them to recall past interactions across short-term, episodic, and long-term contexts. While current systems use methods like vector embeddings, semantic search, and hierarchical structures to store and retrieve information, effectively managing memory, especially procedural memory, remains a challenge. Procedural memory helps agents internalize and automate recurring tasks, yet strategies for constructing, updating, and reusing it are underexplored. Similarly, agents learn from experience through reinforcement learning, imitation, or replay, but face issues like low efficiency, poor generalization, and forgetting.

Researchers from Zhejiang University and Alibaba Group introduce Memp, a framework designed to give agents a lifelong, adaptable procedural memory. Memp transforms past trajectories into both detailed step-level instructions and higher-level scripts, while offering strategies for memory construction, retrieval, and updating. Unlike static approaches, it continuously refines knowledge through addition, validation, reflection, and discarding, ensuring relevance and efficiency. Tested on ALFWorld and TravelPlanner, Memp consistently improved accuracy, reduced unnecessary exploration, and optimized token use. Notably, memory built from stronger models transferred effectively to weaker ones, boosting their performance. This shows Memp enables agents to learn, adapt, and generalize across tasks.
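To make the two storage formats concrete, here is a minimal sketch of the construction step, assuming a trajectory is a list of (action, observation) pairs. The names `MemoryEntry`, `build_entry`, and `default_script` are hypothetical and do not come from the Memp codebase. The script abstraction below is a crude placeholder that only collapses repeated verbs; a real system would more likely ask an LLM to write the summary. Storing only successful trajectories is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, str]  # (action, observation); an assumed trajectory format


@dataclass
class MemoryEntry:
    task: str
    steps: List[str]  # detailed, step-level instructions
    script: str       # higher-level summary of the procedure


def default_script(task: str, actions: List[str]) -> str:
    """Placeholder abstraction: list the leading verb of each action,
    collapsing consecutive repeats. Stands in for an LLM-written summary."""
    verbs: List[str] = []
    for action in actions:
        parts = action.split()
        if not parts:
            continue  # skip empty actions
        if not verbs or verbs[-1] != parts[0]:
            verbs.append(parts[0])
    return f"To '{task}': " + " -> ".join(verbs)


def build_entry(
    task: str,
    trajectory: List[Step],
    succeeded: bool,
    summarize: Callable[[str, List[str]], str] = default_script,
) -> Optional[MemoryEntry]:
    """Turn one trajectory into a memory entry holding both formats.
    Assumption of this sketch: failed trajectories are not stored."""
    if not succeeded:
        return None
    actions = [action for action, _ in trajectory]
    steps = [f"{i}. {action}" for i, action in enumerate(actions, start=1)]
    return MemoryEntry(task=task, steps=steps, script=summarize(task, actions))


trajectory = [
    ("go to fridge", "You see an egg."),
    ("take egg", "You pick up the egg."),
    ("go to microwave", "You are at the microwave."),
    ("heat egg", "The egg is hot."),
]
entry = build_entry("heat an egg", trajectory, succeeded=True)
print(entry.steps)
print(entry.script)
```

Keeping both formats in one entry lets the agent choose at retrieval time between replaying concrete steps and following the shorter script.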

When an agent interacts with its environment by executing actions, using tools, and refining its behavior across multiple steps, the interaction can be modeled as a Markov Decision Process. Each step generates states, actions, and feedback, forming trajectories that also yield rewards based on success. However, solving new tasks in unfamiliar environments often results in wasted steps and tokens, as the agent repeats exploratory actions already performed in earlier tasks. Inspired by human procedural memory, the proposed framework equips agents with a memory module that stores, retrieves, and updates procedural knowledge. This enables agents to reuse past experiences, cutting down redundant trials and improving efficiency in complex tasks.
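The state, action, and feedback loop described above can be illustrated with a toy rollout. Everything here is hypothetical: `ToyEnv` is an invented two-step task, `rollout` is not part of any framework, and the binary reward is just one simple way to score success.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Transition:
    state: str
    action: str
    feedback: str


class ToyEnv:
    """Hypothetical task: take the key, then open the door."""

    def __init__(self) -> None:
        self.has_key = False
        self.done = False

    def state(self) -> str:
        return "holding key" if self.has_key else "empty-handed"

    def step(self, action: str) -> str:
        if action == "take key":
            self.has_key = True
            return "You take the key."
        if action == "open door" and self.has_key:
            self.done = True
            return "The door opens."
        return "Nothing happens."


def rollout(
    env: ToyEnv, policy: Callable[[str], str], max_steps: int = 10
) -> Tuple[List[Transition], float]:
    """Run the policy until the task is done or the step budget runs out.
    Returns the trajectory and a reward of 1.0 on success, else 0.0."""
    trajectory: List[Transition] = []
    for _ in range(max_steps):
        state = env.state()
        action = policy(state)
        feedback = env.step(action)
        trajectory.append(Transition(state, action, feedback))
        if env.done:
            break
    return trajectory, (1.0 if env.done else 0.0)


def informed(state: str) -> str:
    # A policy that already knows the procedure
    return "open door" if state == "holding key" else "take key"


def naive(state: str) -> str:
    # A policy that keeps trying the door without the key
    return "open door"


good_traj, good_reward = rollout(ToyEnv(), informed)
bad_traj, bad_reward = rollout(ToyEnv(), naive)
print(len(good_traj), good_reward)
print(len(bad_traj), bad_reward)
```

The naive policy spends its whole step budget without finishing, which is the kind of wasted exploration a stored procedure is meant to avoid.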

Experiments on TravelPlanner and ALFWorld demonstrate that storing trajectories as either detailed steps or abstract scripts boosts accuracy and reduces exploration time. Retrieval strategies based on semantic similarity further refine memory use. At the same time, dynamic update mechanisms such as validation, adjustment, and reflection allow agents to correct errors, discard outdated knowledge, and continuously refine skills. Results show that procedural memory not only improves task completion rates and efficiency but also transfers effectively from stronger to weaker models, giving smaller systems significant performance gains. Moreover, scaling retrieval improves outcomes up to a point, after which excessive memory can overwhelm the context and reduce effectiveness. This highlights procedural memory as a powerful way to make agents more adaptive, efficient, and human-like in their learning.
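As a rough illustration of similarity-based retrieval and dynamic updating, the sketch below ranks stored procedures against a new task and discards entries that keep failing. It rests on several assumptions: a bag-of-words cosine score stands in for a real embedding model, and `ProceduralMemory`, `report`, and the `max_failures` threshold are hypothetical names and policies, not the authors' implementation.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    task: str
    procedure: str
    failures: int = 0


def embed(text: str) -> Counter:
    # Stand-in for an embedding model: lowercase word counts
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)  # Counter yields 0 for missing words
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ProceduralMemory:
    def __init__(self, max_failures: int = 2) -> None:
        self.entries: List[Entry] = []
        self.max_failures = max_failures  # assumed discard threshold

    def add(self, task: str, procedure: str) -> None:
        self.entries.append(Entry(task, procedure))

    def retrieve(self, query: str, k: int = 1) -> List[Entry]:
        """Return the k stored entries whose task is most similar to the query."""
        q = embed(query)
        ranked = sorted(
            self.entries, key=lambda e: cosine(q, embed(e.task)), reverse=True
        )
        return ranked[:k]

    def report(
        self, entry: Entry, succeeded: bool, corrected: Optional[str] = None
    ) -> None:
        """Update an entry after it was used on a task."""
        if succeeded:
            entry.failures = 0  # validated: keep as-is
        elif corrected is not None:
            entry.procedure = corrected  # reflection-style revision
            entry.failures = 0
        else:
            entry.failures += 1
            if entry.failures >= self.max_failures:
                # discard knowledge that repeatedly fails
                self.entries = [e for e in self.entries if e is not entry]


memory = ProceduralMemory()
memory.add("heat an egg in the microwave", "take egg -> go to microwave -> heat egg")
memory.add("book a flight to Paris", "search flights -> pick cheapest -> confirm")

hit = memory.retrieve("heat a potato in the microwave", k=1)[0]
print(hit.procedure)
```

The `k` parameter is where the trade-off noted above appears: a larger `k` supplies more candidate procedures, but each one retrieved also takes up room in the agent's context.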

In conclusion, Memp is a task-agnostic framework that treats procedural memory as a central element for optimizing LLM-based agents. By systematically designing strategies for memory construction, retrieval, and updating, Memp allows agents to distill, refine, and reuse past experiences, improving efficiency and accuracy in long-horizon tasks like TravelPlanner and ALFWorld. Unlike static or manually engineered memories, Memp evolves dynamically, continuously updating and discarding outdated knowledge. Results show steady performance gains, efficient learning, and even transferable benefits when migrating memory from stronger to weaker models. Looking ahead, richer retrieval methods and self-assessment mechanisms can further strengthen agents’ adaptability in real-world scenarios.


Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.
