Agent Memory and Context: How Agents Forget and Ways to Fix It


Your Weekly AI Briefing for Leaders

Welcome to your weekly AI Tech Circle briefing - What you need to know about Generative AI, without the noise!

I'm building and implementing AI solutions, and sharing everything I learn along the way...

Feeling overwhelmed by the constant stream of AI news? I've got you covered! I filter it all so you can focus on what's important.

Today at a Glance:

  • When Agents Forget: How to Build Context-Aware Memory
  • Generative AI Use Case: AI-enabled Sales Assistant for an E-commerce store
  • AI Weekly news and updates covering newly released LLMs
  • Courses and events to attend

EmbeddingGemma: Google's New Open-Source On-Device Embedding Model

Google DeepMind released EmbeddingGemma, a 308M-parameter multilingual text embedding model designed for on-device use. It runs efficiently (under 200 MB RAM when quantized), supports over 100 languages, and maintains state-of-the-art performance on several embedding benchmarks despite its small size. It features Matryoshka representation learning, so developers can use reduced embedding dimensions (e.g., 768 → 128) to trade accuracy for speed or storage. EmbeddingGemma integrates with tools like SentenceTransformers, LangChain, Ollama, and Weaviate, and works offline with Gemma 3n for mobile RAG pipelines.
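The Matryoshka idea is that the leading dimensions of the embedding carry the most information, so you can keep a prefix of the vector and renormalize it. A minimal sketch of that trade-off in plain NumPy (the 768-dim vector here is random stand-in data, not a real EmbeddingGemma output):

```python
import numpy as np

def truncate_embedding(vec: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize to unit length,
    as Matryoshka-trained models allow (e.g. 768 -> 128)."""
    truncated = vec[:dim]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a model output: a random unit vector of full size 768.
rng = np.random.default_rng(0)
full = rng.normal(size=768)
full /= np.linalg.norm(full)

small = truncate_embedding(full, 128)
print(small.shape)  # (128,)
```

The smaller vector drops into the same cosine-similarity pipeline; you just store and compare 128 floats instead of 768.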

Why It Matters:

EmbeddingGemma changes what's possible. For companies constrained by device size, network limitations, or privacy regulations, this model offers robust semantic search without requiring the cloud or large models. It lets teams embed search, recommendation, and retrieval workflows directly on users' devices, reducing latency and keeping data local.

Because embedding quality is foundational for things like RAG systems or recommendation engines, having a compact, high-accuracy model means fewer downstream mistakes.

Agent Memory & Context: Why Agents Forget & How to Fix It

It is challenging to decide what to write every week. Usually I draw on what I faced or built during the week. Over the last few weeks, I have been deep in Claude Code, completing my Gen AI maturity framework project (progress in the next section). It keeps me busy until I hit my daily Claude Code limit on the smallest monthly plan I subscribe to, usually after 4-5 hours.

This week, I hit a block. What to write as a main topic?

It brought me back to what I was doing with VIBE Coding, so let me share with you a few areas.

The Problem We See in VIBE Coding

While building the Gen AI Maturity Portal using Claude Code (vibe-coded), I noticed agents often "reset": losing track of features already implemented, duplicating work, and failing to build on past progress. At times, the agent coded parts from scratch that I had already built. This isn't just annoying; it wastes time and erodes trust in using agents.

What Research & Practice Say

  • From Claude Code: Best Practices for Agentic Coding, Anthropic recommends using hierarchical memory locations: Enterprise-level memory (company-wide policies), project memory (shared architecture/design), and user memory (style preferences, shortcuts) stored in CLAUDE.md files that agents load automatically. This makes context persistent across sessions.
  • "Agent Memory: How to Build Agents that Learn and Remember" highlights three memory types: message buffers (recent interactions), long-term memory blocks, and external databases that persist key information.
  • IBM describes how memory helps agents improve decision-making and perception, and adapt over time rather than treating every interaction as brand new.
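Anthropic's hierarchy above can be mimicked in any agent setup: load whichever memory files exist, from broadest to most specific, and prepend them to the session context. A minimal sketch (the paths are illustrative, not Claude Code's actual lookup rules):

```python
from pathlib import Path

# Broadest to most specific; later files refine earlier ones.
# Paths are illustrative placeholders, not Anthropic's real locations.
MEMORY_PATHS = [
    Path("/etc/claude/CLAUDE.md"),      # enterprise-level policies
    Path("CLAUDE.md"),                  # project memory, checked into the repo
    Path.home() / ".claude/CLAUDE.md",  # user preferences and shortcuts
]

def load_memory(paths=MEMORY_PATHS) -> str:
    """Concatenate every memory file that exists into one context block."""
    sections = [p.read_text() for p in paths if p.is_file()]
    return "\n\n".join(sections)
```

The point is that the agent never starts a session from zero: whatever survived in those files is the baseline context.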

How I've Applied This in VIBE Coding

  • I've structured the VIBE project so that project memory files exist (CLAUDE.md) to capture shared design patterns, completed features, and coding styles.
  • I utilize short-term session memory during each sprint: after working on one feature, I prompt Claude Code to summarize what was done and what remains, which is then stored.
  • For feature handovers, I manually check whether the agent's memory already includes similar code or existing modules to avoid duplication, which helps prevent the agent from repeatedly building things from scratch.
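The sprint routine above boils down to two operations: append a feature summary after each session, and check the log before letting the agent start new work. A minimal sketch of that log (the JSON file and field names are my own convention, not a Claude Code feature):

```python
import json
from pathlib import Path

LOG = Path("feature_log.json")

def record_feature(name: str, summary: str, status: str = "done") -> None:
    """Append what the agent just did so the next session can load it."""
    entries = json.loads(LOG.read_text()) if LOG.exists() else []
    entries.append({"feature": name, "summary": summary, "status": status})
    LOG.write_text(json.dumps(entries, indent=2))

def already_built(name: str) -> bool:
    """Check the log before letting the agent rebuild from scratch."""
    if not LOG.exists():
        return False
    return any(e["feature"] == name for e in json.loads(LOG.read_text()))
```

Before handing a feature to the agent, a quick `already_built("auth")` check is what prevents the duplicate work described above.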

The most essential lesson I learned: maintain versioned documentation of what agents do and which documents they baseline on, from task breakdown to feature explanation to logging each task's status, so you can avoid rework when they forget.

What I still need to figure out, and the areas I plan to explore in the coming weeks:

  • Anthropic's multi-agent research where memory handoffs improve collaboration.
  • OpenAI's gpt-oss Harmony format standardizes role-based memory usage.
  • LangChain's memory modules allow summary buffers and retrieval augmentation.
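LangChain's summary-buffer idea — keep recent turns verbatim and compress older ones — is easy to prototype without the library. A sketch with a stand-in summarizer (a real setup would call an LLM instead of the naive string-join used here):

```python
class SummaryBufferMemory:
    """Keep the last `keep` messages verbatim; fold older ones into a summary."""

    def __init__(self, keep: int = 4, summarize=None):
        self.keep = keep
        # Stand-in summarizer: a real implementation would call an LLM.
        self.summarize = summarize or (lambda msgs: "; ".join(m[:30] for m in msgs))
        self.summary = ""
        self.buffer: list[str] = []

    def add(self, message: str) -> None:
        self.buffer.append(message)
        if len(self.buffer) > self.keep:
            overflow = self.buffer[: -self.keep]   # oldest messages
            self.buffer = self.buffer[-self.keep:]  # recent ones stay verbatim
            folded = self.summarize(overflow)
            self.summary = f"{self.summary}; {folded}" if self.summary else folded

    def context(self) -> str:
        """What gets prepended to the next prompt."""
        parts = ([f"Summary so far: {self.summary}"] if self.summary else []) + self.buffer
        return "\n".join(parts)
```

This keeps prompt size bounded: the context is always at most one summary line plus `keep` recent messages, no matter how long the session runs.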

Call to Action:

If you are experimenting with GenAI but struggling to scale beyond pilots, now is the right time to evaluate your maturity level.

Visit the GenAI Maturity Portal (GenAIMaturity.net), which I've been VIBE-coding live using Claude Code, and run the self-assessment to see where your organization stands:

  • Level 0–1 (Aware / Exploring): Start with simple session memory and logging. Capture what agents do so you can iterate quickly.
  • Level 2–3 (Operational / Integrated): Add persistent memory through vector stores or project memory files. Build handover checkpoints and memory pruning strategies.
  • Level 4+ (Autonomous / Transformative): Move to multi-agent memory sharing, continuous context caching, and automated learning loops.

Try the portal, experiment with memory strategies, and share what works (or breaks!). Your feedback helps refine the model.

Gen AI Maturity Framework:

A few more updates have been made to GenAIMaturity.net; this week, I added the Gen AI Implementation Toolkit, covering several areas. The entire portal is vibe-coded, and content is reviewed and added frequently.

Top Stories of the Week: K2 Think is a new open-source reasoning model built jointly by the Institute of Foundation Models at Mohamed bin Zayed University of Artificial Intelligence (MBZUAI) and G42. Launching with just 32 billion parameters, it achieves performance on par with much larger and more resource-intensive flagship reasoning models, delivering state-of-the-art results on math benchmarks including AIME '24/'25, HMMT '25, and OMNI-Math-HARD. K2 Think follows the UAE's earlier language models for Arabic, NANDA (Hindi), and SHERKALA (Kazakh), expanding a portfolio of efficient, multilingual AI tools while building on the reproducible foundation laid by K2-65B, released in 2024.

Why It Matters: This development matters because it challenges the common assumption that only huge models (hundreds of billions of parameters) can deliver high reasoning performance. By achieving comparable results with fewer parameters, K2 Think offers a path toward more efficient, accessible, and sustainable AI. For businesses and researchers, this means lower cost of deployment, smaller infrastructure needs, and faster iteration.

The Cloud: the backbone of the AI revolution

  • OCI's MLPerf Inference 5.0 benchmark results showcase exceptional performance source
  • Reaching Across the Isles: UK-LLM Brings AI to UK Languages With NVIDIA Nemotron source

Generative AI Use Case of the Week:

Several Generative AI use cases are documented; you can access the full library of Generative AI use cases here: Link

AI-enabled Sales Assistant for an E-commerce Storefront

A generative AI assistant sits inside the storefront and guides shoppers from discovery to checkout. It understands natural language, recommends products, compares options, configures bundles, checks stock and delivery windows, and completes the order. The assistant answers policy questions and hands off to a human when needed. Leading retailers report adoption of conversational shopping and early gains from assistants.

Business Challenges:

Catalogs are large and complex to search. Filters confuse many visitors. Live chat teams are costly and cannot scale at peak times. Fragmented content produces inconsistent answers. Language coverage is limited. Slow guidance leads to cart abandonment. Survey data shows most shoppers now use AI to help them shop, which raises expectations for conversational help.

AI Solution:

  • The assistant interprets shopper goals and retrieves facts from the product catalog, attributes, pricing, inventory, reviews, and policies. It cites the source for every claim.
  • It suggests comparable items, explains trade-offs, and builds bundles or subscriptions. It adapts to constraints such as size, budget, and delivery date.
  • It uses session signals and past orders to tailor results. It respects privacy settings and logs all use of personal data.
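The "cite the source for every claim" requirement can be enforced structurally: the retrieval layer returns facts paired with source identifiers, and the reply is composed only from those pairs. A toy keyword-overlap retriever over a made-up catalog (a real system would use embeddings; the SKUs and product data here are hypothetical):

```python
import re

# Hypothetical catalog: each document carries a source id for citation.
CATALOG = [
    {"id": "sku-101", "text": "Trail runner shoe, waterproof, 310 g, sizes 36-47"},
    {"id": "sku-205", "text": "Road running shoe, breathable mesh, 240 g"},
    {"id": "policy-returns", "text": "Returns accepted within 30 days, free pickup"},
]

def _tokens(s: str) -> set[str]:
    return set(re.findall(r"\w+", s.lower()))

def retrieve(query: str, k: int = 2):
    """Rank documents by keyword overlap; return (text, source id) pairs."""
    q = _tokens(query)
    scored = sorted(CATALOG, key=lambda d: len(q & _tokens(d["text"])), reverse=True)
    return [(d["text"], d["id"]) for d in scored[:k]]

def answer(query: str) -> str:
    """Compose a reply where every fact carries its citation."""
    return " ".join(f"{text} [{src}]" for text, src in retrieve(query))
```

Because `answer` can only assemble text that arrived with a source id, an uncited claim is impossible by construction — the property the bullet above asks for.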

Impact:

  • Revenue: Higher conversion and basket size through guided recommendations and faster answers.
  • User Experience: Shoppers get clear help in natural language with accurate information and live citations. Surveys show most shoppers now expect conversational support.
  • Operations: Fewer repetitive chats reach human agents, which frees them for complex cases. Klarna reports that AI handles most chats with very short resolution times.
  • Process: Standard responses and linked sources improve consistency and auditability. Prebuilt agent patterns speed integration with existing retail systems.
  • Cost: Lower handling time per question and fewer abandoned carts reduce support and acquisition waste.

Data Sources:

  • Product catalog and attributes.
  • Pricing and promotions.
  • Inventory and fulfillment data.
  • Images and rich media.
  • Reviews and rating summaries.
  • Policies for shipping returns and warranties.
  • Order history and session events with consent.
  • Search logs and clickstream for relevance tuning.

Strategic Fit:

The assistant converts existing content and data into real-time guidance that meets the rising expectations of shoppers. It protects brand trust by citing sources and deferring to experts when uncertain.

Favorite Tip Of The Week:

'We have to stop it taking over'

Geoffrey Hinton, the 'Godfather of AI,' discusses the past, present, and future of AI, including whether AI will ever be more intelligent than humans and whether we should do more to protect against the risks of superintelligent AI.


Potential of AI:

Albania has appointed Diella, a virtual assistant powered by AI, as the world's first AI-generated "minister," tasked with managing public procurement to fight corruption. Diella was launched in early 2025 as part of the e-Albania platform, helping citizens access online public services and issue hundreds of digital documents. In this new role, Diella will gradually assume authority over public tenders, promising transparency, objectivity, and a tendering process "100 percent free of corruption." Diella's appointment reflects a bold step in redefining how government institutions can deploy AI for governance functions. The AI minister model raises legal, ethical, and operational questions, such as oversight, transparency of decisions, and how to ensure the system itself remains resistant to manipulation.

Why It Matters: Albania's experiment shows how AI can move beyond assisting roles to assuming decision-making authority in public governance. For countries or organizations considering AI for oversight or regulatory functions, it offers a real-world example of what's possible and what to watch out for.

Things to Know...

Security Challenges in AI Agent Deployment: Insights from a Large-Scale Public Competition
A research team from Gray Swan AI and the UK AI Security Institute has worked on creating an Agent Red Teaming (ART) benchmark. A large-scale public red-teaming competition was conducted to stress-test 22 frontier AI agents across 44 realistic deployment scenarios, generating over 1.8 million prompt-injection attempts. More than 60,000 of these attacks were successful, resulting in policy violations such as unauthorized data access, financial misconduct, and compliance breaches, with most agents failing within just 10–100 queries. The study revealed that attacks were highly transferable, often succeeding across different agents and tasks, which underscores the systemic nature of these vulnerabilities. Interestingly, model size, compute power, or capability level were not reliable indicators of robustness; larger, more capable models were not inherently safer. To support the community, the authors introduced the ART benchmark (Agent Red Teaming benchmark) and an evaluation framework that enables standardized and repeatable testing of AI agents under adversarial conditions.

Why It Matters: This research highlights the urgent need for organizations to prioritize security before deploying AI agents in real-world environments. Since policy violations can occur quickly and attack patterns often work across models, defenses must be comprehensive and not limited to a single model type or vendor. The fact that larger models are not necessarily more secure should caution teams against relying solely on model sophistication as a safety measure. The availability of a standardized benchmark like ART provides a valuable tool for developers, researchers, and enterprises to test vulnerabilities early and build stronger guardrails.


Checking the Current AI Capabilities in an Organization:

Before launching new AI initiatives, organizations should start by taking a clear inventory of their current AI capabilities. This includes identifying where AI is already in use, determining which workflows rely on automation, and mapping gaps in data readiness, infrastructure, and team skills.

Once this baseline is established, leaders can create a phased roadmap to expand AI adoption. The next step should be to select a few high-impact areas for pilots, set measurable goals for those pilots, and use the results to inform a broader rollout. This structured approach avoids wasted investment, ensures alignment with business objectives, and builds confidence across teams as they see early wins.

Quick Self-Assessment Checklist:

  • Do we have a current inventory of AI projects, tools, and workflows?
  • Are our data sources accessible, clean, and secure for AI use?

  • Do we have the infrastructure (Cloud, compute, APIs) to scale AI solutions?
  • Have we identified at least two high-impact use cases for the next phase?
  • Are there clear KPIs to measure success and guide future investment?

For a detailed assessment, follow the Generative AI Maturity Assessment

The Opportunity...

Podcast:

  • This week's Open Tech Talks episode 164 is "AI for Automation to Transform Business Operations with Aarti Anand". Aarti was a product leader for 15 years before deciding to let it all go and start Kodenyx AI.

Apple | Amazon Music


Courses to attend:

  • Building Towards Computer Use with Anthropic by DeepLearning AI. Throughout this course, you’ll explore the features that pave the way for computer use, from working with Anthropic's API to multimodal prompting, prompt caching, and tool use, culminating in a demo that brings all these features together to create an AI assistant that relies on a computer.
  • AI Agents Course from Hugging Face. This free course will take you on a journey, from beginner to expert, in understanding, using, and building AI agents.

Events:


Tech and Tools...

  • MCP registry provides MCP clients with a list of MCP servers, like an app store for MCP servers.
  • Genkit is an open-source framework for building full-stack AI-powered applications, built and used in production by Google's Firebase.

The Investment in AI...

  • TENEX.AI, the AI-native cybersecurity company transforming security operations, announced its $27 million Series A funding. It offers a Managed Detection and Response (MDR) service that combines advanced agentic AI, automation, and expert human skills to provide faster detection, high-quality triage, and autonomous responses with human oversight.
  • LightSpun, an AI-powered dental insurance administration platform, has raised $13 million in Series A funding.

That's it for this week - thanks for reading!

Reply with your thoughts or favorite section.

Found it useful? Share it with a friend or colleague to grow the AI circle.

Until next Saturday,

Kashif


The opinions expressed here are solely my conjecture based on experience, practice, and observation. They do not represent the thoughts, intentions, plans, or strategies of my current or previous employers or their clients/customers. The objective of this newsletter is to share and learn with the community.

Dubai, UAE

You are receiving this because you signed up for the AI Tech Circle newsletter or Open Tech Talks. If you'd like to stop receiving all emails, click here. Unsubscribe · Preferences

AI Tech Circle

Learn something new every Saturday about #AI #ML #DataScience #Cloud and #Tech with Weekly Newsletter. Join with 278+ AI Enthusiasts!
