Understanding A2A with Heiko Hotz and Sokratis Kartakis – O’Reilly

By Mr Hossain

August 21, 2025

Generative AI in the Real World: Understanding A2A with Heiko Hotz and Sokratis Kartakis

Everyone is talking about agents: single agents and, increasingly, multi-agent systems. What kind of applications will we build with agents, and how will we build with them? How will agents communicate with each other effectively? Why do we need a protocol like A2A to specify how they communicate? Join Ben Lorica as he talks with Heiko Hotz and Sokratis Kartakis about A2A and our agentic future.

About the Generative AI in the Real World podcast: In 2023, ChatGPT put AI on everyone’s agenda. In 2025, the challenge will be turning those agendas into reality. In Generative AI in the Real World, Ben Lorica interviews leaders who are building with AI. Learn from their experience to help put AI to work in your enterprise.

Check out other episodes of this podcast on the O’Reilly learning platform.

Timestamps

0:00: Intro to Heiko and Sokratis.
0:24: It feels like we’re in a Cambrian explosion of frameworks. Why agent-to-agent communication? Some people might think we should focus on single-agent tooling first.
0:53: Many developers start developing agents with completely different frameworks. At some point they want to link the agents together. One way is to change the code of your application. But it would be easier if you could get the agents talking the same language.
1:43: Was A2A something developers approached you for?
1:53: It is fair to say that A2A is a forward-looking protocol. We see a future where one team develops an agent that does something and another team in the same organization or even outside would like to leverage that capability. An agent is very different from an API. In the past, this was done via API. With agents, I need a stateful protocol where I send a task and the agent can run asynchronously in the background and do what it needs to do. That’s the justification for the A2A protocol. No one has explicitly asked for this, but we will be there in a few months’ time.
3:55: For developers in this space, the most familiar is MCP, which is a single-agent protocol focused on external tool integration. What is the relationship between MCP and A2A?
4:26: We believe that MCP and A2A will be complementary and not rivals. MCP is specific to tools, and A2A connects agents with each other. That brings us to the question of when to wrap a functionality in a tool versus an agent. If we look at the technical implementation, that gives us some hints when to use each. An MCP tool exposes its capability by a structured schema: I need input A and B and I give you the sum. I can’t deviate from the schema. It’s also a single interaction. If I wrap the same functionality into an agent, the way I expose the functionality is different. A2A expects a natural language description of the agent’s functionality: “The agent adds two numbers.” Also, A2A is stateful. I send a request and get a result. That gives developers a hint on when to use an agent and when to use a tool. I like to use the analogy of a vending machine versus a concierge. I put money into a vending machine and push a button and get something out. I talk to a concierge and say, “I’m thirsty; buy me something to drink.”
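The vending machine versus concierge contrast at 4:26 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries `ADD_TOOL` and `ADD_AGENT` and both functions are hypothetical, they do not follow the actual MCP or A2A wire formats, and a regular expression stands in for the language model an agent would really use.

```python
import re

# Hypothetical illustration: neither dict follows the real MCP or A2A formats.

# A tool advertises a rigid, machine-checkable input schema.
ADD_TOOL = {
    "name": "add",
    "required_arguments": ["a", "b"],
}

# An agent advertises what it can do in natural language.
ADD_AGENT = {
    "name": "adder-agent",
    "description": "The agent adds two numbers.",
}


def call_add_tool(arguments: dict) -> float:
    """Vending machine: reject any call that deviates from the schema."""
    required = ADD_TOOL["required_arguments"]
    if sorted(arguments) != sorted(required):
        raise ValueError(f"expected exactly {required}, got {sorted(arguments)}")
    for key in required:
        value = arguments[key]
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"argument {key!r} must be a number")
    return arguments["a"] + arguments["b"]


def ask_add_agent(message: str) -> str:
    """Concierge: accept free-form text. A regex stands in for the LLM."""
    numbers = [float(n) for n in re.findall(r"-?\d+(?:\.\d+)?", message)]
    if len(numbers) < 2:
        # The agent can ask a follow-up question instead of failing.
        return "Which two numbers should I add?"
    return f"The sum is {numbers[0] + numbers[1]:g}."
```

The tool fails fast on anything outside its schema, while the agent interprets the request and can respond with a clarifying question.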
7:09: Maybe we can help our listeners make the notion of A2A even more concrete. I tell nonexperts that you’re already using an agent to some extent. Deep research is an agent. I talk to people building AI tools in finance, and I have a notion that I want to research, but I have one agent looking at earnings, another looking at other data. Do you have a canonical example you use?
8:13: We can draw a parallel between A2A and a real business. Imagine separate agents that are different employees with different skills. They have their own business cards. They share the business cards with the clients. The client can understand what tasks they want to do: learn about stocks, learn about investments. So I call the right agent or server to get a specialized answer back. Each agent has a business card that describes its skills and capabilities. I can talk to the agent with live streaming or send it messages. You need to define how you communicate with the agent. And you need to define the security method you will use to exchange messages.
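The business card analogy at 8:13 can be sketched as a small data structure that a client searches to find the right agent. This is a simplified sketch, not the A2A agent card schema: the class and field names, the example agents, and the `example.com` URLs are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Skill:
    name: str
    description: str
    tags: list[str] = field(default_factory=list)


@dataclass
class AgentCard:
    """Simplified, hypothetical stand-in for an agent's business card."""
    name: str
    url: str                 # where the client reaches the agent
    skills: list[Skill]      # what the agent can do
    streaming: bool          # live streaming, or plain messages only
    security_scheme: str     # how the client must authenticate


# Hypothetical cards shared with the client.
CARDS = [
    AgentCard(
        name="stock-analyst",
        url="https://agents.example.com/stocks",
        skills=[Skill("analyze", "Explains stock performance.", ["stocks"])],
        streaming=True,
        security_scheme="oauth2",
    ),
    AgentCard(
        name="investment-advisor",
        url="https://agents.example.com/investments",
        skills=[Skill("advise", "Compares investment options.", ["investments"])],
        streaming=False,
        security_scheme="api_key",
    ),
]


def find_agents(cards: list[AgentCard], topic: str) -> list[AgentCard]:
    """Return the agents whose skills are tagged with the requested topic."""
    topic = topic.lower()
    return [
        card for card in cards
        if any(topic in skill.tags for skill in card.skills)
    ]
```

A client that wants to learn about stocks would call `find_agents(CARDS, "stocks")`, then read the matching card to see how to connect and which security method to use.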
9:45: Late last year, people started talking about single agents. But people were already talking about what the agent stack would be: memory, storage, observability, and so on. Now that you are talking about multi-agents or A2A, are there important things that need to be introduced to the agentic stack?
10:32: You would still have the same. You’d arguably need more. Statefulness, memory, access to tools.
10:48: Is that going to be like a shared memory across agents?
10:52: It all depends on the architecture. The way I imagine a vanilla architecture, the user speaks to a router agent, which is the primary contact of the user with the system. That router agent does very simple things like saying “hello.” But once the user asks the system “Book me a holiday to Paris,” there are many steps involved. (No agent can do this yet). The capabilities are getting better and better. But the way I imagine it is that the router agent is the boss, and two or three remote agents do different things. One finds flights; one books hotels; one books cars—they all need information from each other. The router agent would hold the context for all of those. If you build it all within one agentic framework, it becomes even easier because those frameworks have the concepts of shared memory built in. But it’s not necessarily needed. If the hotel booking agent is built in LangChain and from a different team than the flight booking agent, the router agent would decide what information is needed.
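The router pattern described at 10:52 can be sketched as follows. It is a toy under stated assumptions: the three remote agents are plain Python functions with invented names and return values, whereas real remote agents would sit behind a network protocol and could be built in different frameworks by different teams.

```python
# Hypothetical remote agents. Each sees only the fields the router sends it.
def flight_agent(request: dict) -> dict:
    return {"flight": f"flight to {request['destination']} on {request['date']}"}


def hotel_agent(request: dict) -> dict:
    return {"hotel": f"hotel in {request['destination']} from {request['date']}"}


def car_agent(request: dict) -> dict:
    # Needs the hotel agent's result to know where to deliver the car.
    return {"car": f"car delivered to {request['hotel']}"}


# For each remote agent: the function to call and the context keys it needs.
REMOTE_AGENTS = {
    "flights": (flight_agent, ["destination", "date"]),
    "hotels": (hotel_agent, ["destination", "date"]),
    "cars": (car_agent, ["hotel"]),
}


def route(user_request: dict) -> dict:
    """Router agent: holds the full context and decides what each agent gets."""
    context = dict(user_request)
    for agent, needed_keys in REMOTE_AGENTS.values():
        payload = {key: context[key] for key in needed_keys}
        context.update(agent(payload))  # results flow back into shared context
    return context
```

Calling `route({"destination": "Paris", "date": "2025-09-01"})` returns a context holding the flight, hotel, and car results; no remote agent ever sees more than the keys listed for it.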
13:28: What you just said is the argument for why you need these protocols. Your example is the canonical simple example. What if my trip involves four different countries? I might need a hotel agent for every country. Because hotels might need to be specialized for local knowledge.
14:12: Technically, you might not need to change agents. You need to change the data—what agent has access to what data.
14:29: We need to draw a parallel between single agents and multi-agent systems; we move from a monolithic application to microservices that have small, dedicated agents to perform specific tasks. This has many benefits. It also makes the life of the developer easier because you can test, you can evaluate, you can perform checks before moving to production. Imagine that you gave a human 100 tools to perform a task. The human will get confused. It’s the same for agents. You need small agents with specific terms to perform the right task.
15:31: Heiko’s example drives home why something like MCP may not be enough. If you have a master agent and all it does is integrate with external sites, but the integration is not smart—if the other side has an agent, that agent could be thinking as well. While agent-to-agent is something of a science fiction at the moment, it does make sense moving forward.
16:11: Coming back to Sokratis’s thought, when you give an agent too many tools and make it try to do too many things, it just becomes more and more likely that by reasoning through these tools, it will pick the wrong tool. That gets us to evaluation and fault tolerance.
16:52: At some point we might see multi-agent systems communicate with other multi-agent systems: an agent mesh.
17:05: In the scenario of this hotel booking, each of the smaller agents would use their own local model. They wouldn’t all rely on a central model. Almost all frameworks allow you to choose the right model for the right task. If a task is simple but still requires an LLM, a small open source model could be sufficient. If the task requires heavy “brain” power, you might want to use Gemini 2.5 Pro.
18:07: Sokratis brought up the word security. One of the earlier attacks against MCP is a scenario when an attacker buries instructions in the system prompt of the MCP server or its metadata, which then gets sent into the model. In this case, you have smaller agents, but something may happen to the smaller agents. What attack scenarios worry you at this point?
19:02: There are many levels at which something might go wrong. With a single agent, you have to implement guardrails before and after each call to an LLM or agent.
19:24: In a single agent, there is one model. Now each agent is using its own model.
19:35: And this makes the evaluation and security guardrails even more problematic. From A2A’s side, it supports all the different security types to authenticate agents, like API keys, HTTP authentication, OAuth 2. Within the agent card, the agent can define what you need to use to use the agent. Then you need to think of this as a service possibility. It’s not just a responsibility of the protocol. It’s the responsibility of the developer.
20:29: It’s equivalent to right now with MCP. There are thousands of MCP servers. How do I know which to trust? But at the same time, there are thousands of Python packages. I have to figure out which to trust. At some level, some vetting needs to be done before you trust another agent. Is that right?
21:00: I would think so. There’s a great article: “The S in MCP Stands for Security.” We can’t speak as much to the MCP protocol, but I do believe there have been efforts to implement authentication methods and address security concerns, because this is the number one question enterprises will ask. Without proper authentication and security, you will not have adoption in enterprises, which means you will not have adoption at all. With A2A, these concerns were addressed head-on because the A2A team understood that to get any chance of traction, built-in security was priority 0.
22:25: Are you familiar with the buzzword “large action models”? The notion that your model is now multimodal and can look at screens and environment states.
22:51: Within DeepMind, we have Project Mariner, which leverages Gemini’s capabilities to ask on your behalf about your computer screen.
23:06: It makes sense that it’s something you want to avoid if you can. If you can do things in a headless way, why do you want to pretend you’re human? If there’s an API or integration, you would go for that. But the reality is that many tools knowledge workers use may not have these features yet. How does that impact how we build agent security? Now that people might start building agents to act like knowledge workers using screens?
23:45: I spoke with a bank in the UK yesterday, and they were very clear that they need to have complete observability on agents, even if that means slowing down the process. Because of regulation, they need to be able to explain every request that went to the LLM, and every action that followed from that. I believe observability is the key in this setup, where you just cannot tolerate any errors. Because it is LLM-based, there will still be errors. But in a bank you must at least be in a position to explain exactly what happened.
24:45: With most customers, whenever there’s an agentic solution, they need to share that they are using an agentic solution and the way [they] are using it is X, Y, and Z. A legal agreement is required to use the agent. The customer needs to be clear about this. There are other scenarios like UI testing where, as a developer, I want an agent to start using my machine. Or an elder who is connected with customer support of a telco to fix a router. This is impossible for a nontechnical person to achieve. The fear is there, like nuclear energy, which can be used in two different ways. It’s the same with agents and GenAI.
26:08: A2A is a protocol. As a protocol, there’s only so much you can do on the security front. At some level, that’s the responsibility of the developers. I may want to signal that my agent is secure because I’ve hired a third party to do penetration testing. Is there a way for the protocol to embed knowledge about the extra step?
27:00: A protocol can’t handle all the different cases. That’s why A2A created the notion of extensions. You can extend the data structure and also the methods or the profile. Within this profile, you can say, “I want all the agents to use this encryption.” And with that, you can tell all your systems to use the same patterns. You create the extension once, you adopt that for all the A2A compatible agents, and it’s ready.
27:51: For our listeners who haven’t opened the protocol, how easy is it? Is it like REST or RPC?
28:05: I personally learned it within half a day. For someone who is familiar with RPC, with traditional internet protocols, A2A is very intuitive. You have a server; you have a client. All you need to learn is some specific concepts, like the agent card. (The agent card itself could be used to signal not only my capabilities but how I have been tested. You can even think of other metrics like uptime and success rate.) You need to understand the concept of a task. And then the remote agent will update on this task as defined—for example, every five minutes or [upon] completion of specific subtasks.
29:52: A2A already supports JavaScript, TypeScript, Python, Java, and .NET. In ADK, the agent development kit, with one line of code we can define a new A2A agent.
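The task concept mentioned at 28:05 (a client sends a task, and the remote agent reports progress as subtasks complete) can be sketched as a small state machine. This is a simplified sketch, not the A2A SDK: the class names and methods are hypothetical, only three task states are modeled, and the work runs synchronously here, whereas a real remote agent would run in the background.

```python
from dataclasses import dataclass, field
from enum import Enum


class TaskState(Enum):
    SUBMITTED = "submitted"
    WORKING = "working"
    COMPLETED = "completed"


@dataclass
class Task:
    task_id: str
    request: str
    state: TaskState = TaskState.SUBMITTED
    updates: list[str] = field(default_factory=list)


class RemoteAgent:
    """Hypothetical server side: keeps task state between client calls."""

    def __init__(self) -> None:
        self._tasks: dict[str, Task] = {}

    def send_task(self, task_id: str, request: str) -> Task:
        """Accept a task and return immediately; no work has been done yet."""
        task = Task(task_id, request)
        self._tasks[task_id] = task
        return task

    def work(self, task_id: str, subtasks: list[str]) -> None:
        """Simulate background work, recording an update per subtask."""
        task = self._tasks[task_id]
        task.state = TaskState.WORKING
        for subtask in subtasks:
            task.updates.append(f"finished {subtask}")
        task.state = TaskState.COMPLETED

    def get_task(self, task_id: str) -> Task:
        """Let the client check the status of a task at any time."""
        return self._tasks[task_id]
```

Because the agent stores each task by its identifier, the client can submit a request, disconnect, and later call `get_task` to read the state and the accumulated updates.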
30:27: What is the current state of adoption?
30:40: I should have looked at the PyPI download numbers.
30:49: Are you aware of teams or companies starting to use A2A?
30:55: I’ve worked with a customer with an insurance platform. I don’t know anything about insurance, but there’s the broker and the underwriter, which are usually two different companies. They were thinking about building an agent for each and having the agents talk via A2A.
31:32: Sokratis, what about you?
31:40: The interest is there for sure. Three weeks ago, I presented [at] the Google Cloud London Summit with a big customer on the integration of A2A into their agentic platform, and we shared tens of customers, including the announcement from Microsoft. Many customers start implementing agents. At some point they lack integration across business units. Now they see the more agents they build, the more the need for A2A.
32:32: A2A is now in the Linux Foundation, which makes it more attractive for companies to explore, adopt, and contribute to, because it’s no longer controlled by a single entity. So decision making will be shared across multiple entities.
