CIAO Core Components: Conversations
CIAO (Conversations - Interfaces - Agents - Orchestration) is an architectural framework for designing AI-powered, agent-based applications. In this article we analyse conversations in more detail.
There is a new breed of applications that are conversation centric (depending mostly on LLMs to achieve this), can have components that make decisions and take actions independently (Agents) and can interact with the user across a variety of modalities (Interfaces). The CIAO (Conversation - Interface - Agents - Orchestration) framework is an attempt to structure our reasoning about these applications in a coherent and consistent way. We introduced CIAO here, and will continue exploring it over a series of articles.
Conversations
While some AI-powered applications will operate without conversations (e.g. a machine learning model that identifies defects in a factory line), the future of user-facing AI applications is undeniably conversational.
Even in a possible post-LLM world (as hard as that is to imagine right now), it will be impossible to take users back to a world where they cannot just talk to their software to tell it what they want.
Similarly, AI Agents will continue to depend on conversation to interact with each other and with humans and receive instructions about the goals they should achieve.
The directness and intuitive nature of simply asking for what we want has become too valuable to abandon.
We've crossed a threshold where technology now adapts to human communication patterns rather than forcing humans to learn machine syntax.
In this context, conversations are a first-order component of any AI application. When designing how conversations work in your application, you have a number of decisions to make.
What is the structure of the conversation? What is the structure of a message? Can every participant (human or AI Agent) post a message into a conversation at any point, or is there a stricter, protocol-driven, turn-taking approach?
Are conversations divided into smaller sub-conversations based on context or is it all one long exchange?
How is conversation context understood? Is it derived purely from the messages exchanged, or is it imposed by ongoing business processes?
How can past interactions be retrieved and analysed by the participants of the conversation? Does the conversation management component offer any abstractions or is that left to the participants to manage?
Is the conversation structure independent of the application domain or is it optimised to best suit the needs of the specific domain?
Too often, application development starts from implicit assumptions (usually shaped by what an API such as the OpenAI API offers) rather than from what the application actually needs.
Here we will explore the different components of conversations and in future articles we will describe different strategies that existing systems have taken.
Participants
Conversations start with Participants. A Participant is any entity that can contribute to a conversation.
Humans: Humans are typically the users, although we can imagine other roles such as moderators, reviewers, etc.
Agents: Agents, as more independent and goal-directed entities, can also contribute to a conversation.
Systems: Finally, you may have other systems such as tools, API calls, dedicated orchestrators or other management components that inject information into a conversation in a less proactive manner.
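To make the three kinds of Participant concrete, here is a minimal sketch of how they could be modelled. The names `ParticipantKind`, `Participant`, `role` and `is_proactive` are our own illustrative assumptions, not part of any existing library or of a fixed CIAO specification.

```python
from dataclasses import dataclass
from enum import Enum


class ParticipantKind(Enum):
    HUMAN = "human"    # users, moderators, reviewers, ...
    AGENT = "agent"    # independent, goal-directed entities
    SYSTEM = "system"  # tools, API calls, orchestrators, ...


@dataclass(frozen=True)
class Participant:
    id: str
    kind: ParticipantKind
    role: str = "user"  # hypothetical free-form role label

    @property
    def is_proactive(self) -> bool:
        # Assumption for this sketch: humans and agents initiate
        # contributions, while systems only inject information.
        return self.kind is not ParticipantKind.SYSTEM


alice = Participant("alice", ParticipantKind.HUMAN)
planner = Participant("planner", ParticipantKind.AGENT, role="planner")
search = Participant("search-tool", ParticipantKind.SYSTEM, role="tool")
```

Keeping the kind separate from the role means a human can be a user in one conversation and a reviewer in another without changing what type of Participant they are.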
Messages & Events
Conversations are primarily made up of Messages. A message has a sender, content and associated metadata (timestamp, id, etc.). The content can be plain text, or it can be structured so that it is easier to represent in different forms.
Depending on the sophistication of our application it may also be useful to consider a conversation containing Events in addition to messages. Events can manage and structure message flow and broader conversation policy and administration needs.
For example, an event may be used to indicate that a new Participant has joined the conversation, or that a new knowledge document has been added to the conversation.
An event may also be triggered by a workflow tool to indicate the start or end of a conversation or an explicit context change, or it can be used to inject information from APIs and external systems.
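As a rough sketch of the distinction, a conversation can be held as a single ordered log that contains both Messages and Events. All class, field and event names below (`Message`, `Event`, `Conversation`, `"participant_joined"`, and so on) are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Union
from uuid import uuid4


def _now() -> datetime:
    return datetime.now(timezone.utc)


def _new_id() -> str:
    return uuid4().hex


@dataclass(frozen=True)
class Message:
    sender: str
    content: Union[str, Dict[str, Any]]  # plain text or structured content
    id: str = field(default_factory=_new_id)
    timestamp: datetime = field(default_factory=_now)


@dataclass(frozen=True)
class Event:
    kind: str  # e.g. "participant_joined", "document_added"
    payload: Dict[str, Any] = field(default_factory=dict)
    id: str = field(default_factory=_new_id)
    timestamp: datetime = field(default_factory=_now)


@dataclass
class Conversation:
    entries: List[Union[Message, Event]] = field(default_factory=list)

    def append(self, entry: Union[Message, Event]) -> None:
        self.entries.append(entry)

    def messages(self) -> List[Message]:
        # Events structure the flow; this view returns only what was said.
        return [e for e in self.entries if isinstance(e, Message)]


conversation = Conversation()
conversation.append(Event("conversation_started"))
conversation.append(Event("participant_joined", {"participant": "alice"}))
conversation.append(Message("alice", "Where is my order?"))
conversation.append(
    Message("order-agent", {"type": "order_status", "status": "shipped"})
)
```

Keeping events in the same log as messages preserves their relative order, so a consumer can tell, for instance, which messages were posted after a document was added.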
Context & State Representation
Context is a high-level description of what is currently happening in our system. Context can be implicit, i.e. derived by interpreting the individual messages and events; explicit, i.e. defined through explicit context labels; or a hybrid of the two.
Explicit definitions of context can be particularly efficient, especially when dealing with business processes. Implicit context derived from messages is much more flexible but carries the risk of misinterpretation.
Another way of managing context and state is to provide checkpoints or summaries. These are derived from an existing set of messages but offer a more efficient representation, so that a system interacting with the conversation artifact does not need to reason about the entire conversation every time.
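The sketch below shows one way explicit context labels and checkpoints could sit alongside the raw messages. It is only an illustration: `ConversationState`, `Checkpoint`, `working_view` and the `"refund_request"` label are hypothetical names, and `naive_summary` is a stand-in for whatever summarisation step (most likely an LLM call) a real system would use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Checkpoint:
    upto: int     # number of messages covered by the summary
    summary: str  # compact representation of those messages


@dataclass
class ConversationState:
    messages: List[str] = field(default_factory=list)
    context_label: Optional[str] = None  # explicit context, if any
    checkpoint: Optional[Checkpoint] = None

    def add_checkpoint(self, summarise: Callable[[List[str]], str]) -> None:
        # Summarise everything seen so far and remember where we stopped.
        self.checkpoint = Checkpoint(len(self.messages), summarise(self.messages))

    def working_view(self) -> List[str]:
        # What a participant needs to read: the summary plus newer messages.
        if self.checkpoint is None:
            return list(self.messages)
        tail = self.messages[self.checkpoint.upto:]
        return [f"[summary] {self.checkpoint.summary}"] + tail


def naive_summary(messages: List[str]) -> str:
    # Placeholder only: a real system would produce a meaningful summary.
    return f"{len(messages)} earlier messages"


state = ConversationState(context_label="refund_request")
state.messages += ["Hi", "I want a refund", "Order 123"]
state.add_checkpoint(naive_summary)
state.messages.append("It arrived damaged")
view = state.working_view()
```

Here `view` contains two entries instead of four: the summary of the first three messages and the one message posted after the checkpoint. The explicit label travels with the state, so a participant does not have to infer it from the text.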
Conversation Policies
A Conversation Policy is a description of what can be said at any given point in a conversation. For example, you may adopt a simple turn-taking policy, or you may impose more sophisticated controls where specific conditions need to be met before someone can participate in a conversation. A conversation policy could dictate not only who can participate but also what can be said (allowed intents) and what can be done (allowed actions). Being able to define policies is crucial, especially in more complex business settings or in settings where multiple agents and humans need to collaborate and coordinate.
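As an illustration, policies can be sketched as small checks that a conversation consults before accepting a contribution. The names below (`Proposal`, `TurnTaking`, `AllowedIntents`, `PolicedConversation`) and the intent labels are assumptions made for this example; here the conversation itself enforces the policies, which is only one of several possible choices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Protocol, Set


@dataclass(frozen=True)
class Proposal:
    sender: str
    intent: str  # what the sender is trying to say or do


class Policy(Protocol):
    def allows(self, proposal: Proposal, last_sender: Optional[str]) -> bool: ...


class TurnTaking:
    """Simple turn-taking: nobody may post twice in a row."""

    def allows(self, proposal: Proposal, last_sender: Optional[str]) -> bool:
        return proposal.sender != last_sender


@dataclass
class AllowedIntents:
    """Restricts what each participant may say (allowed intents)."""

    intents_by_sender: Dict[str, Set[str]]

    def allows(self, proposal: Proposal, last_sender: Optional[str]) -> bool:
        allowed = self.intents_by_sender.get(proposal.sender, set())
        return proposal.intent in allowed


@dataclass
class PolicedConversation:
    policies: List[Policy]
    log: List[Proposal] = field(default_factory=list)

    def post(self, proposal: Proposal) -> bool:
        # Accept the proposal only if every policy allows it.
        last = self.log[-1].sender if self.log else None
        if all(p.allows(proposal, last) for p in self.policies):
            self.log.append(proposal)
            return True
        return False


conv = PolicedConversation(
    [TurnTaking(), AllowedIntents({"user": {"ask"}, "agent": {"answer"}})]
)
results = [
    conv.post(Proposal("user", "ask")),      # accepted
    conv.post(Proposal("user", "ask")),      # rejected: same sender twice
    conv.post(Proposal("agent", "refund")),  # rejected: intent not allowed
    conv.post(Proposal("agent", "answer")),  # accepted
]
```

Because each policy is an independent check, more sophisticated conditions can be added as further policy objects without changing how the conversation accepts or rejects a contribution.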
Conclusions
I think this gives us a solid starting set of concepts for how to think about and structure conversations. What I hope is evident is that it is not a simple case of ordering messages and then letting LLMs reason about them through prompts. Much more needs to be considered, and careful thought should go into the specific needs and goals of your AI application before you settle on the conversational approach that is right for you.
There is also much that we haven’t addressed. How do multi-modal conversations impact our understanding of a conversation model (if at all)? How do we recover from deviations from conversation policies? Who is responsible for enforcing a policy? How do we trust the statements other participants make? We will get to all of this in due course! If you want to follow along, please subscribe.