We are creating a new type of software application. With LLMs as the catalyst technology, we are building applications that are conversation-centric, with increasing levels of automated decision-making and the ability to handle a variety of input and output types. New application types call for new architectural frameworks that better capture the core concepts.
A clear framework enables us to separate concerns, compare and contrast alternative approaches, and describe common solutions as reusable patterns, and in doing so create more robust and more reliable applications.
There are several examples of such frameworks and how they've helped build better applications. In the web application development world, MVC (Model-View-Controller) helps teams think about the separation of data, business logic and graphical representation. Faced with new cloud-based scaling capabilities, the Microservices architecture helps teams reason about how to deploy independent services that interface over standardised APIs. Speaking of APIs, the REST (REpresentational State Transfer) style gave teams a way to think about how to structure and approach their API development.
Frameworks are not just for software developers. From a design or user experience perspective we have similar needs. Whether it is something high-level like atomic design or user journey maps, or something more specific such as progressive disclosure or designing information architecture based on card sorting, these frameworks are invaluable tools. They allow us to share a common understanding and a common language, and reduce the cognitive load required to reason about complex problems.
Creating such a framework for AI-powered applications is something that has been on my mind for some time now. I've been toying around with what the main concepts should be and how they relate to each other, and writing small pieces of code to prove out ideas as I went along. From today I'll continue that process, but share the ideas and their evolution in the open on agentsdecoded.com. If you want to keep updated, subscribe.
CIAO: Conversations, Interfaces, Agents, Orchestration
CIAO provides a unifying architectural framework to address the challenges of building applications that deal with ongoing context-aware interactions, multimodal communication, automated decision-making and dynamic coordination.
A key design principle of CIAO is to focus first on the what of AI applications, the fundamental capabilities and interactions required, rather than the how, the specific technologies and implementations used to realise them.
For example, the architectural need to understand natural language (a "what") exists independently of whether an LLM, semantic parser, or hybrid approach (all "how" concerns) is employed. Similarly, the need for an agent to reason about and execute actions independently (a "what") can be implemented through various technologies and methodologies (the "how").
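To make this concrete in code, here is a minimal sketch (Python, with hypothetical names such as LanguageUnderstanding) of a natural-language capability expressed as a "what", with two interchangeable "how" implementations behind it:

```python
# A minimal sketch, assuming Python and hypothetical names, of the what/how
# separation: the capability is defined as an abstract interface, and any
# number of concrete technologies can implement it.
from abc import ABC, abstractmethod


class LanguageUnderstanding(ABC):
    """The 'what': turn a participant's utterance into an intent."""

    @abstractmethod
    def interpret(self, utterance: str) -> str:
        ...


class LLMUnderstanding(LanguageUnderstanding):
    """One possible 'how': delegate interpretation to an LLM (stubbed out here)."""

    def interpret(self, utterance: str) -> str:
        return f"intent inferred by an LLM from {utterance!r}"


class RuleBasedUnderstanding(LanguageUnderstanding):
    """Another 'how': a simple keyword-based parser."""

    def interpret(self, utterance: str) -> str:
        return "greeting" if "hello" in utterance.lower() else "unknown"
```

Either implementation can be swapped in without the rest of the architecture changing.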
This separation of concerns allows us to:
Focus on capabilities and interactions before technology selection.
Adapt to changes in technological capability without architectural overhaul.
Make implementation choices based on specific requirements and constraints.
In short, it is about freeing ourselves as designers and builders from what is "trendy" or "in fashion" and focussing on what is required and what is the right solution for the task at hand.
In a world where LLM-powered software development is making the actual writing of code increasingly "cheaper", the most important question you need to ask yourself is what you need to build.
The CIAO High-level Architecture
There is still work to do to determine what a high-level diagram capturing the interactions between CIAO components should look like, but here is a starting point.
Human Participants
Human Participants represent the users who interact with the application. They bring their own goals, knowledge, and contexts to interactions, and the system must model these internally and adapt to their needs and preferences.
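As a rough illustration, and not a prescribed schema, that internal model might look something like the following sketch (Python, with hypothetical field names):

```python
# A minimal, hypothetical sketch of how the system might model a human
# participant internally: their goals, preferences and situational context.
from dataclasses import dataclass, field


@dataclass
class HumanParticipant:
    participant_id: str
    goals: list[str] = field(default_factory=list)              # what they are trying to achieve
    preferences: dict[str, str] = field(default_factory=dict)   # e.g. preferred modality or tone
    context: dict[str, str] = field(default_factory=dict)       # what the system has gathered about their situation
```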
Conversations
Conversations manage the ongoing exchanges of information and intent between participants, both human and software agents. Conversation components need to maintain context and track dialogue state among multiple participants. Conversations are the common shared artifacts created through the interactions of those participants. We can imagine different conversation components describing different conversation styles, from completely open-ended conversation to highly structured protocols with just a small set of possible speech acts.
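One way to picture a conversation as a shared artifact is the following sketch (Python, hypothetical names); the small SpeechAct set stands in for a tightly structured protocol, while an open-ended style would simply leave the acts unconstrained:

```python
# A minimal sketch of a conversation as a shared, ordered record of utterances
# exchanged between participants, each tagged with a speech act. The speech-act
# set is a hypothetical example of a highly structured protocol.
from dataclasses import dataclass, field
from enum import Enum


class SpeechAct(Enum):
    REQUEST = "request"
    INFORM = "inform"
    CONFIRM = "confirm"


@dataclass
class Utterance:
    sender: str
    act: SpeechAct
    content: str


@dataclass
class Conversation:
    participants: set[str] = field(default_factory=set)
    history: list[Utterance] = field(default_factory=list)

    def add(self, utterance: Utterance) -> None:
        """Record an utterance and track which participants have taken part."""
        self.participants.add(utterance.sender)
        self.history.append(utterance)
```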
Interfaces
Interfaces provide the interaction channels between human participants and the system. They handle different interaction modalities (text, voice, visual, etc.) and are responsible for capturing inputs and rendering outputs in human-understandable forms. The Interface layer manages presentation concerns, accessibility, device adaptation, and creates appropriate representations of system state and activities for human consumption.
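As a sketch of this responsibility (Python, hypothetical names), an interface boils down to capturing input and rendering output for one particular modality; a console version is shown, but a voice or visual channel would implement the same two operations differently:

```python
# A minimal, hypothetical sketch of the Interface responsibility: capture a
# participant's input and render system output in a human-understandable form.
from abc import ABC, abstractmethod


class Interface(ABC):
    @abstractmethod
    def capture_input(self) -> str:
        """Obtain the participant's input in this channel's modality."""

    @abstractmethod
    def render_output(self, content: str) -> None:
        """Present system output appropriately for this channel."""


class ConsoleInterface(Interface):
    """A plain text channel; voice or visual channels would differ only in the 'how'."""

    def capture_input(self) -> str:
        return input("> ")

    def render_output(self, content: str) -> None:
        print(content)
```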
Agents
Agents are components with their own automated decision-making capabilities (I am striving hard to steer away from using the word autonomy). Agents will have some implicit or explicit representation of goals, and may possess domain knowledge, reasoning capabilities, and the ability to execute actions. They respond to requests coordinated through Orchestration, perform tasks, access external systems, share outputs, and interact with other participants in the system (human or not) through conversations.
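A minimal sketch of this (Python, with a deliberately trivial, hypothetical EchoAgent) might look as follows; a real agent would reason over domain knowledge, call external systems or tools, and share its output back through a conversation:

```python
# A minimal, hypothetical sketch of an agent: a component that holds some
# representation of its goals and decides how to act on a request that
# Orchestration routes to it.
from abc import ABC, abstractmethod


class Agent(ABC):
    def __init__(self, name: str, goals: list[str]):
        self.name = name
        self.goals = goals  # implicit or explicit representation of what the agent is for

    @abstractmethod
    def handle(self, request: str) -> str:
        """Decide on and carry out an action in response to a request."""


class EchoAgent(Agent):
    def handle(self, request: str) -> str:
        # A real agent would reason, consult knowledge, or call external systems here.
        return f"{self.name} acknowledges: {request}"
```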
Orchestration
Orchestration serves as the coordinator of the system, connecting all other components. It manages system-wide state, controls workflows, allocates resources, handles errors, and ensures components work together coherently. Orchestration may maintain the overall application logic and enforce policies. You may have a very thin orchestration layer or a very sophisticated one, depending on the needs of your system. We could even envision multiple orchestrator components for applications with more elaborate topologies of participants, interfaces and conversations.
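To illustrate the thin end of that spectrum, here is a sketch (Python, reusing the hypothetical Agent class from the previous sketch) of an orchestrator that keeps a registry of agents, routes requests to them and handles the failure case; a richer orchestrator would also manage workflows, resources and policies:

```python
# A minimal, hypothetical sketch of a thin orchestration layer: it maintains a
# registry of agents, dispatches requests to the right one, and handles the
# case where no suitable agent exists.
class Orchestrator:
    def __init__(self) -> None:
        self.agents: dict[str, "Agent"] = {}

    def register(self, name: str, agent: "Agent") -> None:
        self.agents[name] = agent

    def dispatch(self, agent_name: str, request: str) -> str:
        agent = self.agents.get(agent_name)
        if agent is None:
            # Error handling is an orchestration concern: fail gracefully.
            return f"No agent named {agent_name!r} is available."
        return agent.handle(request)
```

Wiring an EchoAgent into this orchestrator and dispatching a request through it is then a couple of lines, with conversations and interfaces sitting on either side of that exchange.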
Next Steps
With an initial set of concepts in place, the next article will deal with two different challenges. First, we are going to provide more detail about the core concerns of each component (e.g. dialogue management for conversations, or how we think about goals in agents). Second, we are going to start using the framework to describe specific use cases and relate them to how they might be implemented with existing tools. The latter will start to indicate where there is space for improvement or where something is missing.
If you are curious about how this will evolve, sign up to get notified. I'll aim to post one or two updates a week.