CIAO👋 Core Components: Interfaces
In this article we discuss Interfaces and explore using the Conversational Turn as the constant across any type of Interface - forming the contract between Conversation components and Interfaces.
CIAO👋 is a unifying architectural framework for AI-powered applications that organizes systems into four core components: Conversations (managing dialogue state), Interfaces (handling human interaction), Agents (automated decision-makers), and Orchestration (coordinating everything).
The framework prioritizes "what" applications need to do (like understanding natural language or reasoning about actions) over "how" they're implemented (specific technologies like LLMs), allowing systems to adapt as technologies evolve.
It's designed for the new generation of software that is conversation-centric, handles multiple input/output types, and features increasing levels of automated decision-making.
Check out the introduction, our deep dive into Conversations or read on here for Interfaces.
LLMs as a technology may eventually be superseded. It is entirely possible that we will come up with new ways to “understand” language and simulate reasoning. What is unlikely to go away, however, is the conversation as a central part of digital interfaces and the main way we interact with machines; it has survived millennia of human interaction. In that respect, the switch has flipped, the expectation is set, and there is no going back. Once someone has experienced the ease with which they can just ask their question in ChatGPT or Claude, how can you remove that capability?
As such, when we think of Interfaces within the context of CIAO👋, the starting point is undoubtedly the conversational interface.
It starts with text, but quickly evolves
The basic interaction mode is one where the user asks a question, but things have quickly moved on. Users can now interact across different modalities, and in ways that are not always obvious.
We can type out the question, speak it, or provide attachments by uploading files. If we break out of the screen, we can think of gestures and other types of ambient signals (location, orientation, expression and so on). All of these contribute to sending a message to the software, leaving it with the job of interpreting intent.
Similarly, the response from the interface can start with text but quickly evolve to include a variety of signals and artifacts that encapsulate, in one form or another, the answer to our query. Recently, Anthropic has doubled down on the idea of the artifact - an output of the main conversation that sits apart from it and can move to front-stage or take a back-seat depending on context. An artifact could be a longer piece of text that will be iterated on, a piece of code or even a full-blown application.
Ultimately, what we have is a very fluid environment that will adapt depending on context. When we are architecting applications with this fluid concept of interface, the challenge becomes identifying what we can hang on to. What remains constant irrespective of the specific interface, so that we can use it as an abstraction within our architecture to represent any type of interface?
The answer lies not in the interface itself, but in what flows through it: the conversational turn.
The Universal Pattern
Think about it - whether you're typing a question, speaking to your device, uploading a file, or even making a gesture, you're fundamentally doing the same thing: taking a turn in a conversation. This pattern has survived millennia of human interaction because it represents something fundamental about how we exchange meaning.
In the context of CIAO👋, this abstraction sits at the intersection of Conversations and Interfaces, serving as the bridge between them. It's the atomic unit of interaction that remains consistent regardless of how it's expressed or received.
Anatomy of a Conversational Turn
What makes up this abstraction? At its core, every turn in a conversation - regardless of interface - contains:
Content: The multimodal payload itself. This could be text, audio, visual data, files, or even ambient signals like location or device orientation. The abstraction doesn't care about the specific format - it just needs to carry the content.
Intent: The interpreted purpose behind the interaction. Whether spoken, typed, or gestured, there's always an underlying intent that needs to be captured and understood.
Context Reference: Every turn exists within a conversational flow. It references what came before and influences what comes after. This context travels with each exchange, maintaining coherence across modality switches.
Metadata: The supporting information that helps interpret the turn - timestamp, participant ID, modality type, device context, and other environmental factors that might influence interpretation.
Artifacts: As Anthropic has demonstrated, responses increasingly include outputs that transcend simple replies - code, documents, applications. These artifacts are part of the turn but have their own lifecycle and interaction patterns.
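To make the five parts above more concrete, here is a minimal Python sketch of what a turn structure might look like. CIAO👋 does not prescribe a schema, so every name here (`Turn`, `Content`, `Artifact`, and all field names) is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass
class Content:
    """One piece of multimodal payload (hypothetical shape)."""
    modality: str  # free-form label, e.g. "text", "audio", "file", "location"
    data: Any      # raw payload, deliberately left opaque


@dataclass
class Artifact:
    """An output with its own lifecycle, such as code or a document."""
    artifact_id: str
    kind: str      # e.g. "code", "document", "chart"
    body: Any
    version: int = 1


@dataclass
class Turn:
    """A single conversational turn, independent of the interface used."""
    conversation_id: str                    # context reference: which conversation
    participant_id: str
    content: list[Content]
    intent: Optional[str] = None            # filled in once the turn is interpreted
    previous_turn_id: Optional[str] = None  # context reference: what came before
    metadata: dict[str, Any] = field(default_factory=dict)
    artifacts: list[Artifact] = field(default_factory=list)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


# A typed question: one text content item, intent not yet interpreted.
typed = Turn(
    conversation_id="c1",
    participant_id="user-1",
    content=[Content("text", "What drove sales growth last quarter?")],
    metadata={"modality": "text", "device": "laptop"},
)
```

Keeping `intent` optional reflects the idea that an Interface captures content first and interpretation may happen later; that ordering is a design choice of this sketch, not something the framework mandates.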
Why This Abstraction Matters
This approach aligns perfectly with CIAO's principle of separating "what" from "how". The "what" is the fundamental need to exchange meaningful information between participants. The "how" is the specific interface implementation.
Consider a practical example: You start by typing a question about data analysis. The system responds with text and a visualization artifact. You then speak a follow-up question while pointing at a specific part of the chart on your screen. Finally, you upload a CSV file with additional data.
Each of these interactions uses a different interface modality, but they're all turns in the same conversation. The abstraction allows the system to:
Process each turn uniformly, regardless of input method
Maintain conversation coherence as you switch between interfaces
Enable agents to focus on intent and content rather than interface specifics
Allow orchestration to manage the flow without being coupled to interface details
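The data-analysis example can be sketched as a sequence of turns added to one conversation. This is an illustrative Python sketch only: the `Turn` and `Conversation` classes, their fields, and the sample payloads are assumptions, and a real system would carry far richer content.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Turn:
    turn_id: str
    modalities: list[str]
    payload: dict[str, Any]
    previous_turn_id: Optional[str] = None


@dataclass
class Conversation:
    turns: list[Turn] = field(default_factory=list)

    def add(self, turn: Turn) -> Turn:
        # Link the new turn to the last one, whatever interface produced it.
        if self.turns:
            turn.previous_turn_id = self.turns[-1].turn_id
        self.turns.append(turn)
        return turn


conversation = Conversation()
# Typed question, text-plus-chart response, spoken follow-up with pointing, CSV upload.
conversation.add(Turn("t1", ["text"], {"text": "How should I analyze this data?"}))
conversation.add(Turn("t2", ["text", "artifact"], {"text": "Here is a chart.", "artifact": "chart-1"}))
conversation.add(Turn("t3", ["audio", "gesture"], {"transcript": "What about this spike?", "pointed_at": "chart-1"}))
conversation.add(Turn("t4", ["file"], {"filename": "extra.csv"}))

for turn in conversation.turns:
    print(turn.turn_id, turn.modalities, "follows", turn.previous_turn_id)
```

The `add` method never inspects `modalities`, which is the point: coherence comes from the chain of turns, not from the interface that produced each one.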
Implementation Considerations
In practice, this abstraction becomes a contract between Interfaces and Conversations. Interfaces are responsible for:
Capturing raw input in whatever form it takes
Packaging it into a standardized turn structure
Enriching it with relevant metadata
Passing it to the Conversation component
The Conversation component then:
Maintains the dialogue state
Routes turns to appropriate Agents via Orchestration
Manages the overall flow and context
Returns responses that Interfaces can render appropriately
This separation means we can add new interface types without changing the core conversation logic. Want to add a brain-computer interface? As long as it can package thoughts into turns, the rest of the system doesn't need to know or care.
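One way to express this contract in code is a small protocol that every Interface implements, with the Conversation component depending only on turns. The Python sketch below is a hedged illustration: `Interface`, `capture`, `render`, `handle`, and `exchange` are hypothetical names, the voice interface takes a ready-made transcript instead of real audio, and the reply logic is a placeholder for routing through Orchestration to Agents.

```python
from dataclasses import dataclass, field
from typing import Any, Protocol


@dataclass
class Turn:
    participant_id: str
    modality: str
    content: Any
    metadata: dict[str, Any] = field(default_factory=dict)


class Interface(Protocol):
    """The contract: package raw input into a turn, render a response turn."""
    def capture(self, raw: Any, participant_id: str) -> Turn: ...
    def render(self, response: Turn) -> str: ...


class TextInterface:
    def capture(self, raw: Any, participant_id: str) -> Turn:
        return Turn(participant_id, "text", str(raw).strip(), {"source": "keyboard"})

    def render(self, response: Turn) -> str:
        return str(response.content)


class VoiceInterface:
    def capture(self, raw: Any, participant_id: str) -> Turn:
        # `raw` stands in for a transcript; speech-to-text is out of scope here.
        return Turn(participant_id, "audio", str(raw).strip(), {"source": "microphone"})

    def render(self, response: Turn) -> str:
        return f"[spoken] {response.content}"


class Conversation:
    def __init__(self) -> None:
        self.history: list[Turn] = []

    def handle(self, turn: Turn) -> Turn:
        self.history.append(turn)
        # Placeholder: a real system would route to Agents via Orchestration.
        reply = Turn("assistant", "text", f"You said: {turn.content}")
        self.history.append(reply)
        return reply


def exchange(interface: Interface, conversation: Conversation, raw: Any, who: str) -> str:
    """Capture raw input, pass the turn on, and render the response."""
    return interface.render(conversation.handle(interface.capture(raw, who)))


conversation = Conversation()
print(exchange(TextInterface(), conversation, "hello ", "user-1"))
print(exchange(VoiceInterface(), conversation, "and again", "user-1"))
```

Adding a new interface type in this sketch means writing one more class with `capture` and `render`; `Conversation` stays untouched.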
The Evolutionary Path
This abstraction also gives us a framework for thinking about interface evolution. Early systems might support simple text turns. As they mature, they add:
Multimodal input processing
Artifact generation and manipulation
Ambient context awareness
Collaborative features with multiple participants
Each evolution extends the turn abstraction rather than replacing it. The conversation remains the constant, even as interfaces become more sophisticated.
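A simple way to picture "extending rather than replacing" is a turn type that gains optional fields over time, so that code written for text-only turns keeps working. This Python sketch is an assumption-laden illustration; the field names (`attachments`, `ambient`, `addressed_to`) and the `summarize` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Turn:
    participant_id: str
    text: str = ""
    # Later additions are optional with defaults, so earlier callers keep working.
    attachments: list[Any] = field(default_factory=list)    # multimodal input
    ambient: dict[str, Any] = field(default_factory=dict)   # ambient context
    addressed_to: list[str] = field(default_factory=list)   # multiple participants


def summarize(turn: Turn) -> str:
    """Describe a turn, mentioning only the extensions it actually uses."""
    parts = [f"{turn.participant_id}: {turn.text or '(no text)'}"]
    if turn.attachments:
        parts.append(f"{len(turn.attachments)} attachment(s)")
    if turn.ambient:
        parts.append("ambient: " + ", ".join(sorted(turn.ambient)))
    return " | ".join(parts)


early = Turn("u1", "hi")  # a text-only turn from an early system
later = Turn("u1", "look", attachments=["a.csv"], ambient={"location": "office"})
print(summarize(early))
print(summarize(later))
```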
For Builders
When designing AI-powered applications using CIAO👋, start with the turn abstraction. Ask yourself:
What constitutes a turn in my application's context?
What metadata do I need to capture for proper interpretation?
How will artifacts flow through the conversation?
What happens when users switch interfaces mid-conversation?
By grounding your architecture in this abstraction, you create systems that can adapt to new interfaces while maintaining consistency and coherence. The specific technologies will evolve - LLMs may be superseded, new interaction paradigms will emerge - but the conversational turn will remain.
After all, as we said at the start, conversations have survived millennia of human interaction. By making the turn our architectural constant, we align our software with this fundamental human pattern. The interface may be fluid, but the conversation endures.