
Commit 91dbfef

Merge pull request #64 from daily-co/docs
Some docs
2 parents: bf3ae09 + 3b61d0b

File tree

2 files changed: +57 -1 lines changed

docs/architecture.md (+15)

@@ -1,2 +1,17 @@

# Daily AI SDK Architecture Guide

## Frames

Frames can represent discrete chunks of data, for instance a chunk of text, a chunk of audio, or an image. They can also be used for control flow, for instance a frame that indicates that there is no more data available, or that a user started or stopped talking. They can also represent more complex data structures, such as a message array used for an LLM completion.
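
For illustration, here is roughly what those three kinds of frames look like when constructed from `dailyai.pipeline.frames` (the field names match the frames.py diff below; that the constructors take only those fields is an assumption):

```python
from dailyai.pipeline.frames import (
    AudioFrame,              # a discrete chunk of data: audio bytes
    EndFrame,                # control flow: no more data available
    LLMMessagesQueueFrame,   # a more complex structure: an LLM message array
    TextFrame,
    UserStartedSpeakingFrame,
)

# Data frames carry a payload.
text = TextFrame(text="Hello there!")
audio = AudioFrame(data=b"\x00\x00" * 160)

# Control frames usually carry no payload; the type itself is the signal.
end = EndFrame()
speaking = UserStartedSpeakingFrame()

# A message array destined for an LLM completion.
messages = LLMMessagesQueueFrame(
    messages=[{"role": "user", "content": "Tell me a joke."}])
```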

## FrameProcessors

Frame processors operate on frames. Every frame processor implements a `process_frame` method that consumes one frame and produces zero or more frames. Frame processors can do simple transforms, such as concatenating text fragments into sentences, or they can treat frames as input to an AI Service, emitting chat completions from message arrays or transforming text into audio or images.
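
As a sketch of the first kind of transform, a processor that concatenates text fragments into sentences might look something like this. The async-generator shape of `process_frame` is an assumption, not a signature confirmed by this commit:

```python
from dailyai.pipeline.frames import Frame, TextFrame


class SentenceAggregator:
    """Buffers TextFrames and emits one TextFrame per completed sentence."""

    def __init__(self):
        self._buffer = ""

    async def process_frame(self, frame: Frame):
        # Consume one frame, produce zero or more frames.
        if isinstance(frame, TextFrame):
            self._buffer += frame.text
            if self._buffer.rstrip().endswith((".", "!", "?")):
                yield TextFrame(text=self._buffer)
                self._buffer = ""
        else:
            # Pass everything else through untouched.
            yield frame
```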

## Pipelines

Pipelines are lists of frame processors that read from a source queue and send the processed frames to a sink queue. A very simple pipeline might chain an LLM frame processor to a text-to-speech frame processor, with a transport's send queue as its sink. Placing LLM message frames on the pipeline's source queue will cause the LLM's response to be spoken. See example #2 for an implementation of this.
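
The queue plumbing, in rough terms (the `run_pipeline` helper below is a sketch under assumed semantics, not the SDK's pipeline implementation; see example #2 in the repo for the real thing):

```python
import asyncio

from dailyai.pipeline.frames import EndFrame, Frame


async def run_pipeline(processors, source: asyncio.Queue, sink: asyncio.Queue):
    """Drain the source queue through each processor in turn and push
    whatever comes out the far end onto the sink (e.g. a transport's
    send queue)."""
    while True:
        frame: Frame = await source.get()
        frames = [frame]
        for processor in processors:
            produced = []
            for f in frames:
                async for out in processor.process_frame(f):
                    produced.append(out)
            frames = produced
        for f in frames:
            await sink.put(f)
        if isinstance(frame, EndFrame):
            break
```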

## Transports

Transports provide a receive queue, which carries input from "the outside world", and a sink queue, which holds data to be sent "to the outside world". The `LocalTransportService` does this with the local camera, mic, display and speaker. The `DailyTransportService` does this with a WebRTC session joined to a Daily.co room.
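
For instance, a consumer of a transport's receive queue might look like this (the `receive_queue` attribute name and the loop shape are assumptions; only the queue concept and the `TranscriptionQueueFrame` fields come from this commit):

```python
from dailyai.pipeline.frames import TranscriptionQueueFrame


async def log_transcriptions(transport):
    """Print everything participants say as it arrives from the outside world."""
    while True:
        frame = await transport.receive_queue.get()
        if isinstance(frame, TranscriptionQueueFrame):
            print(f"[{frame.timestamp}] {frame.participantId}: {frame.text}")
```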

src/dailyai/pipeline/frames.py (+42 -1)

@@ -17,14 +17,23 @@ def __eq__(self, other):
 
 
 class StartFrame(ControlFrame):
+    """Used (but not required) to start a pipeline, and is also used to
+    indicate that an interruption has ended and the transport should start
+    processing frames again."""
     pass
 
 
 class EndFrame(ControlFrame):
+    """Indicates that a pipeline has ended and frame processors and pipelines
+    should be shut down. If the transport receives this frame, it will stop
+    sending frames to its output channel(s) and close all its threads."""
     pass
 
 
 class EndPipeFrame(ControlFrame):
+    """Indicates that a pipeline has ended but that the transport should
+    continue processing. This frame is used in parallel pipelines and other
+    sub-pipelines."""
     pass
 
 
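
A sketch of how a processor or transport might dispatch on these control frames (the `paused` flag and `stop()` method on the transport are hypothetical, used only to mirror the docstrings above):

```python
from dailyai.pipeline.frames import EndFrame, EndPipeFrame, StartFrame


async def handle_control_frame(frame, transport):
    if isinstance(frame, StartFrame):
        # An interruption ended; the transport should process frames again.
        transport.paused = False
    elif isinstance(frame, EndPipeFrame):
        # A sub-pipeline finished, but the transport keeps running.
        pass
    elif isinstance(frame, EndFrame):
        # Shut everything down: stop sending frames, close threads.
        await transport.stop()
```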

@@ -39,15 +48,20 @@ class PipelineStartedFrame(ControlFrame):
 
 
 class LLMResponseStartFrame(ControlFrame):
+    """Used to indicate the beginning of an LLM response. Following TextFrames
+    are part of the LLM response until an LLMResponseEndFrame."""
     pass
 
 
 class LLMResponseEndFrame(ControlFrame):
+    """Indicates the end of an LLM response."""
     pass
 
 
 @dataclass()
 class AudioFrame(Frame):
+    """A chunk of audio. Will be played by the transport if the transport's mic
+    has been enabled."""
     data: bytes
 
     def __str__(self):
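
A simplified aggregator built on these response-boundary frames (the async-generator `process_frame` shape is assumed, as above):

```python
from dailyai.pipeline.frames import (
    LLMResponseEndFrame, LLMResponseStartFrame, TextFrame)


class ResponseCollector:
    """Joins the TextFrames between LLMResponseStartFrame and
    LLMResponseEndFrame into a single TextFrame."""

    def __init__(self):
        self._parts: list[str] = []
        self._in_response = False

    async def process_frame(self, frame):
        if isinstance(frame, LLMResponseStartFrame):
            self._in_response = True
            self._parts = []
        elif isinstance(frame, TextFrame) and self._in_response:
            self._parts.append(frame.text)
        elif isinstance(frame, LLMResponseEndFrame):
            self._in_response = False
            yield TextFrame(text="".join(self._parts))
```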
@@ -56,6 +70,8 @@ def __str__(self):
 
 @dataclass()
 class ImageFrame(Frame):
+    """An image. Will be shown by the transport if the transport's camera is
+    enabled."""
     url: str | None
     image: bytes
 

@@ -65,14 +81,19 @@ def __str__(self):
 
 @dataclass()
 class SpriteFrame(Frame):
+    """An animated sprite. Will be shown by the transport if the transport's
+    camera is enabled. Will play at the framerate specified in the transport's
+    `fps` constructor parameter."""
     images: list[bytes]
 
     def __str__(self):
-        return f"{self.__class__.name__}, list size: {len(self.images)}"
+        return f"{self.__class__.__name__}, list size: {len(self.images)}"
 
 
 @dataclass()
 class TextFrame(Frame):
+    """A chunk of text. Emitted by LLM services, consumed by TTS services, can
+    be used to send text through pipelines."""
     text: str
 
     def __str__(self):
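
Constructing the media frames above is direct (the payloads here are placeholder bytes, for illustration only):

```python
from dailyai.pipeline.frames import AudioFrame, ImageFrame, SpriteFrame

# Played by the transport if its mic is enabled.
silence = AudioFrame(data=b"\x00" * 3200)

# Shown by the transport if its camera is enabled.
still = ImageFrame(url=None, image=b"<raw image bytes>")

# Played back at the framerate set by the transport's `fps` parameter.
sprite = SpriteFrame(images=[still.image, still.image])

print(sprite)  # "SpriteFrame, list size: 2"
```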
@@ -81,17 +102,27 @@ def __str__(self):
 
 @dataclass()
 class TranscriptionQueueFrame(TextFrame):
+    """A text frame with transcription-specific data. Will be placed in the
+    transport's receive queue when a participant speaks."""
     participantId: str
     timestamp: str
 
 
 @dataclass()
 class LLMMessagesQueueFrame(Frame):
+    """A frame containing a list of LLM messages. Used to signal that an LLM
+    service should run a chat completion and emit an LLMResponseStartFrame,
+    TextFrames, and an LLMResponseEndFrame.
+    Note that the messages property on this class is mutable, and will be
+    updated by various ResponseAggregator frame processors."""
     messages: List[dict]
 
 
 @dataclass()
 class OpenAILLMContextFrame(Frame):
+    """Like an LLMMessagesQueueFrame, but with extra context specific to the
+    OpenAI API. The context in this message is also mutable, and will be
+    changed by the OpenAIContextAggregator frame processor."""
     context: OpenAILLMContext
 
 
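
The mutability note matters in practice: the frame shares the list rather than copying it. A quick sketch (assuming the constructor takes just the `messages` field shown in the diff):

```python
from dailyai.pipeline.frames import LLMMessagesQueueFrame

messages = [{"role": "system", "content": "You are a helpful assistant."}]
frame = LLMMessagesQueueFrame(messages=messages)

# Anything appended to the list is visible to every holder of this frame,
# because the frame references the same list object.
messages.append({"role": "user", "content": "What time is it?"})
assert frame.messages[-1]["content"] == "What time is it?"
```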

@@ -114,10 +145,15 @@ def __str__(self):
 
 
 class UserStartedSpeakingFrame(Frame):
+    """Emitted by VAD to indicate that a participant has started speaking.
+    This can be used for interruptions or other times when detecting that
+    someone is speaking is more important than knowing what they're saying
+    (as you would with a TranscriptionQueueFrame)."""
     pass
 
 
 class UserStoppedSpeakingFrame(Frame):
+    """Emitted by the VAD to indicate that a user stopped speaking."""
     pass
 
 
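
One way an interruption handler might use the VAD frames (the `interrupt()` method and the `pipeline_source` queue are hypothetical names, not part of this commit):

```python
from dailyai.pipeline.frames import (
    StartFrame, UserStartedSpeakingFrame, UserStoppedSpeakingFrame)


async def handle_vad_frame(frame, transport, pipeline_source):
    if isinstance(frame, UserStartedSpeakingFrame):
        # React immediately, without waiting for a transcription.
        transport.interrupt()
    elif isinstance(frame, UserStoppedSpeakingFrame):
        # Per StartFrame's docstring, this tells the transport the
        # interruption has ended and it should process frames again.
        await pipeline_source.put(StartFrame())
```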

@@ -131,10 +167,15 @@ class BotStoppedSpeakingFrame(Frame):
 
 @dataclass()
 class LLMFunctionStartFrame(Frame):
+    """Emitted when the LLM receives the beginning of a function call
+    completion. A frame processor can use this frame to indicate that it should
+    start preparing to make a function call, if it can do so in the absence of
+    any arguments."""
     function_name: str
 
 
 @dataclass()
 class LLMFunctionCallFrame(Frame):
+    """Emitted when the LLM has received an entire function call completion."""
     function_name: str
     arguments: str
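
A sketch of consuming these two frames, assuming OpenAI-style JSON-encoded arguments (an assumption; the diff only says `arguments` is a `str`):

```python
import json

from dailyai.pipeline.frames import LLMFunctionCallFrame, LLMFunctionStartFrame


async def handle_function_frames(frame):
    if isinstance(frame, LLMFunctionStartFrame):
        # The name arrives before the arguments, so argument-free
        # preparation can start early.
        print(f"Preparing to call {frame.function_name}...")
    elif isinstance(frame, LLMFunctionCallFrame):
        # The entire completion has arrived.
        args = json.loads(frame.arguments)
        print(f"Calling {frame.function_name} with {args}")
```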
