Blog post for OpenTelemetry Generative AI updates #5575

209 changes: 209 additions & 0 deletions content/en/blog/2024/otel-generative-ai/index.md
@@ -0,0 +1,209 @@
---
title: OpenTelemetry for Generative AI
linkTitle: OTel for GenAI
date: 2024-11-09
Putting this comment here to keep an eye on setting the date right when we finally publish. Do not resolve.

author: >-
[Drew Robbins](https://github.com/drewby) (Microsoft), [Liudmila
Molkova](https://github.com/lmolkova) (Microsoft)
issue: https://github.com/open-telemetry/opentelemetry.io/issues/5581
sig: SIG GenAI Observability
cSpell:ignore: genai liudmila molkova
---

As organizations increasingly adopt Large Language Models (LLMs) and other
generative AI technologies, ensuring reliable performance, efficiency, and
safety is essential to meet user expectations, optimize resource costs, and
safeguard against unintended outputs. Effective observability for AI operations,
behaviors, and outcomes can help meet these goals. OpenTelemetry is being
enhanced to support these needs specifically for generative AI.

Two primary assets are in development to make this possible: **Semantic
Conventions** and **Instrumentation Libraries**. The first instrumentation
library targets the
[OpenAI Python API library](https://pypi.org/project/openai/).

[**Semantic Conventions**](/docs/concepts/semantic-conventions/) establish
standardized guidelines for how telemetry data is structured and collected
across platforms, defining inputs, outputs, and operational details. For
generative AI, these conventions streamline monitoring, troubleshooting, and
optimizing AI models by standardizing attributes such as model parameters,
response metadata, and token usage. This consistency supports better
observability across tools, environments, and APIs, helping organizations track
performance, cost, and safety with ease.

The
[**Instrumentation Library**](/docs/specs/otel/overview/#instrumentation-libraries)
is being developed within the
[OpenTelemetry Python Contrib](https://github.com/open-telemetry/opentelemetry-python-contrib)
repository, under the
[instrumentation-genai](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation-genai)
project, to automate telemetry collection for generative AI applications. The
first release is a Python library for instrumenting OpenAI client calls. This
library captures spans and events, gathering essential data like model inputs,
response metadata, and token usage in a structured format.

## Key Signals for Generative AI

The [Semantic Conventions for Generative AI](/docs/specs/semconv/gen-ai/) focus
on capturing insights into AI model behavior through three primary signals:
[Traces](/docs/concepts/signals/traces/),
[Metrics](/docs/concepts/signals/metrics/), and
[Events](/docs/specs/otel/logs/event-api/).

Together, these signals provide a comprehensive monitoring framework, enabling
better cost management, performance tuning, and request tracing.

### Traces: Tracing Model Interactions

Traces track each model interaction's lifecycle, covering input parameters (for
example, temperature, top_p) and response details like token count or errors.
They provide visibility into each request, aiding in identifying bottlenecks and
analyzing the impact of settings on model output.
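
To make the shape of such a trace concrete, here is a minimal sketch in plain Python of the attributes one chat span might carry. It is not the instrumentation library's code: `ChatCall`, `span_name`, and `span_attributes` are hypothetical helpers, and the `gen_ai.*` keys and the "operation plus model" span name are written from memory of the Semantic Conventions for Generative AI, so check them against the current specification before relying on them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChatCall:
    """Hypothetical record of one chat completion call (not an OpenTelemetry type)."""

    model: str
    temperature: float
    top_p: float
    input_tokens: int
    output_tokens: int
    error_type: Optional[str] = None


def span_name(call: ChatCall) -> str:
    # Assumed naming pattern: "<operation name> <requested model>".
    return f"chat {call.model}"


def span_attributes(call: ChatCall) -> dict:
    # Attribute keys are assumptions based on the GenAI semantic conventions.
    attrs = {
        "gen_ai.operation.name": "chat",
        "gen_ai.request.model": call.model,
        "gen_ai.request.temperature": call.temperature,
        "gen_ai.request.top_p": call.top_p,
        "gen_ai.usage.input_tokens": call.input_tokens,
        "gen_ai.usage.output_tokens": call.output_tokens,
    }
    # Only failed calls carry an error attribute.
    if call.error_type is not None:
        attrs["error.type"] = call.error_type
    return attrs


call = ChatCall("gpt-4o-mini", temperature=0.2, top_p=1.0, input_tokens=12, output_tokens=40)
print(span_name(call))
print(span_attributes(call))
```

Keeping request settings and token counts on the same span is what lets you compare, for example, how a change in `temperature` affects output length for a given model.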

### Metrics: Monitoring Usage and Performance

Metrics aggregate high-level indicators like request volume, latency, and token
counts, essential for managing costs and performance. This data is particularly
critical for API-dependent AI applications with rate limits and cost
considerations.
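
As a rough illustration of the aggregation that metrics perform, the sketch below rolls per-request records up into per-model request counts, mean latency, and token totals. The records and the `summarize` helper are made up for this example; a real setup would record these values through OpenTelemetry metric instruments rather than a list of tuples.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-request records: (model, latency in seconds, input tokens, output tokens).
REQUESTS = [
    ("gpt-4o-mini", 0.82, 15, 120),
    ("gpt-4o-mini", 1.10, 22, 240),
    ("gpt-4o", 2.35, 18, 310),
]


def summarize(requests):
    """Group request records by model and compute high-level indicators."""
    grouped = defaultdict(list)
    for model, latency, tokens_in, tokens_out in requests:
        grouped[model].append((latency, tokens_in, tokens_out))

    summary = {}
    for model, rows in grouped.items():
        summary[model] = {
            "request_count": len(rows),
            "mean_latency_s": mean(row[0] for row in rows),
            "input_tokens": sum(row[1] for row in rows),
            "output_tokens": sum(row[2] for row in rows),
        }
    return summary


for model, stats in summarize(REQUESTS).items():
    print(model, stats)
```

Token totals per model are the figures you would multiply by your provider's pricing to estimate cost, and request counts per time window are what you would compare against rate limits.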

### Events: Capturing Detailed Interactions

Events log detailed moments during model execution, such as user prompts and
model responses, providing a granular view of model interactions. These insights
are invaluable for debugging and optimizing AI applications where unexpected
behaviors may arise.

{{% alert title="Note" color="info" %}} We decided to use
[events emitted](/docs/specs/otel/logs/api/#emit-an-event) with the
[Logs API](/docs/specs/otel/logs/api/) specification in the Semantic Conventions
for Generative AI. Events allow us to define specific
[semantic conventions](/docs/specs/semconv/general/events/) for the user prompts
and model responses that we capture. This addition to the API is in development
and considered unstable.{{% /alert %}}
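
The following sketch shows, in plain Python, what a pair of events for one prompt and one response might look like. The event names `gen_ai.user.message` and `gen_ai.choice` and the body fields are assumptions based on the Semantic Conventions for Generative AI at the time of writing, and `chat_events` is a hypothetical helper, not part of any OpenTelemetry package. Because this part of the specification is unstable, treat the structure as illustrative only.

```python
import json


def chat_events(prompt: str, completion: str, finish_reason: str, capture_content: bool):
    """Build illustrative event records for one user prompt and one model choice."""
    user_body = {}
    choice_body = {
        "index": 0,
        "finish_reason": finish_reason,
        "message": {"role": "assistant"},
    }
    # Message text is only included when content capture is switched on.
    if capture_content:
        user_body["content"] = prompt
        choice_body["message"]["content"] = completion
    return [
        {"name": "gen_ai.user.message", "body": user_body},
        {"name": "gen_ai.choice", "body": choice_body},
    ]


events = chat_events(
    prompt="Write a short poem on OpenTelemetry.",
    completion="Traces and spans align...",
    finish_reason="stop",
    capture_content=True,
)
print(json.dumps(events, indent=2))
```

Separating the event structure from the message text means the sequence of prompts and choices stays visible even when the content itself is withheld.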

### Extending Observability with Vendor-Specific Attributes

The Semantic Conventions also define vendor-specific attributes for platforms
like OpenAI and Azure Inference API, ensuring telemetry captures both general
and provider-specific details. This added flexibility supports multi-platform
monitoring and in-depth insights.
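
One way to picture the split between general and provider-specific details is by attribute namespace. The sketch below assumes that OpenAI-specific attributes share a `gen_ai.openai.` prefix; both that prefix and the sample key `gen_ai.openai.request.seed` are assumptions for illustration, and `split_by_provider` is a hypothetical helper.

```python
# Assumed namespace for OpenAI-specific attributes; verify against the specification.
OPENAI_PREFIX = "gen_ai.openai."


def split_by_provider(attributes: dict, provider_prefix: str = OPENAI_PREFIX):
    """Separate general attributes from those in a provider-specific namespace."""
    general = {k: v for k, v in attributes.items() if not k.startswith(provider_prefix)}
    provider = {k: v for k, v in attributes.items() if k.startswith(provider_prefix)}
    return general, provider


example = {
    "gen_ai.request.model": "gpt-4o-mini",
    "gen_ai.usage.output_tokens": 42,
    "gen_ai.openai.request.seed": 7,  # illustrative provider-specific key
}
general, provider = split_by_provider(example)
print("general:", general)
print("provider-specific:", provider)
```

A dashboard that spans several providers can rely on the general attributes alone, while provider-specific ones remain available for deeper investigation.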

## Building the Python Instrumentation Library for OpenAI

This Python-based library for OpenTelemetry captures key telemetry signals for
OpenAI models, providing developers with an out-of-the-box observability
solution tailored to AI workloads. The library,
[hosted within the OpenTelemetry Python Contrib repository](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/opentelemetry-instrumentation-openai-v2%3D%3D2.0b0/instrumentation-genai/opentelemetry-instrumentation-openai-v2),
automatically collects telemetry from OpenAI model interactions, including
request and response metadata and token usage.

As generative AI applications grow, additional instrumentation libraries for
other languages will follow, extending OpenTelemetry support across more tools
and environments. The current library focuses on OpenAI because of its
popularity and demand within AI development, which makes it a valuable initial
implementation.

### Example Usage

Here's an example of using the OpenTelemetry Python library to monitor a
generative AI application with the OpenAI client.

Install the OpenTelemetry dependencies:

```shell
pip install opentelemetry-distro
opentelemetry-bootstrap -a install
```

Set the following environment variables, updating the endpoint and protocol as
appropriate:

```shell
OPENAI_API_KEY=<replace_with_your_openai_api_key>

OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4318
OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
OTEL_SERVICE_NAME=python-opentelemetry-openai
OTEL_LOGS_EXPORTER=otlp_proto_http
OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true
# Set to false or remove to disable log events
OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
```

Then include the following code in your Python application:

```python
import os

from openai import OpenAI

# The client reads OPENAI_API_KEY from the environment.
client = OpenAI()
chat_completion = client.chat.completions.create(
    model=os.getenv("CHAT_MODEL", "gpt-4o-mini"),
    messages=[
        {
            "role": "user",
            "content": "Write a short poem on OpenTelemetry.",
        },
    ],
)
print(chat_completion.choices[0].message.content)
```

And then run the example using `opentelemetry-instrument`:

```shell
opentelemetry-instrument python main.py
```

If you do not have a service running to collect telemetry, you can export to the
console using the following:

```shell
opentelemetry-instrument --traces_exporter console --metrics_exporter console python main.py
```

There is a complete example
[available here](https://github.com/open-telemetry/opentelemetry-python-contrib/tree/main/instrumentation-genai/opentelemetry-instrumentation-openai-v2/example).

With this simple instrumentation, you can begin capturing traces from your
generative AI application. Here is an example from the
[Aspire Dashboard](https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/standalone?tabs=bash)
for local debugging:

![Chat trace in Aspire Dashboard](aspire-dashboard-trace.png)

Here is a similar trace captured in
[Jaeger](https://www.jaegertracing.io/docs/1.63/getting-started/#all-in-one):

![Chat trace in Jaeger](jaeger-trace.png)

It's also easy to capture the content history of the chat for debugging and
improving your application. Simply set the environment variable
`OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT` as follows:

```shell
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=True
```

This turns on content capture, which collects OpenTelemetry events containing
the payload:

![Content Capture Aspire Dashboard](aspire-dashboard-content-capture.png)
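
This post writes the flag as both `true` and `True`. A minimal sketch of how such a flag could be read, assuming a case-insensitive comparison against `"true"` with capture off by default, looks like this. The helper `content_capture_enabled` is hypothetical and is not claimed to be the library's own function.

```python
import os

ENV_VAR = "OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT"


def content_capture_enabled(environ=None) -> bool:
    """Return True only when the flag is set to "true" in any letter case."""
    environ = os.environ if environ is None else environ
    # Assumption: an unset or unrecognized value leaves content capture off.
    return environ.get(ENV_VAR, "false").strip().lower() == "true"


print(content_capture_enabled({ENV_VAR: "True"}))
print(content_capture_enabled({}))
```

Defaulting to off is the safer choice here, because prompts and responses can contain sensitive data that should not reach a telemetry backend unless someone opts in.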

## Join Us in Shaping the Future of Generative AI Observability

Community collaboration is key to OpenTelemetry's success. We invite developers,
AI practitioners, and organizations to contribute, share feedback, or
participate in discussions. Explore the OpenTelemetry Python Contrib project,
contribute code, or help shape observability for AI as it continues to evolve.

We now have contributors from [Amazon](https://aws.amazon.com/),
[Elastic](https://www.elastic.co/), [Google](https://www.google.com/),
[IBM](https://www.ibm.com), [Langtrace](https://www.langtrace.ai/),
[Microsoft](https://www.microsoft.com/), [OpenLIT](https://openlit.io/),
[Scorecard](https://www.scorecard.io/), [Traceloop](https://www.traceloop.com/),
and more!

You are welcome to join the community! More information can be found at the
[Generative AI Observability project page](https://github.com/open-telemetry/community/blob/main/projects/gen-ai.md).
36 changes: 36 additions & 0 deletions static/refcache.json
@@ -9351,6 +9351,10 @@
"StatusCode": 200,
"LastSeen": "2024-10-09T10:20:06.931205+02:00"
},
"https://learn.microsoft.com/dotnet/aspire/fundamentals/dashboard/standalone": {
"StatusCode": 200,
"LastSeen": "2024-11-23T12:14:07.853659267Z"
},
"https://learn.microsoft.com/en-us/azure/azure-monitor/app/opentelemetry-enable": {
"StatusCode": 200,
"LastSeen": "2024-04-23T14:33:24.635286085Z"
@@ -10155,6 +10159,10 @@
"StatusCode": 206,
"LastSeen": "2024-01-18T19:08:05.648675-05:00"
},
"https://openlit.io/": {
"StatusCode": 200,
"LastSeen": "2024-11-23T12:14:26.1112284Z"
},
"https://openmetrics.io/": {
"StatusCode": 206,
"LastSeen": "2024-01-18T19:07:18.197228-05:00"
@@ -11935,6 +11943,10 @@
"StatusCode": 206,
"LastSeen": "2024-08-09T11:02:26.926617-04:00"
},
"https://pypi.org/project/openai/": {
"StatusCode": 206,
"LastSeen": "2024-11-23T12:14:04.891807488Z"
},
"https://pypi.org/project/opentelemetry-api/": {
"StatusCode": 206,
"LastSeen": "2024-01-30T06:01:19.327156-05:00"
@@ -13423,6 +13435,10 @@
"StatusCode": 206,
"LastSeen": "2024-08-09T10:46:30.160571-04:00"
},
"https://www.google.com/": {
"StatusCode": 200,
"LastSeen": "2024-11-23T12:14:14.294654205Z"
},
"https://www.graalvm.org/latest/reference-manual/native-image/": {
"StatusCode": 206,
"LastSeen": "2024-09-30T11:46:04.441837921+02:00"
@@ -13471,6 +13487,10 @@
"StatusCode": 200,
"LastSeen": "2024-01-30T16:15:04.543149-05:00"
},
"https://www.ibm.com": {
"StatusCode": 206,
"LastSeen": "2024-11-23T12:14:17.04667319Z"
},
"https://www.ibm.com/docs/api/v1/content/SSYKE2_8.0.0/openj9/api/jdk8/jre/management/extension/com/ibm/lang/management/OperatingSystemMXBean.html": {
"StatusCode": 206,
"LastSeen": "2024-08-09T10:46:28.705852-04:00"
@@ -13583,6 +13603,10 @@
"StatusCode": 206,
"LastSeen": "2024-08-09T09:42:46.824519+02:00"
},
"https://www.jaegertracing.io/docs/1.63/getting-started/#all-in-one": {
"StatusCode": 206,
"LastSeen": "2024-11-23T12:14:12.418512408Z"
},
"https://www.jaegertracing.io/docs/latest/apis/": {
"StatusCode": 206,
"LastSeen": "2024-01-18T19:37:16.697232-05:00"
@@ -13875,6 +13899,10 @@
"StatusCode": 206,
"LastSeen": "2024-04-19T07:13:43.941227206Z"
},
"https://www.langtrace.ai/": {
"StatusCode": 200,
"LastSeen": "2024-11-23T12:14:21.39130864Z"
},
"https://www.linuxfoundation.org/legal/privacy-policy": {
"StatusCode": 200,
"LastSeen": "2024-01-30T16:04:05.250977-05:00"
@@ -14475,6 +14503,10 @@
"StatusCode": 206,
"LastSeen": "2024-01-30T15:25:04.905602-05:00"
},
"https://www.scorecard.io/": {
"StatusCode": 200,
"LastSeen": "2024-11-23T12:14:29.262999554Z"
},
"https://www.selenium.dev/documentation/grid/advanced_features/observability/": {
"StatusCode": 206,
"LastSeen": "2024-01-30T16:05:03.991313-05:00"
@@ -14563,6 +14595,10 @@
"StatusCode": 206,
"LastSeen": "2024-01-30T05:18:08.486678-05:00"
},
"https://www.traceloop.com/": {
"StatusCode": 200,
"LastSeen": "2024-11-23T12:14:34.919732662Z"
},
"https://www.typescriptlang.org/download": {
"StatusCode": 206,
"LastSeen": "2024-01-18T19:10:44.997912-05:00"