Diff between two versions of a semconv registry - Proposal #432

Open
lquerel opened this issue Oct 28, 2024 · 0 comments
Purpose

This issue outlines a diff process for comparing two versions of a semantic convention registry. The generated diff, in combination with the integrated template engine, will serve several purposes, including (but not limited to):

  • Generating a changelog for each new registry release
  • Updating the OpenTelemetry schema
  • Creating a migration guide
  • Producing custom database migration scripts for users leveraging semantic conventions to define their database models
  • ...

In the future, this diff output could also be integrated with the policy engine to facilitate the implementation of additional schema evolution policies.

Diff Format

The proposed diff format is defined below using a JSON schema. After computing a diff between two versions of the registry, Weaver will process the diff and selected template through the template engine to generate output in various formats, such as JSON, YAML, console (ANSI text), Markdown, and more.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Schema Changes Definition",
  "definitions": {
    "SchemaItemType": {
      "type": "string",
      "enum": ["attributes", "metrics", "events", "spans", "resources"]
    },
    "RegistryManifest": {
      "type": "object",
      "properties": {
        "semconv_version": {
          "type": "string",
          "description": "The version of the registry which will be used to define the semconv package version"
        },
        "schema_base_url": {
          "type": "string",
          "description": "The base URL where the registry's schema files are hosted"
        }
      },
      "required": ["semconv_version", "schema_base_url"]
    },
    "SchemaItemChange": {
      "type": "object",
      "oneOf": [
        {
          "type": "object",
          "properties": {
            "type": { "const": "added" },
            "name": { "type": "string" }
          },
          "required": ["type", "name"]
        },
        {
          "type": "object",
          "properties": {
            "type": { "const": "renamed_to_new" },
            "old_names": {
              "type": "array",
              "items": { "type": "string" },
              "uniqueItems": true
            },
            "new_name": { "type": "string" }
          },
          "required": ["type", "old_names", "new_name"]
        },
        {
          "type": "object",
          "properties": {
            "type": { "const": "renamed_to_existing" },
            "old_names": {
              "type": "array",
              "items": { "type": "string" },
              "uniqueItems": true
            },
            "current_name": { "type": "string" }
          },
          "required": ["type", "old_names", "current_name"]
        },
        {
          "type": "object",
          "properties": {
            "type": { "const": "deprecated" },
            "name": { "type": "string" },
            "note": { "type": "string" }
          },
          "required": ["type", "name", "note"]
        },
        {
          "type": "object",
          "properties": {
            "type": { "const": "removed" },
            "name": { "type": "string" }
          },
          "required": ["type", "name"]
        }
      ]
    }
  },
  "type": "object",
  "properties": {
    "head": {
      "$ref": "#/definitions/RegistryManifest",
      "description": "Information on the registry manifest for the most recent version of the schema"
    },
    "baseline": {
      "$ref": "#/definitions/RegistryManifest",
      "description": "Information on the registry manifest for the baseline version of the schema"
    },
    "changes": {
      "type": "object",
      "additionalProperties": {
        "type": "array",
        "items": {
          "$ref": "#/definitions/SchemaItemChange"
        }
      },
      "propertyNames": {
        "$ref": "#/definitions/SchemaItemType"
      }
    }
  },
  "required": ["head", "baseline", "changes"]
}
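
To make the schema above more concrete, here is a minimal sketch of what a diff instance could look like, together with a small hand-written structural check. The required fields per change type are transcribed from the SchemaItemChange definition; the attribute names, versions, and URL in example_diff are invented for illustration, and check_diff is a hypothetical helper, not part of Weaver.

```python
# Required fields per change type, transcribed from the SchemaItemChange
# definition in the schema above.
REQUIRED_FIELDS = {
    "added": {"type", "name"},
    "renamed_to_new": {"type", "old_names", "new_name"},
    "renamed_to_existing": {"type", "old_names", "current_name"},
    "deprecated": {"type", "name", "note"},
    "removed": {"type", "name"},
}
ITEM_TYPES = {"attributes", "metrics", "events", "spans", "resources"}


def check_diff(diff: dict) -> list:
    """Return a list of problems; an empty list means the diff looks well-formed."""
    problems = []
    for key in ("head", "baseline", "changes"):
        if key not in diff:
            problems.append(f"missing top-level key: {key}")
    for item_type, changes in diff.get("changes", {}).items():
        if item_type not in ITEM_TYPES:
            problems.append(f"unknown item type: {item_type}")
        for change in changes:
            change_type = change.get("type")
            required = REQUIRED_FIELDS.get(change_type)
            if required is None:
                problems.append(f"unknown change type: {change_type}")
                continue
            missing = sorted(required - set(change))
            if missing:
                problems.append(f"{change_type} change is missing {missing}")
    return problems


# Hypothetical diff between two invented registry versions.
example_diff = {
    "head": {"semconv_version": "1.1.0", "schema_base_url": "https://example.com/schemas/"},
    "baseline": {"semconv_version": "1.0.0", "schema_base_url": "https://example.com/schemas/"},
    "changes": {
        "attributes": [
            {"type": "added", "name": "example.new_attr"},
            {
                "type": "renamed_to_new",
                "old_names": ["example.old_a", "example.old_b"],
                "new_name": "example.unified",
            },
            {"type": "deprecated", "name": "example.legacy", "note": "No replacement."},
            {"type": "removed", "name": "example.gone"},
        ],
        "metrics": [],
    },
}
```

A full JSON Schema validator would also enforce string types and uniqueItems; this sketch only checks the keys that the schema marks as required.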

Diff Process

Attributes of the head and baseline registries are compared after the resolution process is applied to each registry. The process is as follows:

  1. All attributes with a non-empty deprecated field are analyzed. Older deprecated formats are automatically converted to the new format (see section on the new deprecated format). An attribute may be marked as deprecated for multiple reasons. Currently, Weaver supports the following types of deprecation:
     • renamed: a new name is assigned to an existing attribute.
     • deprecated: an existing attribute is marked as deprecated without replacement.
  2. A map new_name --> set of old attribute names is created from this analysis.
  3. Using this map, we can now distinguish between:
     • Attributes created to give a new name to an existing attribute or to unify several attributes into a single one.
     • Attributes created to represent entirely new items.
  4. Any attribute in the baseline schema not present in the latest schema is considered removed.
  5. Attribute names that remain in the map (created in step 2) represent attributes present in both versions of the schema and indicate cases where attributes have been renamed to an already existing attribute.

This process applies similarly to each type of signal.
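
The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Weaver's implementation: each resolved registry is modeled as a plain dict from attribute name to definition, the deprecated entry already uses the structured format, and the function name diff_attributes is hypothetical. The sketch also follows the text literally and does not check whether a deprecation was already present in the baseline, since the proposal does not specify that.

```python
def diff_attributes(baseline: dict, head: dict) -> list:
    """Diff two resolved registries given as {attribute_name: definition}.

    A deprecated attribute carries a "deprecated" entry shaped like
    {"action": "renamed", "new_name": ...} or {"action": "deprecated", "note": ...}.
    """
    changes = []
    renamed_to = {}  # new_name -> set of old attribute names

    # Steps 1 and 2: analyze deprecated attributes and build the rename map.
    for name, attr in head.items():
        dep = attr.get("deprecated")
        if not dep:
            continue
        if dep["action"] == "renamed":
            renamed_to.setdefault(dep["new_name"], set()).add(name)
        elif dep["action"] == "deprecated":
            changes.append({"type": "deprecated", "name": name, "note": dep["note"]})

    # Step 3: an attribute that is new in head is either a rename target
    # (possibly unifying several old names) or an entirely new item.
    for name in head:
        if name in baseline:
            continue
        if name in renamed_to:
            old_names = sorted(renamed_to.pop(name))
            changes.append({"type": "renamed_to_new", "old_names": old_names, "new_name": name})
        else:
            changes.append({"type": "added", "name": name})

    # Step 4: anything in the baseline that is absent from head was removed.
    for name in baseline:
        if name not in head:
            changes.append({"type": "removed", "name": name})

    # Step 5: names still in the map were rename targets that already existed.
    for name, old_names in renamed_to.items():
        changes.append(
            {"type": "renamed_to_existing", "old_names": sorted(old_names), "current_name": name}
        )
    return changes
```

The same function could be run once per item type (attributes, metrics, events, and so on) to fill the changes object of the diff format.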

New Deprecated Format

Attributes and signals can be marked as deprecated. In current registry versions (<= 1.28), the deprecated field is a free-text string whose wording follows only loose conventions. To make automation more robust, this proposal defines a structured format for the deprecated field. The deprecated fields of existing registries will be automatically converted to this structured format (on a best-effort basis, using regular expressions). The new deprecated format is as follows:

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Deprecated",
  "description": "The different ways to deprecate an attribute, a metric, ...",
  "type": "object",
  "oneOf": [
    {
      "type": "object",
      "properties": {
        "action": {
          "const": "renamed",
          "description": "The object containing the deprecated field has been renamed to an existing object or to a new object"
        },
        "new_name": {
          "type": "string",
          "description": "The new name of the field"
        },
        "note": {
          "type": ["string", "null"],
          "description": "An optional note to explain why the field has been renamed"
        },
        "version": {
          "type": "string",
          "description": "The version when this object became deprecated"
        }
      },
      "required": ["action", "new_name"],
      "additionalProperties": false
    },
    {
      "type": "object",
      "properties": {
        "action": {
          "const": "deprecated",
          "description": "The object containing the deprecated field has been deprecated either because it no longer exists, has been split into multiple fields, has been renamed in various ways across different contexts, or for any other reason"
        },
        "note": {
          "type": "string",
          "description": "A note to explain why the field has been deprecated"
        },
        "version": {
          "type": "string",
          "description": "The version when this object became deprecated"
        }
      },
      "required": ["action", "note"],
      "additionalProperties": false
    }
  ]
}

Future registries are expected to adopt this new deprecated format.
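
As an illustration of the best-effort conversion, the sketch below maps a free-text deprecated string to the structured format. The regular expression is an assumption: it targets notes of the form "Replaced by `some.name`." and is not the set of patterns Weaver actually uses. Anything it does not recognize falls back to the generic deprecated action, with the original text kept as the note.

```python
import re

# Hypothetical pattern for notes such as "Replaced by `http.request.method`.".
# The proposal only states that conversion is regex-based and best-effort.
RENAMED_RE = re.compile(
    r"^\s*Replaced by\s+`?([A-Za-z0-9_.]+?)`?\s*\.?\s*$",
    re.IGNORECASE,
)


def convert_deprecated(text: str) -> dict:
    """Convert a legacy free-text deprecated note to the structured format."""
    match = RENAMED_RE.match(text)
    if match:
        # A single, unambiguous replacement name was found.
        return {"action": "renamed", "new_name": match.group(1)}
    # Fallback: keep the original text as the required note.
    return {"action": "deprecated", "note": text.strip()}
```

Notes that mention several replacements do not match the pattern and therefore stay in the deprecated branch, which is consistent with the schema's description of that action.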

Manifest Format

A registry manifest file will be added to each new version of the semconv registry. This manifest will describe the registry’s name (e.g., otel), version, schema base URL, and an optional text description. The following JSON schema describes the format of this file. When this file is missing (i.e., for earlier versions of the registry), Weaver will infer its contents on a best-effort basis. This manifest file will play a key role in supporting multiple registries. In future versions, the manifest file may be extended with additional fields. For now, semconv_version is the main field used.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Registry Manifest",
  "description": "Represents the information of a semantic convention registry manifest. This information defines the registry's name, version, description, and schema base url.",
  "type": "object",
  "properties": {
    "name": {
      "type": "string",
      "description": "The name of the registry. This name is used to define the package name."
    },
    "description": {
      "type": ["string", "null"],
      "description": "An optional description of the registry. This field can be used to provide additional context or information about the registry's purpose and contents. The format of the description is markdown."
    },
    "semconv_version": {
      "type": "string",
      "description": "The version of the registry which will be used to define the semconv package version."
    },
    "schema_base_url": {
      "type": "string",
      "description": "The base URL where the registry's schema files are hosted."
    }
  },
  "required": ["name", "semconv_version", "schema_base_url"],
  "additionalProperties": false
}
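
A minimal sketch of how a consumer might model and check a manifest that follows the schema above. The RegistryManifest class and parse_manifest function are hypothetical names used for illustration, and the input is assumed to be a dict already loaded from the manifest file.

```python
from dataclasses import dataclass
from typing import Optional

REQUIRED_KEYS = {"name", "semconv_version", "schema_base_url"}
ALLOWED_KEYS = REQUIRED_KEYS | {"description"}


@dataclass(frozen=True)
class RegistryManifest:
    name: str
    semconv_version: str
    schema_base_url: str
    description: Optional[str] = None  # optional, Markdown per the schema


def parse_manifest(raw: dict) -> RegistryManifest:
    """Check a loaded manifest against the schema's constraints and wrap it."""
    unknown = sorted(set(raw) - ALLOWED_KEYS)
    if unknown:
        # The schema sets additionalProperties to false.
        raise ValueError(f"unknown manifest fields: {unknown}")
    missing = sorted(REQUIRED_KEYS - set(raw))
    if missing:
        raise ValueError(f"missing manifest fields: {missing}")
    for key in REQUIRED_KEYS:
        if not isinstance(raw[key], str):
            raise ValueError(f"{key} must be a string")
    description = raw.get("description")
    if description is not None and not isinstance(description, str):
        raise ValueError("description must be a string or null")
    return RegistryManifest(**raw)
```

For example, parse_manifest({"name": "otel", "semconv_version": "1.28.0", "schema_base_url": "https://example.com/schemas/"}) returns a RegistryManifest whose description is None; the version and URL here are placeholder values.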

Update of OTEL Schema

The update of the OTEL schema will use the information in the registry_manifest.yaml file (particularly the current version of the registry) along with the content from the registry and OTEL schema versions. An updated version of the OTEL schema will be generated as output, containing a description of all changes resulting from the diff process described in this issue.
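
As a rough sketch of this step, the function below turns the attribute renames of a diff into one entry for the versions section of an OpenTelemetry schema file, using the rename_attributes transformation with its attribute_map. Placing attribute renames under the all section, and the function name schema_version_entry, are assumptions made for illustration; the proposal does not define this mapping.

```python
def schema_version_entry(diff: dict) -> dict:
    """Build a {version: sections} entry from the attribute renames in a diff."""
    attribute_map = {}
    for change in diff["changes"].get("attributes", []):
        if change["type"] == "renamed_to_new":
            target = change["new_name"]
        elif change["type"] == "renamed_to_existing":
            target = change["current_name"]
        else:
            continue  # added, deprecated, and removed items are not renames
        for old_name in change["old_names"]:
            attribute_map[old_name] = target

    sections = {}
    if attribute_map:
        # Assumption: attribute renames apply to all signals.
        sections["all"] = {
            "changes": [{"rename_attributes": {"attribute_map": attribute_map}}]
        }
    # The entry is keyed by the head registry version taken from the manifest.
    return {diff["head"]["semconv_version"]: sections}
```

The returned dict could then be merged into the versions mapping of the existing schema file and serialized as YAML.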

Future Extensions

  • Add new fields to the registry_manifest.yaml file to support multi-registry use cases.
  • Enhance the diff to provide more granularity, capturing differences at the field level and in particular for the enum values.

Related Work-In-Progress PR -> #400

Labels
enhancement New feature or request
Projects
Status: Next Release