docs: add metering and billing information (rackerlabs#485)
* docs: add metering and billing information

* docs(metering): add high level overview diagram

This depicts ceilometer collecting and persisting data to Gnocchi, as
well as clients consuming data from the Metric API.

* docs(metering): begin to add billing info

Begin to stub other interesting sections in Gnocchi to back-fill.

* docs(metering): more gnocchi documentation

* docs(metering): cite diagram sources

* docs(metering): begin to add metrics cli usage

* docs(metering): add resource defs in ceilometer

* docs(metering): describe billing and chargebacks

* docs(metering): fix metric resource cli cmds

* docs(metering): fix typo in example command

* docs(metering): fix trailing whitespaces

also fix a typo in the python-gnocchiclient url
add new line to end of metering-overview svg
LukeRepko authored Oct 21, 2024
1 parent 25e0c6b commit dc5a660
Showing 9 changed files with 645 additions and 0 deletions.
Binary file added docs/assets/images/metering-ceilometer.png
3 changes: 3 additions & 0 deletions docs/assets/images/metering-overview.svg
36 changes: 36 additions & 0 deletions docs/metering-billing.md
@@ -0,0 +1,36 @@
# Billing Design

In a cloud billing system using Gnocchi as the source for metered usage data,
Gnocchi stores and processes time series data related to resource consumption.
Key factors such as instance flavor, volume size and type, network traffic, and
object storage can all be stored in Gnocchi, enabling them to be queried later
for usage-based billing of Genestack tenants.

## Billing Workflow

1. **Data Collection**: OpenStack Ceilometer continuously collects telemetry
data from various cloud resources via polling and notification agents.

2. **Data Aggregation and Storage**: Ceilometer forwards this raw usage data
to Gnocchi. Gnocchi automatically aggregates and stores these metrics in an
optimized, scalable format — ensuring that large volumes of data can be
handled efficiently.

3. **Querying Usage Data**: The billing system queries the Metrics API to
retrieve pre-aggregated metrics over specified time periods (_e.g., hourly,
daily, or monthly usage_). Gnocchi provides quick access to the stored data,
enabling near real-time billing operations.

4. **Converting to Atom Events**: The billing system converts the collated
resource usage data into Atom events before submitting them.

5. **Submitting Events to Cloud Feeds**: Newly created Atom events are sent
via HTTPS to Cloud Feeds.

6. **Usage Mediation Services**: Our UMS team receives the metered usage
events from the named feed, then does further aggregation before emitting
the usage to be invoiced.

7. **Billing and Revenue Management**: Finally, the aggregated usage from
UMS is received and processed by BRM to create the usage-based invoice
for each tenant.
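
Steps 3 and 4 of the workflow above can be sketched roughly as follows. This is a minimal illustration, not the production billing code: the function names, the `usage` element, and its attributes are hypothetical, and the actual Cloud Feeds event schema is not described in this document. The sketch only assumes that measures come back from the Metrics API as `[timestamp, granularity, value]` triples.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("atom", ATOM_NS)


def summarize_measures(measures):
    """Collapse [timestamp, granularity, value] triples into one usage record.

    Assumes a gauge such as volume.size, so the billable quantity is taken
    to be the mean value over the period (a hypothetical billing rule).
    """
    if not measures:
        raise ValueError("no measures returned for the period")
    values = [value for _timestamp, _granularity, value in measures]
    return {
        "start": measures[0][0],
        "end": measures[-1][0],
        "quantity": sum(values) / len(values),
    }


def to_atom_entry(project_id, resource_id, metric_name, usage):
    """Build an Atom <entry>; the <usage> payload element is hypothetical."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    entry_id = ET.SubElement(entry, f"{{{ATOM_NS}}}id")
    entry_id.text = f"urn:example:usage:{resource_id}:{usage['end']}"
    title = ET.SubElement(entry, f"{{{ATOM_NS}}}title")
    title.text = f"{metric_name} usage"
    updated = ET.SubElement(entry, f"{{{ATOM_NS}}}updated")
    updated.text = usage["end"]
    content = ET.SubElement(
        entry, f"{{{ATOM_NS}}}content", {"type": "application/xml"}
    )
    ET.SubElement(content, "usage", {
        "projectId": project_id,
        "resourceId": resource_id,
        "metric": metric_name,
        "start": usage["start"],
        "end": usage["end"],
        "quantity": str(usage["quantity"]),
    })
    return ET.tostring(entry, encoding="unicode")
```

The resulting string would then be sent over HTTPS in step 5; the transport and authentication details are left out because they depend on the feed being used.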
160 changes: 160 additions & 0 deletions docs/metering-ceilometer.md
@@ -0,0 +1,160 @@
# Ceilometer (Metering and Event Collection)

Ceilometer is the telemetry service in OpenStack responsible for collecting
usage data related to different resources (_e.g., instances, volumes,
and network usage_). It compiles various types of metrics (_referred to as
meters_), such as CPU utilization, disk I/O, and network traffic. It does
this by gathering data from other OpenStack components like Nova (_compute_),
Cinder (_block storage_), and Neutron (_networking_). It also captures event
data such as instance creation and volume attachment via hooks into the message
notification system (_RabbitMQ_).

![Ceilometer Architecture](assets/images/metering-ceilometer.png)

<figure>
<figcaption>Image source: <a href="https://docs.openstack.org/ceilometer/latest/contributor/architecture.html" target="_blank" rel="noopener noreferrer">docs.openstack.org</a></figcaption>
</figure>

## Configuration

Ceilometer’s configuration may initially seem complex due to the extensive
number of event, metric, and resource definitions available. However, these
definitions can be easily modified to adjust the data collected by the polling
and notification agents, allowing users to fine-tune data collection based on
their specific needs.

### Events

Events are discrete occurrences, such as an instance starting or stopping or a
volume being attached, that are captured and stored. Ceilometer
builds event data from the messages it receives from other OpenStack
services. Event definitions can be complex. Typically, a given message will
match one or more event definitions that describe what the incoming payload
should be flattened to. See the [telemetry-events][ceilometer-events]
section of Ceilometer's documentation for more information.

??? example "Example event definitions for cinder volumes"

    ```
    - event_type: ['volume.exists', 'volume.retype', 'volume.create.*', 'volume.delete.*', 'volume.resize.*', 'volume.attach.*', 'volume.detach.*', 'volume.update.*', 'snapshot.exists', 'snapshot.create.*', 'snapshot.delete.*', 'snapshot.update.*', 'volume.transfer.accept.end', 'snapshot.transfer.accept.end']
      traits: &cinder_traits
        user_id:
          fields: payload.user_id
        project_id:
          fields: payload.tenant_id
        availability_zone:
          fields: payload.availability_zone
        display_name:
          fields: payload.display_name
        replication_status:
          fields: payload.replication_status
        status:
          fields: payload.status
        created_at:
          type: datetime
          fields: payload.created_at
        image_id:
          fields: payload.glance_metadata[?key=image_id].value
        instance_id:
          fields: payload.volume_attachment[0].instance_uuid
    - event_type: ['volume.transfer.*', 'volume.exists', 'volume.retype', 'volume.create.*', 'volume.delete.*', 'volume.resize.*', 'volume.attach.*', 'volume.detach.*', 'volume.update.*', 'snapshot.transfer.accept.end']
      traits:
        <<: *cinder_traits
        resource_id:
          fields: payload.volume_id
        host:
          fields: payload.host
        size:
          type: int
          fields: payload.size
        type:
          fields: payload.volume_type
        replication_status:
          fields: payload.replication_status
    ```
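
To make the "flattening" idea concrete, here is a simplified sketch of how a definition like the one above could turn a notification message into a flat set of traits. It is not Ceilometer's implementation: the `resolve` and `build_event` helpers are hypothetical, wildcard matching is assumed to behave like shell-style globbing, and only plain keys and integer indexes are resolved (filters such as `[?key=image_id]` are deliberately left out).

```python
from fnmatch import fnmatchcase


def resolve(path, message):
    """Resolve a dotted path such as 'payload.volume_attachment[0].instance_uuid'.

    Returns None when any step of the path is missing.
    """
    current = message
    for part in path.split("."):
        key, _, index = part.partition("[")
        current = current.get(key) if isinstance(current, dict) else None
        if index and isinstance(current, list):
            position = int(index.rstrip("]"))
            current = current[position] if position < len(current) else None
        elif index:
            current = None
        if current is None:
            return None
    return current


def build_event(definition, message):
    """Return a flat event dict if the message matches the definition, else None."""
    event_type = message["event_type"]
    patterns = definition["event_type"]
    if not any(fnmatchcase(event_type, pattern) for pattern in patterns):
        return None
    traits = {}
    for name, spec in definition["traits"].items():
        value = resolve(spec["fields"], message)
        if value is not None and spec.get("type") == "int":
            value = int(value)
        traits[name] = value
    return {"event_type": event_type, "traits": traits}
```

A message whose `event_type` is `volume.create.end` would match the `volume.create.*` pattern, and each trait would be filled from the corresponding payload field, or left as `None` when the field is absent.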

### Resources

Gnocchi resource definitions in Ceilometer's configuration define how resources
like instances, volumes, and networks are represented and tracked for
telemetry purposes. Each definition specifies the attributes (_such as project
ID or instance name_) and the metrics (_like CPU usage or network traffic_)
associated with that resource. When Ceilometer collects data from various
OpenStack services, it uses these definitions to map the data to the appropriate
resource type in Gnocchi (_which stores it as time-series data_). This
structure allows for efficient monitoring, aggregation, and analysis of resource
usage over time in a scalable way.

??? example "Example resource definition for cinder volumes"

    ```
    - resource_type: volume
      metrics:
        volume:
        volume.size:
        snapshot.size:
        volume.snapshot.size:
        volume.backup.size:
        backup.size:
        volume.manage_existing.start:
        volume.manage_existing.end:
        volume.manage_existing_snapshot.start:
        volume.manage_existing_snapshot.end:
      attributes:
        display_name: resource_metadata.(display_name|name)
        volume_type: resource_metadata.volume_type
        image_id: resource_metadata.image_id
        instance_id: resource_metadata.instance_id
      event_create:
        - volume.create.end
      event_delete:
        - volume.delete.end
        - snapshot.delete.end
      event_update:
        - volume.attach.end
        - volume.transfer.accept.end
        - snapshot.transfer.accept.end
      event_attributes:
        id: resource_id
        project_id: project_id
        image_id: image_id
        instance_id: instance_id
    ```

### Meters

Meters are quantitative measures like CPU time, memory usage, or disk
operations. Ceilometer provides several useful meters by default, and new
definitions can be added as needed. To read more about
measurements and how they are captured, see the [telemetry-measurements][ceilometer-telemetry]
section of the Ceilometer documentation.

??? example "Example metric definition for volume.size"
    ```
    - name: 'volume.size'
      event_type:
        - 'volume.exists'
        - 'volume.retype'
        - 'volume.create.*'
        - 'volume.delete.*'
        - 'volume.resize.*'
        - 'volume.attach.*'
        - 'volume.detach.*'
        - 'volume.update.*'
        - 'volume.manage.*'
      type: 'gauge'
      unit: 'GB'
      volume: $.payload.size
      user_id: $.payload.user_id
      project_id: $.payload.tenant_id
      resource_id: $.payload.volume_id
      metadata:
        display_name: $.payload.display_name
        volume_type: $.payload.volume_type
        image_id: $.payload.glance_metadata[?key=image_id].value
        instance_id: $.payload.volume_attachment[0].instance_uuid
    ```

[ceilometer-telemetry]: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html "The Telemetry service collects meters within an OpenStack deployment. This section provides a brief summary about meters format, their origin, and also contains the list of available meters."

[ceilometer-events]: https://docs.openstack.org/ceilometer/latest/admin/telemetry-events.html "In addition to meters, the Telemetry service collects events triggered within an OpenStack environment. This section provides a brief summary of the events format in the Telemetry service."
27 changes: 27 additions & 0 deletions docs/metering-chargebacks.md
@@ -0,0 +1,27 @@
# Handling Chargebacks

Gnocchi is pivotal in tracking and managing resource consumption across projects
within an OpenStack environment. The chargeback process aims to assign the
costs of shared cloud resources to the responsible entity based on their usage.

## Theoretical Workflow

1. **Customer Initiates Chargeback or Complaint**: The complaint is received
by the responsible operational team that would handle such a dispute. Usage
can be re-calculated for a specific tenant over a given period of time.

2. **Querying Usage Data**: The chargeback system queries Gnocchi for usage
metrics that belong only to the specific projects of concern related to the
dispute. Gnocchi provides detailed, pre-aggregated data for each tracked
resource, enabling the system to quickly access and analyze consumption.

3. **Cost Allocation**: Based on the usage data retrieved from Gnocchi, the
chargeback system could then allocate the costs of the shared cloud
resources to each tenant. Cost allocation models, such as pay-per-use or
fixed rates for specific services (_e.g., a price per GB of storage or a
per-hour price for each flavor type_), can be applied to determine the charges for each entity.

4. **Reporting and Transparency**: The chargeback system could be made to
generate reports detailing each project's resource consumption and
associated costs. These reports provide transparency, allowing tenants to
track their resource usage and associated expenses.
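
The cost allocation step can be illustrated with a small sketch. Everything here is hypothetical: the rate card, the metric names used as keys, and the row format are invented for the example, since this workflow is theoretical and no pricing model is defined in this document. `Decimal` is used instead of floats so that currency rounding stays predictable.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical rate card: price per unit-hour for each metric.
RATES = {
    "volume.size": Decimal("0.0002"),    # per GB-hour
    "flavor.m1.small": Decimal("0.05"),  # per instance-hour
}


def allocate_costs(usage_rows):
    """Sum cost per project from (project_id, metric, quantity, hours) rows."""
    totals = {}
    for project_id, metric, quantity, hours in usage_rows:
        cost = RATES[metric] * Decimal(str(quantity)) * Decimal(str(hours))
        totals[project_id] = totals.get(project_id, Decimal("0")) + cost
    # Round once, at the end, to two decimal places.
    return {
        project: total.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
        for project, total in totals.items()
    }
```

Rounding only after summing avoids accumulating per-row rounding error, which matters when a disputed invoice is recalculated from many small usage records.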
89 changes: 89 additions & 0 deletions docs/metering-gnocchi.md
@@ -0,0 +1,89 @@
# Gnocchi (Metric Storage API)

Gnocchi is an open-source project designed to store and manage time series data.

It addresses the challenge of efficiently storing and indexing large-scale time
series data, which is crucial in modern cloud environments that are vast,
dynamic, and may serve multiple users. Gnocchi is built with performance,
scalability, and fault-tolerance in mind, without relying on complex storage
systems.

Unlike traditional time series databases that store raw data points and compute
aggregates (_like averages or minimums_) when queried, Gnocchi simplifies this
by pre-aggregating data during ingestion. This makes retrieving data much
faster since the system only needs to read the already processed results.
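
The difference can be illustrated with a toy version of aggregation at ingestion time. This is a conceptual sketch, not Gnocchi's storage code: the `pre_aggregate` function and its input format of `(unix_timestamp, value)` pairs are assumptions made for the example.

```python
from collections import defaultdict


def pre_aggregate(measures, granularity):
    """Group (unix_timestamp, value) pairs into fixed-width time buckets.

    Returns {bucket_start: {"mean": ..., "min": ..., "max": ...}} so that a
    later read only looks up stored results instead of scanning raw points.
    """
    buckets = defaultdict(list)
    for timestamp, value in measures:
        bucket_start = int(timestamp // granularity) * granularity
        buckets[bucket_start].append(value)
    return {
        start: {
            "mean": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }
        for start, values in sorted(buckets.items())
    }
```

With a granularity of 3600 seconds, every raw point in the same hour contributes to a single stored row per aggregation method, so an hourly billing query reads one value per hour regardless of how many raw samples were collected.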

## Architecture

Gnocchi includes multiple services: an HTTP REST API, an optional
statsd-compatible daemon, and an asynchronous processing daemon
(_gnocchi-metricd_). Data is ingested through the API or statsd daemon,
while `gnocchi-metricd` handles background tasks like statistics computation and
metric cleanup.

![Gnocchi Architecture](assets/images/gnocchi-architecture.svg)

<figure>
<figcaption>Image source: <a href="https://gnocchi.osci.io/intro.html" target="_blank" rel="noopener noreferrer">gnocchi.osci.io</a></figcaption>
</figure>

Gnocchi services are stateless and can therefore be scaled horizontally with
little effort. For `ReplicaSet` components such as `gnocchi-api`, we can define
an HPA (HorizontalPodAutoscaler) policy to do just that.
The `metricd` and `statsd` components, however, are configured as
`DaemonSets`, so operators need only add or remove the
configured node-selector label `openstack-control-plane=enabled` on nodes to
scale those components up or down.

## Storage

As shown in the previous architecture diagram, Gnocchi relies on three key
external components for proper functionality:

- Storage for incoming measures
- Storage for aggregated metrics
- An index

### Measures & Aggregates

Gnocchi supports various storage backends for incoming measures and aggregated
metrics, including:

- File
- Ceph (_flex default for `incoming` & `storage`_)
- OpenStack Swift
- Amazon S3
- Redis

For smaller architectures, using the file driver to store data on disk may be
sufficient. However, S3, Ceph, and Swift offer more scalable storage options,
with Ceph being the recommended choice due to its better consistency. In
larger or busier deployments, a common recommendation is to use Redis for
incoming measure storage and Ceph for aggregate storage.

### Indexing

The indexer driver stores the index of all resources, archive policies, and
metrics, along with their definitions, types, and properties. It also handles
the linking of resources to metrics and manages resource relationships.
Supported drivers include the following:

- PostgreSQL (_flex default_)
- MySQL (_version 5.6.4 or higher_)

## Resource Types

The resource types that reside within Gnocchi are created during the Ceilometer
db-sync job, which executes `ceilometer-upgrade`. We create the default types
that ship with Ceilometer; they can be modified via the Metrics API after
creation if necessary.

## REST API Usage

The Gnocchi REST API is well documented on the project's website; see the
[REST API Usage](https://gnocchi.osci.io/rest.html) section for full details.
There is also a community-supported Python client and SDK,
installable via pip, named [python-gnocchiclient](https://github.com/gnocchixyz/python-gnocchiclient).
Note that this module is required for `openstack metric` commands
to function. See [OpenStack Metrics](openstack-metrics.md) for example CLI
usage.
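
For readers who want to call the REST API directly, the following sketch builds (but does not send) a request for the measures of a single metric. The endpoint URL, metric ID, and token are placeholders, and `build_measures_request` is a hypothetical helper; consult the REST API documentation linked above for the authoritative list of parameters.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_measures_request(endpoint, metric_id, token, start, stop,
                           granularity=3600, aggregation="mean"):
    """Build a GET request for pre-aggregated measures of one metric.

    The token is assumed to be a Keystone token passed via X-Auth-Token.
    """
    query = urlencode({
        "start": start,
        "stop": stop,
        "granularity": granularity,
        "aggregation": aggregation,
    })
    url = f"{endpoint.rstrip('/')}/v1/metric/{metric_id}/measures?{query}"
    headers = {"X-Auth-Token": token, "Accept": "application/json"}
    return Request(url, headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would perform the call; in practice, the Python client or the `openstack metric` commands are usually more convenient because they handle authentication for you.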
22 changes: 22 additions & 0 deletions docs/metering-overview.md
@@ -0,0 +1,22 @@
# Metering Overview

Metering in OpenStack involves collecting, tracking, and analyzing the
usage data of various resource types within your cloud environment (_crucial
for billing, monitoring, and performance optimization_). This functionality
is achieved by leveraging the [Ceilometer](metering-ceilometer.md) and
[Gnocchi](metering-gnocchi.md) projects.

Ceilometer and Gnocchi work together to provide a powerful solution for
resource tracking in environments of all sizes. Their roles are
complementary: Ceilometer collects telemetry data, while Gnocchi stores and
processes it at scale.

Once processed and stored, this resource data can be queried through Gnocchi,
also known as the Metrics API. It serves a wide range of use cases,
including auditing, billing, monitoring, and more.

![Metering Overview](assets/images/metering-overview.svg)

<figure>
<figcaption>Metering Architecture - © Luke Repko, Rackspace Technology</figcaption>
</figure>