forked from rackerlabs/genestack
docs: add metering and billing information (rackerlabs#485)
* docs: add metering and billing information
* docs(metering): add high level overview diagram
  This depicts Ceilometer collecting and persisting data to Gnocchi, as well as clients consuming data from the Metric API.
* docs(metering): begin to add billing info
  Begin to stub other interesting sections in Gnocchi to back-fill.
* docs(metering): more gnocchi documentation
* docs(metering): cite diagram sources
* docs(metering): begin to add metrics cli usage
* docs(metering): add resource defs in ceilometer
* docs(metering): describe billing and chargebacks
* docs(metering): fix metric resource cli cmds
* docs(metering): fix typo in example command
* docs(metering): fix trailing whitespaces
  also fix a typo in the python-gnocchiclient url
  add new line to end of metering-overview svg
Showing 9 changed files with 645 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
# Billing Design | ||
|
||
In a cloud billing system using Gnocchi as the source for metered usage data, | ||
Gnocchi stores and processes time series data related to resource consumption. | ||
Key factors such as instance flavor, volume size and type, network traffic, and | ||
object storage can all be stored in Gnocchi, enabling them to be queried later | ||
for usage-based billing of Genestack tenants. | ||
|
||
## Billing Workflow | ||
|
||
1. **Data Collection**: OpenStack Ceilometer continuously collects telemetry | ||
data from various cloud resources via polling and notification agents. | ||
|
||
2. **Data Aggregation and Storage**: Ceilometer forwards this raw usage data | ||
to Gnocchi. Gnocchi automatically aggregates and stores these metrics in an | ||
optimized, scalable format, ensuring that large volumes of data can be | ||
handled efficiently. | ||
|
||
3. **Querying Usage Data**: The billing system queries the Metrics API to | ||
retrieve pre-aggregated metrics over specified time periods (_e.g., hourly, | ||
daily, or monthly usage_). Gnocchi provides quick access to the stored data, | ||
enabling near real-time billing operations. | ||
|
||
4. **Converting to Atom Events**: The billing system converts the collated | ||
resource usage data into Atom events before submitting them. | ||
|
||
5. **Submitting Events to Cloud Feeds**: Newly created Atom events are sent | ||
via HTTPS to Cloud Feeds. | ||
|
||
6. **Usage Mediation Services**: Our UMS team receives the metered usage | ||
events from the named feed, then does further aggregation before emitting | ||
the usage to be invoiced. | ||
|
||
7. **Billing and Revenue Management**: Finally, the aggregated usage from | ||
UMS is received and processed by BRM to create the usage-based invoice | ||
for each tenant. |
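
Steps 3 and 4 of the workflow can be sketched in Python as follows. This is only an illustration under stated assumptions: the measures are taken to be the `[timestamp, granularity, value]` triplets that Gnocchi returns, while the function name `usage_to_atom_entry`, the `urn:usage:` identifier scheme, and the JSON payload fields are hypothetical. The real event schema is whatever the target feed requires.

```python
import json
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def usage_to_atom_entry(tenant_id, resource_id, metric_name, measures):
    """Summarize pre-aggregated measures as a single Atom <entry>.

    `measures` holds [timestamp, granularity_seconds, value] triplets.
    """
    if not measures:
        raise ValueError("no measures to convert")

    # Weight each value by its period length to get unit-hours (e.g. GB-hours).
    unit_hours = sum(value * granularity / 3600.0
                     for _timestamp, granularity, value in measures)
    first_ts = measures[0][0]
    last_ts = measures[-1][0]

    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = f"urn:usage:{resource_id}:{first_ts}"
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = f"{metric_name} usage"
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = last_ts

    # Hypothetical payload: field names here are illustrative only.
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="application/json")
    content.text = json.dumps({
        "tenant_id": tenant_id,
        "resource_id": resource_id,
        "metric": metric_name,
        "unit_hours": unit_hours,
    }, sort_keys=True)
    return entry


measures = [
    ["2024-01-01T00:00:00+00:00", 3600.0, 10.0],
    ["2024-01-01T01:00:00+00:00", 3600.0, 10.0],
]
entry = usage_to_atom_entry("tenant-a", "volume-1", "volume.size", measures)
print(ET.tostring(entry, encoding="unicode"))
```

Weighting each value by its granularity keeps the total meaningful for gauge-type meters such as `volume.size`, where a plain sum of samples would depend on how often the meter was recorded.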
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,160 @@ | ||
# Ceilometer (Metering and Event Collection) | ||
|
||
Ceilometer is the telemetry service in OpenStack responsible for collecting | ||
usage data related to different resources (_e.g., instances, volumes, | ||
and network usage_). It compiles various types of metrics (_referred to as | ||
meters_), such as CPU utilization, disk I/O, and network traffic. It does | ||
this by gathering data from other OpenStack components like Nova (_compute_), | ||
Cinder (_block storage_), and Neutron (_networking_). It also captures event | ||
data such as instance creation and volume attachment via hooks into the message | ||
notification system (_RabbitMQ_). | ||
|
||
![Ceilometer Architecture](assets/images/metering-ceilometer.png) | ||
|
||
<figure> | ||
<figcaption>Image source: <a href="https://docs.openstack.org/ceilometer/latest/contributor/architecture.html" target="_blank" rel="noopener noreferrer">docs.openstack.org</a></figcaption> | ||
</figure> | ||
|
||
## Configuration | ||
|
||
Ceilometer’s configuration may initially seem complex due to the extensive | ||
number of event, metric, and resource definitions available. However, these | ||
definitions can be easily modified to adjust the data collected by the polling | ||
and notification agents, allowing users to fine-tune data collection based on | ||
their specific needs. | ||
|
||
### Events | ||
|
||
Events are discrete occurrences, such as starting or stopping an instance or | ||
attaching a volume, which are captured and stored. Ceilometer builds event | ||
data from the messages it receives from other OpenStack services. Event | ||
definitions can be complex. Typically, a given message will match one or more | ||
event definitions that describe how the incoming payload should be flattened. | ||
See the [telemetry-events][ceilometer-events] section of Ceilometer's | ||
documentation for more information. | ||
|
||
??? example "Example event definitions for cinder volumes" | ||
|
||
``` | ||
    - event_type: ['volume.exists', 'volume.retype', 'volume.create.*', 'volume.delete.*', 'volume.resize.*', 'volume.attach.*', 'volume.detach.*', 'volume.update.*', 'snapshot.exists', 'snapshot.create.*', 'snapshot.delete.*', 'snapshot.update.*', 'volume.transfer.accept.end', 'snapshot.transfer.accept.end'] | ||
      traits: &cinder_traits | ||
        user_id: | ||
          fields: payload.user_id | ||
        project_id: | ||
          fields: payload.tenant_id | ||
        availability_zone: | ||
          fields: payload.availability_zone | ||
        display_name: | ||
          fields: payload.display_name | ||
        replication_status: | ||
          fields: payload.replication_status | ||
        status: | ||
          fields: payload.status | ||
        created_at: | ||
          type: datetime | ||
          fields: payload.created_at | ||
        image_id: | ||
          fields: payload.glance_metadata[?key=image_id].value | ||
        instance_id: | ||
          fields: payload.volume_attachment[0].instance_uuid | ||
    - event_type: ['volume.transfer.*', 'volume.exists', 'volume.retype', 'volume.create.*', 'volume.delete.*', 'volume.resize.*', 'volume.attach.*', 'volume.detach.*', 'volume.update.*', 'snapshot.transfer.accept.end'] | ||
      traits: | ||
        <<: *cinder_traits | ||
        resource_id: | ||
          fields: payload.volume_id | ||
        host: | ||
          fields: payload.host | ||
        size: | ||
          type: int | ||
          fields: payload.size | ||
        type: | ||
          fields: payload.volume_type | ||
        replication_status: | ||
          fields: payload.replication_status | ||
``` | ||
|
||
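To make the idea of "flattening" concrete, here is a minimal Python sketch of how a notification could be matched against event definitions and reduced to a flat set of traits. It is not Ceilometer's implementation. It assumes shell-style wildcard matching on `event_type`, assumes that the last matching definition in the list takes precedence, and resolves only simple dotted paths instead of full JSONPath expressions. The helper names `resolve` and `flatten` are hypothetical.

```python
from fnmatch import fnmatch


def resolve(document, dotted_path):
    """Follow a simple dotted path such as 'payload.size' through nested dicts."""
    node = document
    for key in dotted_path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def flatten(notification, definitions):
    """Return a flat event for the notification, or None if nothing matches."""
    event_type = notification["event_type"]
    matched = None
    for definition in definitions:
        if any(fnmatch(event_type, pattern) for pattern in definition["event_type"]):
            matched = definition  # assumption: later definitions take precedence
    if matched is None:
        return None

    traits = {}
    for name, spec in matched["traits"].items():
        value = resolve(notification, spec["fields"])
        if value is None:
            continue  # skip traits that are missing from the payload
        if spec.get("type") == "int":
            value = int(value)
        traits[name] = value
    return {"event_type": event_type, "traits": traits}


# Shared traits, standing in for the YAML anchor used in the definitions above.
cinder_traits = {
    "user_id": {"fields": "payload.user_id"},
    "project_id": {"fields": "payload.tenant_id"},
    "status": {"fields": "payload.status"},
}
definitions = [
    {"event_type": ["volume.exists", "volume.create.*", "snapshot.create.*"],
     "traits": cinder_traits},
    {"event_type": ["volume.exists", "volume.create.*"],
     "traits": {**cinder_traits,
                "resource_id": {"fields": "payload.volume_id"},
                "size": {"type": "int", "fields": "payload.size"}}},
]
notification = {
    "event_type": "volume.create.end",
    "payload": {"user_id": "u-1", "tenant_id": "p-1", "status": "available",
                "volume_id": "v-1", "size": "10"},
}
event = flatten(notification, definitions)
print(event)
```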
### Resources | ||
|
||
Gnocchi resource definitions in Ceilometer's configuration define how resources | ||
like instances, volumes, and networks are represented and tracked for | ||
telemetry purposes. Each definition specifies the attributes (_such as project | ||
ID or instance name_) and the metrics (_like CPU usage or network traffic_) | ||
associated with that resource. When Ceilometer collects data from various | ||
OpenStack services, it uses these definitions to map the data to the appropriate | ||
resource type in Gnocchi (_which stores it as time-series data_). This | ||
structure allows for efficient monitoring, aggregation, and analysis of resource | ||
usage over time in a scalable way. | ||
|
||
??? example "Example resource definition for cinder volumes" | ||
|
||
``` | ||
    - resource_type: volume | ||
      metrics: | ||
        volume: | ||
        volume.size: | ||
        snapshot.size: | ||
        volume.snapshot.size: | ||
        volume.backup.size: | ||
        backup.size: | ||
        volume.manage_existing.start: | ||
        volume.manage_existing.end: | ||
        volume.manage_existing_snapshot.start: | ||
        volume.manage_existing_snapshot.end: | ||
      attributes: | ||
        display_name: resource_metadata.(display_name|name) | ||
        volume_type: resource_metadata.volume_type | ||
        image_id: resource_metadata.image_id | ||
        instance_id: resource_metadata.instance_id | ||
      event_create: | ||
        - volume.create.end | ||
      event_delete: | ||
        - volume.delete.end | ||
        - snapshot.delete.end | ||
      event_update: | ||
        - volume.attach.end | ||
        - volume.transfer.accept.end | ||
        - snapshot.transfer.accept.end | ||
      event_attributes: | ||
        id: resource_id | ||
        project_id: project_id | ||
        image_id: image_id | ||
        instance_id: instance_id | ||
``` | ||
|
||
### Meters | ||
|
||
Meters are quantitative measures like CPU time, memory usage, or disk | ||
operations. Ceilometer provides several useful metrics by default, but new | ||
definitions can be added to suit almost every need. To read more about | ||
measurements and how they are captured, see the [telemetry-measurements][ceilometer-telemetry] | ||
section of the Ceilometer documentation. | ||
|
||
??? example "Example metric definition for volume.size" | ||
``` | ||
    - name: 'volume.size' | ||
      event_type: | ||
        - 'volume.exists' | ||
        - 'volume.retype' | ||
        - 'volume.create.*' | ||
        - 'volume.delete.*' | ||
        - 'volume.resize.*' | ||
        - 'volume.attach.*' | ||
        - 'volume.detach.*' | ||
        - 'volume.update.*' | ||
        - 'volume.manage.*' | ||
      type: 'gauge' | ||
      unit: 'GB' | ||
      volume: $.payload.size | ||
      user_id: $.payload.user_id | ||
      project_id: $.payload.tenant_id | ||
      resource_id: $.payload.volume_id | ||
      metadata: | ||
        display_name: $.payload.display_name | ||
        volume_type: $.payload.volume_type | ||
        image_id: $.payload.glance_metadata[?key=image_id].value | ||
        instance_id: $.payload.volume_attachment[0].instance_uuid | ||
``` | ||
|
||
[ceilometer-telemetry]: https://docs.openstack.org/ceilometer/latest/admin/telemetry-measurements.html "The Telemetry service collects meters within an OpenStack deployment. This section provides a brief summary about meters format, their origin, and also contains the list of available meters." | ||
|
||
[ceilometer-events]: https://docs.openstack.org/ceilometer/latest/admin/telemetry-events.html "In addition to meters, the Telemetry service collects events triggered within an OpenStack environment. This section provides a brief summary of the events format in the Telemetry service." |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
# Handling Chargebacks | ||
|
||
Gnocchi is pivotal in tracking and managing resource consumption across projects | ||
within an OpenStack environment. The chargeback process aims to assign the | ||
costs of shared cloud resources to the responsible entity based on their usage. | ||
|
||
## Theoretical Workflow | ||
|
||
1. **Customer Initiates Chargeback or Complaint**: The complaint is received | ||
by the responsible operational team that would handle such a dispute. Usage | ||
can be re-calculated for a specific tenant over a given period of time. | ||
|
||
2. **Querying Usage Data**: The chargeback system queries Gnocchi for usage | ||
metrics that belong only to the specific projects of concern related to the | ||
dispute. Gnocchi provides detailed, pre-aggregated data for each tracked | ||
resource, enabling the system to quickly access and analyze consumption. | ||
|
||
3. **Cost Allocation**: Based on the usage data retrieved from Gnocchi, the | ||
chargeback system could then allocate the costs of the shared cloud | ||
resources to each tenant. Cost allocation models, such as pay-per-use or | ||
fixed rates for specific services (_e.g., a price per GB of storage or an | ||
hourly price per flavor type_), can be applied to determine the charges for | ||
each entity. | ||
|
||
4. **Reporting and Transparency**: The chargeback system could be made to | ||
generate reports detailing each project's resource consumption and | ||
associated costs. These reports provide transparency, allowing tenants to | ||
track their resource usage and associated expenses. |
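
The cost allocation step can be illustrated with a short Python sketch. Everything specific here is an assumption made for the example: the rate card, the metric names `storage_gb_hours` and `flavor_hours`, and the function `allocate_costs` are hypothetical, and the usage quantities are presumed to have already been retrieved from Gnocchi for the disputed period.

```python
from decimal import Decimal

# Hypothetical rate card: price per unit of each billable quantity.
RATES = {
    "storage_gb_hours": Decimal("0.05"),
    "flavor_hours": Decimal("0.25"),
}


def allocate_costs(usage, rates):
    """Total the charges per project.

    `usage` is an iterable of (project_id, metric_name, quantity) tuples.
    """
    charges = {}
    for project_id, metric_name, quantity in usage:
        line_cost = rates[metric_name] * Decimal(str(quantity))
        charges[project_id] = charges.get(project_id, Decimal("0")) + line_cost
    # Round to cents only once, after all line items have been summed.
    return {project: total.quantize(Decimal("0.01"))
            for project, total in charges.items()}


usage = [
    ("project-a", "storage_gb_hours", 100),
    ("project-a", "flavor_hours", 10),
    ("project-b", "storage_gb_hours", 50),
]
report = allocate_costs(usage, RATES)
for project, total in sorted(report.items()):
    print(project, total)
```

`Decimal` is used instead of floating point so that monetary totals do not pick up binary rounding errors.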
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
# Gnocchi (Metric Storage API) | ||
|
||
Gnocchi is an open-source project designed to store and manage time series data. | ||
|
||
It addresses the challenge of efficiently storing and indexing large-scale time | ||
series data, which is crucial in modern cloud environments that are vast, | ||
dynamic, and may serve multiple users. Gnocchi is built with performance, | ||
scalability, and fault-tolerance in mind, without relying on complex storage | ||
systems. | ||
|
||
Unlike traditional time series databases that store raw data points and compute | ||
aggregates (_like averages or minimums_) when queried, Gnocchi simplifies this | ||
by pre-aggregating data during ingestion. This makes retrieving data much | ||
faster since the system only needs to read the already processed results. | ||
|
||
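The difference between aggregating at query time and aggregating at ingestion can be shown with a small Python sketch. It is a simplification, not Gnocchi's implementation: the class `PreAggregatedSeries` is hypothetical, timestamps are plain integer seconds, only the mean is kept, and aggregation happens synchronously on write, whereas Gnocchi processes incoming measures in the background.

```python
from collections import defaultdict


class PreAggregatedSeries:
    """Keep running per-bucket totals so that reads never touch raw points."""

    def __init__(self, granularity):
        self.granularity = granularity  # bucket width in seconds
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def ingest(self, timestamp, value):
        # Align the timestamp to the start of its bucket and update the totals.
        bucket = timestamp - (timestamp % self.granularity)
        self._sums[bucket] += value
        self._counts[bucket] += 1

    def mean(self):
        # Reading is a cheap walk over buckets, not a scan of raw measures.
        return [(bucket, self._sums[bucket] / self._counts[bucket])
                for bucket in sorted(self._sums)]


series = PreAggregatedSeries(granularity=60)
for timestamp, value in [(0, 1.0), (30, 3.0), (60, 5.0)]:
    series.ingest(timestamp, value)
print(series.mean())
```

The trade-off is that the raw points are not retained in this sketch, so only aggregates chosen in advance can be served later.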
## Architecture | ||
|
||
Gnocchi includes multiple services: an HTTP REST API, an optional | ||
statsd-compatible daemon, and an asynchronous processing daemon | ||
(_gnocchi-metricd_). Data is ingested through the API or statsd daemon, | ||
while `gnocchi-metricd` handles background tasks like statistics computation and | ||
metric cleanup. | ||
|
||
![Gnocchi Architecture](assets/images/gnocchi-architecture.svg) | ||
|
||
<figure> | ||
<figcaption>Image source: <a href="https://gnocchi.osci.io/intro.html" target="_blank" rel="noopener noreferrer">gnocchi.osci.io</a></figcaption> | ||
</figure> | ||
|
||
Gnocchi services are stateless and can therefore be scaled horizontally without | ||
much effort. For `ReplicaSet` components such as `gnocchi-api`, we can define | ||
an HPA (HorizontalPodAutoscaler) policy to do just that. The `metricd` and | ||
`statsd` components, however, are configured as `DaemonSets`, so operators | ||
need only add or remove the configured node-selector label | ||
`openstack-control-plane=enabled` on nodes to scale those components up or | ||
down. | ||
|
||
## Storage | ||
|
||
As shown in the previous architecture diagram, Gnocchi relies on three key | ||
external components for proper functionality: | ||
|
||
- Storage for incoming measures | ||
- Storage for aggregated metrics | ||
- An index | ||
|
||
### Measures & Aggregates | ||
|
||
Gnocchi supports various storage backends for incoming measures and aggregated | ||
metrics, including: | ||
|
||
- File | ||
- Ceph (_flex default for `incoming` & `storage`_) | ||
- OpenStack Swift | ||
- Amazon S3 | ||
- Redis | ||
|
||
For smaller architectures, using the file driver to store data on disk may be | ||
sufficient. However, S3, Ceph, and Swift offer more scalable storage options, | ||
with Ceph being the recommended choice due to its better consistency. In | ||
larger or busier deployments, a common recommendation is to use Redis for | ||
incoming measure storage and Ceph for aggregate storage. | ||
|
||
### Indexing | ||
|
||
The indexer driver stores the index of all resources, archive policies, and | ||
metrics, along with their definitions, types, and properties. It also handles | ||
the linking of resources to metrics and manages resource relationships. | ||
Supported drivers include the following: | ||
|
||
- PostgreSQL (_flex default_) | ||
- MySQL (_version 5.6.4 or higher_) | ||
|
||
## Resource Types | ||
|
||
The resource types that reside within Gnocchi are created during the Ceilometer | ||
db-sync job, which executes `ceilometer-upgrade`. We create the default types | ||
that ship with Ceilometer; they can be modified via the Metrics API after | ||
creation if necessary. | ||
|
||
## REST API Usage | ||
|
||
The Gnocchi REST API is well documented on the project's website; see the | ||
[REST API Usage](https://gnocchi.osci.io/rest.html) section for full details. | ||
There is also a community-supported Python client and SDK, installable via | ||
pip, named [python-gnocchiclient](https://github.com/gnocchixyz/python-gnocchiclient). | ||
Note that this module is required for `openstack metric` commands to | ||
function. See [OpenStack Metrics](openstack-metrics.md) for example CLI | ||
usage. |
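
As a rough illustration of what a measures query looks like at the HTTP level, the Python sketch below only builds the request URL and sends nothing. The endpoint host `metrics.example.test` and the helper name `measures_url` are hypothetical. The path and the `start`, `stop`, `aggregation`, and `granularity` parameters follow the REST documentation linked above, which remains the authoritative reference.

```python
from urllib.parse import quote, urlencode


def measures_url(endpoint, resource_type, resource_id, metric_name,
                 start, stop, aggregation="mean", granularity=None):
    """Build the URL for reading a named metric's measures on a resource."""
    path = "/v1/resource/{}/{}/metric/{}/measures".format(
        quote(resource_type, safe=""),
        quote(resource_id, safe=""),
        quote(metric_name, safe=""),
    )
    params = {"start": start, "stop": stop, "aggregation": aggregation}
    if granularity is not None:
        params["granularity"] = granularity
    return endpoint.rstrip("/") + path + "?" + urlencode(params)


url = measures_url(
    "https://metrics.example.test/",  # hypothetical Metrics API endpoint
    "volume",
    "abc",
    "volume.size",
    start="2024-01-01T00:00:00",
    stop="2024-02-01T00:00:00",
    granularity=3600,
)
print(url)
```

In a Keystone-authenticated deployment the request would also need a valid token; python-gnocchiclient handles that through its session, which is one reason to prefer the client over hand-built requests.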
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# Metering Overview | ||
|
||
Metering in OpenStack involves collecting, tracking, and analyzing the | ||
usage data of various resource types within your cloud environment (_crucial | ||
for billing, monitoring, and performance optimization_). This functionality | ||
is achieved by leveraging the [Ceilometer](metering-ceilometer.md) and | ||
[Gnocchi](metering-gnocchi.md) projects. | ||
|
||
Ceilometer and Gnocchi work together to provide a powerful solution for | ||
resource tracking in environments of all sizes. Their roles are | ||
complementary: Ceilometer collects telemetry data, while Gnocchi stores and | ||
processes it at scale. | ||
|
||
Once processed and stored, this resource data can be queried through Gnocchi, | ||
also known as the Metrics API. This data serves a wide range of use cases, | ||
including auditing, billing, monitoring, and more. | ||
|
||
![Metering Overview](assets/images/metering-overview.svg) | ||
|
||
<figure> | ||
<figcaption>Metering Architecture - © Luke Repko, Rackspace Technology</figcaption> | ||
</figure> |