
Commit 79134d6

docs: Clarifying info about structured metadata, blooms (#15058) (#15061)
1 parent 6bb2305 commit 79134d6

2 files changed: +10 -9 lines changed


docs/sources/get-started/labels/structured-metadata.md

+2 -2
@@ -23,8 +23,8 @@ You should only use structured metadata in the following situations:
 
 - If you are ingesting data in OpenTelemetry format, using Grafana Alloy or an OpenTelemetry Collector. Structured metadata was designed to support native ingestion of OpenTelemetry data.
 - If you have high cardinality metadata that should not be used as a label and does not exist in the log line. Some examples might include `process_id` or `thread_id` or Kubernetes pod names.
-- If you are using [Explore Logs](https://grafana.com/docs/grafana-cloud/visualizations/simplified-exploration/logs/) to visualize and explore your Loki logs.
-- If you are a large-scale customer, who is ingesting more than 75TB of logs a month and are using [Bloom filters](https://grafana.com/docs/loki/<LOKI_VERSION>/operations/bloom-filters/)
+- If you are using [Explore Logs](https://grafana.com/docs/grafana-cloud/visualizations/simplified-exploration/logs/) to visualize and explore your Loki logs. You must set `discover_log_levels` and `allow_structured_metadata` to `true` in your Loki configuration.
+- If you are a large-scale customer who is ingesting more than 75TB of logs a month and using [Bloom filters](https://grafana.com/docs/loki/<LOKI_VERSION>/operations/bloom-filters/) (Experimental). Starting in [Loki 3.3](https://grafana.com/docs/loki/<LOKI_VERSION>/release-notes/v3-3/), Bloom filters use structured metadata.
 
 We do not recommend extracting information that already exists in your log lines and putting it into structured metadata.
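
The Explore Logs bullet added above names two settings, `discover_log_levels` and `allow_structured_metadata`. Below is a minimal sketch of how they might be enabled, assuming both are per-tenant limits under `limits_config` in `loki.yaml`; that placement is an assumption here, so confirm it against the configuration reference for your Loki version.

```yaml
# Hypothetical excerpt from loki.yaml -- the limits_config placement is an
# assumption, not something confirmed by this commit.
limits_config:
  allow_structured_metadata: true   # accept structured metadata at ingest
  discover_log_levels: true         # detect log levels and attach them as structured metadata
```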
docs/sources/operations/bloom-filters.md

+8 -7
@@ -12,9 +12,10 @@ aliases:
 
 # Bloom filters (Experimental)
 
-{{% admonition type="warning" %}}
-This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
-{{% /admonition %}}
+{{< admonition type="warning" >}}
+This feature is an [experimental feature](/docs/release-life-cycle/). Engineering and on-call support is not available. No SLA is provided.
+Note that this feature is intended for users who are ingesting more than 75TB of logs a month, as it is designed to accelerate queries against large volumes of logs.
+{{< /admonition >}}
 
 Loki leverages [bloom filters](https://en.wikipedia.org/wiki/Bloom_filter) to speed up queries by reducing the amount of data Loki needs to load from the store and iterate through.
 Loki is often used to run "needle in a haystack" queries; these are queries where a large number of log lines are searched, but only a few log lines match the query.
@@ -110,7 +111,7 @@ overrides:
       period: 40d
 ```
 
-### Sizing and configuration
+### Planner and Builder sizing and configuration
 
 The single planner instance runs the planning phase for bloom blocks for each tenant in the given interval and puts the created tasks to an internal task queue.
 Builders process tasks sequentially by pulling them from the queue. The amount of builder replicas required to complete all pending tasks before the next planning iteration depends on the value of `-bloom-build.planner.bloom_split_series_keyspace_by`, the number of tenants, and the log volume of the streams.
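
As a rough illustration of the scaling relationship described in the context lines above, here is a hypothetical flag setting; the value is only an example for discussion, not a recommendation from this commit.

```yaml
# Illustrative only: this flag splits each tenant's series keyspace into ranges
# that are planned as separate build tasks. Together with the number of tenants
# and the log volume, it drives how many builder replicas are needed to drain
# the task queue before the next planning iteration.
-bloom-build.planner.bloom_split_series_keyspace_by=256
```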
@@ -131,7 +132,7 @@ The sharding of the data is performed on the client side using DNS discovery of
 You can find all the configuration options for this component in the Configure section for the [Bloom Gateways][bloom-gateway-cfg].
 Refer to the [Enable bloom filters](#enable-bloom-filters) section above for a configuration snippet enabling this feature.
 
-### Sizing and configuration
+### Gateway sizing and configuration
 
 Bloom Gateways use their local file system as a Least Recently Used (LRU) cache for blooms that are downloaded from object storage.
 The size of the blooms depend on the ingest volume and number of unique structured metadata key-value pairs, as well as on build settings of the blooms, namely false-positive-rate.
@@ -140,7 +141,7 @@ With default settings, bloom filters make up <1% of the raw structured metadata
 Since reading blooms depends heavily on disk IOPS, Bloom Gateways should make use of multiple, locally attached SSD disks (NVMe) to increase I/O throughput.
 Multiple directories on different disk mounts can be specified using the `-bloom.shipper.working-directory` [setting][storage-config-cfg] when using a comma separated list of mount points, for example:
 
-```
+```yaml
 -bloom.shipper.working-directory="/mnt/data0,/mnt/data1,/mnt/data2,/mnt/data3"
 ```

@@ -150,7 +151,7 @@ The product of three settings control the maximum amount of bloom data in memory
 
 Example, assuming 4 CPU cores:
 
-```
+```yaml
 -bloom-gateway.worker-concurrency=4 // 1x NUM_CORES
 -bloom-gateway.block-query-concurrency=8 // 2x NUM_CORES
 -bloom.max-query-page-size=64MiB
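
Taking the hunk header above at face value (the maximum amount of bloom data held in memory is the product of those three settings), this example works out to roughly 4 × 8 × 64 MiB ≈ 2 GiB of bloom pages resident at peak.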
