GitHub - tontinton/toshokan: Log search engine on object storages

Introduction

toshokan is a log search engine (think Elasticsearch or Splunk) that stores its data on object storage. The closest existing project is Quickwit.

It uses:

tantivy - for building and searching the inverted index data structure.
Apache OpenDAL - for an abstraction over object storage backends.
PostgreSQL - for storing metadata atomically, which avoids data races.

I've also written a blog post explaining the benefits and drawbacks of using object storage for data-intensive applications.

Architecture

How to use

# Create an index from a config file.
toshokan create example_config.yaml

# Index a newline-delimited JSON file.
toshokan index test ~/hdfs-logs-multitenants-10000.json

# Index JSON records from Kafka.
# Once per --commit-interval, whatever was read from the source is written to a new index file.
toshokan index test kafka://localhost:9092/topic --stream

toshokan search test "tenant_id:[60 TO 65} AND severity_text:INFO" --limit 1 | jq .
# {
#   "attributes": {
#     "class": "org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace"
#   },
#   "body": "src: /10.10.34.30:33078, dest: /10.10.34.11:50010, bytes: 234, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_-202827006_103, offset: 0, srvID: d9ef1b17-4314-4cd8-91eb-095413c3427f, blockid: BP-108841162-10.10.34.11-1440074360971:blk_1074072709_331885, duration: 2571934",
#   "resource": {
#     "service": "datanode/01"
#   },
#   "severity_text": "INFO",
#   "tenant_id": 61,
#   "timestamp": "2016-04-13T06:46:54Z"
# }

# Merge index files for faster searching.
toshokan merge test

# Drop the index.
toshokan drop test
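
To try the index and search commands without the HDFS log dataset, a tiny newline-delimited JSON file is enough. The sketch below is only an illustration: the file name and the three records are made up, and it assumes the `test` index was created from a config whose schema has the same fields as the search output above. The `toshokan` calls are left as comments so the script can run on its own.

```shell
#!/bin/sh
# Hypothetical sample input; the path and the records are invented for illustration.
set -eu
sample="${TMPDIR:-/tmp}/toshokan-sample.json"

# One JSON object per line, mirroring the field names in the search output above.
cat > "$sample" <<'EOF'
{"timestamp":"2016-04-13T06:46:54Z","tenant_id":59,"severity_text":"INFO","body":"first record","resource":{"service":"datanode/01"},"attributes":{"class":"example.A"}}
{"timestamp":"2016-04-13T06:46:55Z","tenant_id":61,"severity_text":"INFO","body":"second record","resource":{"service":"datanode/01"},"attributes":{"class":"example.B"}}
{"timestamp":"2016-04-13T06:46:56Z","tenant_id":65,"severity_text":"INFO","body":"third record","resource":{"service":"datanode/02"},"attributes":{"class":"example.C"}}
EOF

# Rough sanity check: count the lines, and the lines that start with { and end with }.
lines=$(grep -c '' "$sample")
objects=$(grep -c '^{.*}$' "$sample")
echo "$lines lines, $objects look like JSON objects"

# With the index already created, the file can then be indexed and searched:
#   toshokan index test "$sample"
#   toshokan search test "tenant_id:[60 TO 65} AND severity_text:INFO" | jq .
```

If I read tantivy's range syntax correctly, `[60 TO 65}` includes 60 and excludes 65, so of these three records only the one with `tenant_id` 61 should come back. I have not verified that against this exact input.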

