This project contains applications required to load Snowplow data into Databricks with low latency.
Check out the example config files for how to configure your loader.
The Databricks loader reads the stream of enriched events and pushes staging files to a Databricks volume.
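The loader is configured with a HOCON file. As a rough illustration only (the key names below are assumptions, not the loader's actual configuration schema; the example config files are the authoritative reference), a kinesis-flavoured config broadly separates the input stream from the Databricks output:

```hocon
# Illustrative sketch only: these key names are assumptions, not the loader's
# actual configuration schema. Start from one of the example config files and
# adapt it instead.
{
  # Where enriched events are read from (kinesis flavour assumed here)
  "input": {
    "streamName": "enriched"
  }

  # Where staging files are written: a Unity Catalog volume in your workspace
  "output": {
    "catalog": "<CATALOG_NAME>"
    "schema": "<SCHEMA_NAME>"
    "volume": "<VOLUME_NAME>"
  }
}
```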
Basic usage:

```bash
docker run \
  -v /path/to/config.hocon:/var/config.hocon \
  -v /path/to/iglu.hocon:/var/iglu.hocon \
  snowplow/databricks-loader-<flavour>:0.2.0 \
  --config=/var/config.hocon \
  --iglu-config=/var/iglu.hocon
```
...where `<flavour>` is either `kinesis` (for AWS), `pubsub` (for GCP), or `kafka` (for Azure).
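For example, to run the kinesis flavour on AWS (the local file paths are placeholders for your own files):

```bash
docker run \
  -v $PWD/config.hocon:/var/config.hocon \
  -v $PWD/iglu.hocon:/var/iglu.hocon \
  snowplow/databricks-loader-kinesis:0.2.0 \
  --config=/var/config.hocon \
  --iglu-config=/var/iglu.hocon
```

The file passed via `--iglu-config` is an Iglu resolver configuration, which tells the loader where to fetch schemas from. A minimal sketch, assuming the standard resolver format with Iglu Central as the only registry (HOCON accepts this JSON form as-is):

```hocon
# Standard Iglu resolver configuration with only the public Iglu Central
# registry; add your own registries to the "repositories" list as needed.
{
  "schema": "iglu:com.snowplowanalytics.iglu/resolver-config/jsonschema/1-0-1",
  "data": {
    "cacheSize": 500,
    "repositories": [
      {
        "name": "Iglu Central",
        "priority": 0,
        "vendorPrefixes": [],
        "connection": {
          "http": { "uri": "http://iglucentral.com" }
        }
      }
    ]
  }
}
```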
Create a Pipeline in your Databricks workspace and copy the following SQL into the associated `.sql` file:
```sql
CREATE STREAMING LIVE TABLE events
CLUSTER BY (load_tstamp, event_name)
TBLPROPERTIES (
  'delta.dataSkippingStatsColumns' =
    'load_tstamp,collector_tstamp,derived_tstamp,dvce_created_tstamp,true_tstamp,event_name'
)
AS SELECT
  *,
  current_timestamp() AS load_tstamp
FROM cloud_files(
  "/Volumes/<CATALOG_NAME>/<SCHEMA_NAME>/<VOLUME_NAME>/events",
  "parquet",
  map(
    "cloudfiles.inferColumnTypes", "false",
    "cloudfiles.includeExistingFiles", "false", -- set to true to load files already present in the volume
    "cloudfiles.schemaEvolutionMode", "addNewColumns",
    "cloudfiles.partitionColumns", "",
    "cloudfiles.useManagedFileEvents", "true",
    "datetimeRebaseMode", "CORRECTED",
    "int96RebaseMode", "CORRECTED",
    "mergeSchema", "true"
  )
)
```
Replace `/Volumes/<CATALOG_NAME>/<SCHEMA_NAME>/<VOLUME_NAME>/events` with the correct path to your volume.
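Once the pipeline is running, the `events` streaming table can be queried like any other table. A quick sanity check, assuming the pipeline publishes the table to `<CATALOG_NAME>.<SCHEMA_NAME>` (adjust the fully qualified name to wherever your pipeline writes):

```sql
-- Count recently loaded events by event name.
-- Replace <CATALOG_NAME>.<SCHEMA_NAME> with the catalog and schema your
-- pipeline publishes the events table to.
SELECT
  event_name,
  count(*) AS event_count
FROM <CATALOG_NAME>.<SCHEMA_NAME>.events
WHERE load_tstamp > current_timestamp() - INTERVAL 1 HOUR
GROUP BY event_name
ORDER BY event_count DESC;
```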
| Technical Docs | Setup Guide | Roadmap & Contributing |
|:---:|:---:|:---:|
| Technical Docs | Setup Guide | Roadmap |
Copyright (c) 2012-present Snowplow Analytics Ltd. All rights reserved.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)