Build data pipelines, the easy way 🛠️
Updated Jun 6, 2023 · TypeScript
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents
Hamilton helps data scientists and engineers define testable, modular, self-documenting dataflows that encode lineage/tracing and metadata. Runs and scales everywhere Python does.
Implementing best practices for PySpark ETL jobs and applications.
A few projects related to data engineering, including data modeling, cloud infrastructure setup, data warehousing, and data lake development.
An end-to-end GoodReads Data Pipeline for Building Data Lake, Data Warehouse and Analytics Platform.
A scalable general-purpose micro-framework for defining dataflows. THIS REPOSITORY HAS BEEN MOVED TO www.github.com/dagworks-inc/hamilton
A high-performance data processing system written in Clojure
A simplified, lightweight ETL Framework based on Apache Spark
A blazingly fast general-purpose blockchain analytics engine specialized in systematic MEV detection
An end-to-end data engineering pipeline that orchestrates data ingestion, processing, and storage using Apache Airflow, Python, Apache Kafka, Apache Zookeeper, Apache Spark, and Cassandra. All components are containerized with Docker for easy deployment and scalability.
A simple Spark-powered ETL framework that just works 🍺
[Python] Stream-like manipulation of iterables.
This is a template you can use for your next data engineering portfolio project.
Jayvee is a domain-specific language and runtime for automated processing of data pipelines
Service for bulk-loading data to databases with automatic schema management (Redshift, Snowflake, BigQuery, ClickHouse, Postgres, MySQL)
Watchmen Platform is a low-code data platform for data pipelines, metadata management, analysis, and quality management
The goal of this project is to track Uber Rides and Uber Eats expenses through data engineering processes, using technologies such as Apache Airflow, AWS Redshift, and Power BI.