Empowering Data Intelligence with Distributed SQL for Sharding, Scalability, and Security Across All Databases.
Updated Dec 27, 2024 - Java
The leading data integration platform for ETL/ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Available both self-hosted and cloud-hosted.
The leader in Next-Generation Customer Data Infrastructure
Flink CDC is a streaming data integration tool
Privacy- and security-focused Segment alternative, written in Golang and React
A list of useful resources to learn Data Engineering from scratch
Memphis.dev is a highly scalable and effortless data streaming platform
ingestr is a CLI tool that seamlessly copies data between any databases with a single command.
An open-source data logging library for machine learning models and data pipelines. 📚 Provides visibility into data quality & model performance over time. 🛡️ Supports privacy-preserving data collection, ensuring safety & robustness. 📈
The dbt-native data observability solution for data & analytics engineers. Monitor your data pipelines in minutes. Available as self-hosted or cloud service with premium features.
A lightweight stream processing library for Go
CLI task management & automation tool
BitSail is a distributed, high-performance data integration engine that supports batch, streaming, and incremental scenarios. BitSail is widely used and synchronizes hundreds of trillions of data items every day.
🔥🔥🔥 Open Source Alternative to Hightouch, Census, and RudderStack - Reverse ETL & Data Activation
Source code accompanying the book Data Science on the Google Cloud Platform by Valliappa Lakshmanan (O'Reilly, 2017).
An example end-to-end data engineering project.
Smarter data pipelines for audio.
A compute framework for building Search, RAG, Recommendations and Analytics over complex structured & unstructured data.
Pythonic tool for orchestrating machine-learning, high-performance, and quantum-computing workflows in heterogeneous compute environments.
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).