☄️ Python's nested data operator (and CLI), for all your declarative restructuring needs. Got data? Glom it! ☄️
Updated Nov 3, 2024 - Python
🚚 Agile Data Preparation Workflows made easy with Pandas, Dask, cuDF, Dask-cuDF, Vaex and PySpark
Logical replication extension for PostgreSQL (Postgres) 17, 16, 15, 14, 13, 12, 11, 10, 9.6, 9.5, and 9.4, providing much faster replication than Slony, Bucardo, or Londiste, as well as cross-version upgrades.
Scalable identity resolution, entity resolution, data mastering and deduplication using ML
A block-based API for NSValueTransformer, with a growing collection of useful examples.
Optimus is an easy-to-use, reliable, and performant workflow orchestrator for data transformation, data modeling, pipelines, and data quality management.
Advanced and Fast Data Transformation in R
Microsoft Program Synthesis using Examples SDK is a framework of technologies for the automatic generation of programs from input-output examples. This repo includes samples and sample data for the Microsoft Program Synthesis using Examples SDK.
💄 Durable and asynchronous data imports for consuming data at scale and publishing testable SDKs.
Like awk but with SQL and table joins
Low-code Python library to safely use notebooks in production: schedule workflows, generate assets, trigger webhooks, send notifications, build pipelines, manage secrets (Cloud-only)
📄 Concise selector to extract JSON from HTML.
An Extensible Suite of High-Performance and Low-Dependency Packages for Statistical Computing and Data Manipulation in R
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
A simple Spark-powered ETL framework that just works 🍺
A curated list of Clojure resources for dealing with domain-specific languages.
Clojure Query: A Command-line Data Processor for JSON, YAML, EDN, XML and more
Data transformation and utility functions for R
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
🤖 An automated machine learning framework for audio, text, image, video, or .CSV files (50+ featurizers and 15+ model trainers). Python 3.6 required.