MeSQuaL is a system for profiling and checking data quality before downstream tasks such as data analytics and machine learning. MeSQuaL extends SQL to query relational data with constraints on data quality, and it facilitates the verification of statistical tests.
The system includes: (1) a query interpreter for SQuaL, the SQL-extended language we propose for declaring and querying data with data quality checks and statistical tests; (2) an extensible library of user-defined functions for profiling the data and computing various data quality indicators; and (3) a user interface for declaring data quality constraints, profiling data, monitoring data quality with SQuaL queries, and visualizing the results via data quality dashboards. We showcase our system in action with various scenarios on real-world data sets and show its usability for monitoring data quality over time and checking the quality of data on-demand.
For more details
Please cite and refer to the demo paper presented at the EDBT 2020 conference:
- Ugo Comignani, Noël Novelli, Laure Berti-Équille: Data Quality Checking for Machine Learning with MeSQuaL. EDBT 2020: 591-594 (preprint)
Demo Videos
- Contract type and contract instance declaration
- Querying data with data quality constraints and statistical tests
- Monitoring data quality
In the MeSQuaL-engine directory, to generate the parser, run:
mvn javacc:javacc
then, to produce a jar file, run:
mvn install
To skip the unit tests, run the following instead of the previous command:
mvn install -DskipTests
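The build steps above can be chained into one script. This is a minimal sketch, not part of the MeSQuaL repository: it assumes it is run from the repository root, and the function names `build_engine` and `run` as well as the `SKIP_TESTS` and `DRY_RUN` variables are hypothetical. Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: generate the SQuaL parser and package the MeSQuaL engine.
# Assumptions (hypothetical names): SKIP_TESTS=1 skips the unit tests,
# DRY_RUN=1 prints each command instead of running it.
set -euo pipefail

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

build_engine() {
  local engine_dir="${1:-MeSQuaL-engine}"
  # Subshell so that the cd does not leak into the caller's shell.
  (
    run cd "$engine_dir"
    # Generate the parser sources with the JavaCC Maven plugin.
    run mvn javacc:javacc
    # Produce the jar, optionally skipping the unit tests.
    if [ "${SKIP_TESTS:-0}" = "1" ]; then
      run mvn install -DskipTests
    else
      run mvn install
    fi
  )
}
```

For example, `SKIP_TESTS=1 build_engine` builds the engine without running the tests.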
First, copy the querying interface plugin into the Grafana plugins directory (/var/lib/grafana/plugins/ by default) and build it. For example, on Linux systems, run:
cp -R ./MeSQuaL-visualization/MeSQuaL-query-grafana-panel/ /var/lib/grafana/plugins/
cd /var/lib/grafana/plugins/MeSQuaL-query-grafana-panel/
npm install
yarn build
Then, in Grafana, use the "Import" function and paste the content of the JSON file grafana-dashboard-setting.json to import the MeSQuaL dashboard.
If Grafana cannot load the querying plugin, then run:
npx npm-force-resolutions
npm install
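The plugin installation and the fallback above can be wrapped as two shell functions. This is a minimal sketch, assuming it is run from the repository root; `install_panel`, `fix_panel`, `run`, `GRAFANA_PLUGINS` and `DRY_RUN` are hypothetical names. Writing to /var/lib/grafana/plugins/ typically requires elevated privileges.

```shell
#!/usr/bin/env bash
# Sketch: copy and build the MeSQuaL querying panel for Grafana.
# Assumptions (hypothetical names): GRAFANA_PLUGINS overrides the plugins
# directory, DRY_RUN=1 prints each command instead of running it.
set -euo pipefail

PANEL="MeSQuaL-query-grafana-panel"

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

install_panel() {
  local plugins_dir="${GRAFANA_PLUGINS:-/var/lib/grafana/plugins}"
  run cp -R "./MeSQuaL-visualization/$PANEL/" "$plugins_dir/"
  (
    run cd "$plugins_dir/$PANEL"
    run npm install
    run yarn build
  )
}

# Fallback when Grafana cannot load the querying plugin.
fix_panel() {
  local plugins_dir="${GRAFANA_PLUGINS:-/var/lib/grafana/plugins}"
  (
    run cd "$plugins_dir/$PANEL"
    run npx npm-force-resolutions
    run npm install
  )
}
```

Grafana discovers plugins when the server starts, so restarting Grafana after building the panel is usually necessary.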
Launch the MeSQuaL engine by running:
java -jar common-1.0-SNAPSHOT-jar-with-dependencies.jar
Then the MeSQuaL interface can be accessed through Grafana. For example, with a local installation of Grafana, connect to:
http://localhost:3000
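The README does not state where `mvn install` leaves the jar; Maven writes packaged artifacts to `target/` directories by default. The following minimal sketch assumes the jar lies somewhere under the engine directory and searches for it before launching; `launch_engine`, `run`, `ENGINE_DIR` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Sketch: locate the engine jar produced by `mvn install` and start it.
# Assumptions (hypothetical names): ENGINE_DIR is searched for the jar,
# DRY_RUN=1 prints the java command instead of starting the engine.
set -euo pipefail

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

launch_engine() {
  local engine_dir="${ENGINE_DIR:-MeSQuaL-engine}"
  local jar
  # Take the first match; ignore errors such as a missing directory.
  jar=$(find "$engine_dir" -name 'common-1.0-SNAPSHOT-jar-with-dependencies.jar' -print -quit 2>/dev/null || true)
  if [ -z "$jar" ]; then
    echo "engine jar not found under $engine_dir; run mvn install first" >&2
    return 1
  fi
  run java -jar "$jar"
}
```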
We provide a few datasets in the 'demo' directory. These datasets correspond to the ones used in the demo paper presented at the EDBT 2020 conference.
For MySQL, copy the CSV files in ./data/datasets/ into the default import directory, /var/lib/mysql-files/. Then run the createTablesAndLoadData.sql script to create the tables and import the data.
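The two loading steps can be scripted as below. This is a minimal sketch: it assumes it is run from the repository root and that createTablesAndLoadData.sql is reachable at the path given by the hypothetical `LOAD_SCRIPT` variable; `load_demo_data`, `run`, `MYSQL_FILES`, `DB_USER` and `DRY_RUN` are also hypothetical names. The directory MySQL accepts for file imports is set by the `secure_file_priv` server variable; `SHOW VARIABLES LIKE 'secure_file_priv';` reports it if your installation differs from /var/lib/mysql-files/.

```shell
#!/usr/bin/env bash
# Sketch: copy the demo CSV files and run the table-creation script.
# Assumptions (hypothetical names): MYSQL_FILES is the secure_file_priv
# directory, LOAD_SCRIPT the SQL script path, DB_USER the MySQL account,
# DRY_RUN=1 prints each command instead of running it.
set -euo pipefail

run() {
  # Echo the command in dry-run mode, otherwise execute it.
  if [ "${DRY_RUN:-0}" = "1" ]; then echo "$*"; else "$@"; fi
}

load_demo_data() {
  local files_dir="${MYSQL_FILES:-/var/lib/mysql-files}"
  local script="${LOAD_SCRIPT:-createTablesAndLoadData.sql}"
  local db_user="${DB_USER:-root}"
  # Copy every CSV data set into the directory MySQL may read from.
  run cp ./data/datasets/*.csv "$files_dir/"
  # Execute the script with the mysql client; -p prompts for the password.
  run mysql -u "$db_user" -p -e "source $script"
}
```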
- Maven - dependency management for the MeSQuaL engine
- Yarn - dependency management for the MeSQuaL plugins (if you want to modify the querying plugin, see the dedicated README.md in the plugin directory)
- Ugo Comignani
- Noël Novelli
- Laure Berti-Équille
This project is licensed under the GNU General Public License Version 3 - see the LICENSE.md file for details