Test Framework for SQL-on-Hadoop technologies. Currently supports Apache Drill, a schema-free SQL query engine for Hadoop, NoSQL, and cloud storage.
The framework is built for regression, integration, and sanity testing. It includes test coverage (with baselines) for core Drill functionality and supported features, and is used by the Apache Drill community for pre-commit regression testing and as part of the release criteria.
The test framework requires a distributed file system such as HDFS or MapR-FS to be configured. Some of the tests can also be run against a local file system. By default, it's configured to run against MapR-FS. You can change the default behavior by modifying conf/core-site.xml. Refer to conf/core-site.xml.example for settings.
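For example, to run against HDFS instead of MapR-FS, conf/core-site.xml needs to point the framework at your NameNode. Here is a minimal sketch using the standard Hadoop core-site.xml format; the host and port are placeholders:

<configuration>
  <property>
    <!-- Default file system URI; replace with your NameNode host and port -->
    <name>fs.defaultFS</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>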
To run all tests, Hive and HBase need to be installed and running. To exclude Hive and HBase tests, refer to the example in the Execute Tests section.
The test framework should be run on a Drill cluster node. Refer to the Drill documentation for details on how to set up Drill. The framework can also be run on a client node, with additional configuration required.
Cluster information is set in the conf/drillTestConfig.properties file. This is the main configuration file for the framework. It needs to be modified with local cluster info before compiling the framework and running tests.
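As an illustration only (the key names below are hypothetical; consult the template shipped in conf/ for the actual property names), the file holds standard Java-properties entries such as the Drill install location and connection details:

# Hypothetical keys for illustration; see the template in conf/ for the real names
DRILL_HOME=/opt/drill
CONNECTION_STRING=jdbc:drill:drillbit=node1.example.com:31010
USERNAME=admin
PASSWORD=admin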
Structure of Test Framework
drill-test-framework
|_ framework
|_ resources
|_ Functional (default location for test suites)
|_ Advanced (test suites requiring large datasets)
|_ Datasources (datasets and scripts)
|_ bin
|_ build_framework (script used to compile the framework)
|_ run_tests (script used to execute tests)
Adding Tests
In the Functional / Advanced directory, create a directory for a new test suite (or cd to an existing test suite).
In the test suite directory, add testcases, expected results, and test definition file(s). Optionally include test suite subdirectories to organize tests.
Pairs of testcase files (e.g., query1.sql) and expected result files (e.g., query1.e_tsv) are co-located and share the same base name.
Expected result files can be generated using Postgres or any similar database (see the example after this list).
In the Datasources directory (framework/resources/Datasources), create corresponding datasource directories and copy over any scripts and datasets required by the tests.
Refer to README for downloading datasets required for the Advanced tests.
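As an illustration, a minimal testcase pair could look like the following (table path and values are made up for the example). query1.sql:

-- Simple aggregate; the framework pairs this file with query1.e_tsv
select gbyi, count(*) from `data.json` group by gbyi;

And the matching baseline query1.e_tsv, holding the expected rows as tab-separated values, one row per line:

0	10
1	10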
Add testcases
A testcase consists of a query file and a baseline file containing the expected result. The query file contains a list of SQL statements. Only the one query statement preceded by a "--@test" line (the line itself must not contain ";") is verified against the baseline. If no query is marked this way, the middle query is verified. Here is an example:
-- Some setup queries
setup query one;
...
--@test The following query will be verified.
select * from table;
-- Some teardown queries
teardown query one;
...
Structure of test definition files
A test definition file is a JSON file that defines one or more tests within a directory. The framework scans the specified location(s) for files with a .json extension and executes all tests defined.
{
"testId": "...", // Unique identification string for test suite. Ex: window_function_tests
"type": "group", // Group of tests
"negative": false, // Tests are positive or negative (expected success or expected failures)
"description": "...", // Description for test suites
"submit-type": "jdbc", // Currently the support types: jdbc, odbc, script
"script": "Functional/script_location/script.sh paras" // Specify the test script to execute
"queryType": "sql",
"timeout": null,
"categories": [
"...", // Labels for categorizing test suites. Ex: smoke
"..."
],
"dependencies": [
"...", // Additional dependencies in setup. Ex: hive, hbase, maprdb
"..."
],
"matrices": [
{
"query-file": `".*.sql"`, // File name pattern of query files (REGEX). The example will match any filename with ".sql" extension
"schema": "dfs.tmp", // Drill storage plugin.workspace, or just the storage plugin name in case of hive, hbase
"output-format": "tsv", // Delimiter to seperate columns in expected result files
"expected-file": `".*.e_tsv"`, // File name pattern of expected result files (REGEX)
"username": "user",
"password": "pass",
"verification-type": [ // Supported type "in-memory", "regex"
"in-memory",
"..."
]
}
],
"datasources": [
{
"mode": "cp", // Use "cp" to copy datasets
"src": "``",
"dest": "``"
},
{
"mode": "gen", // Use "gen" to execute scripts
"src": "``",
"dest": ""
},
...
]
}
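Putting it together, a minimal concrete definition for a JDBC-submitted suite might look like this (the testId, description, and datasource paths are illustrative):

{
  "testId": "aggregates_tests",
  "type": "group",
  "negative": false,
  "description": "Basic aggregate function tests",
  "submit-type": "jdbc",
  "queryType": "sql",
  "timeout": null,
  "categories": ["functional", "smoke"],
  "dependencies": [],
  "matrices": [
    {
      "query-file": ".*.sql",
      "schema": "dfs.tmp",
      "output-format": "tsv",
      "expected-file": ".*.e_tsv",
      "username": "user",
      "password": "pass",
      "verification-type": ["in-memory"]
    }
  ],
  "datasources": [
    {
      "mode": "cp",
      "src": "Functional/aggregates/data.json",
      "dest": "aggregates/data.json"
    }
  ]
}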
Build test framework
In the root directory of your repository, issue
bin/build_framework -Pdownload
to build the project and download any dependent datasets configured in pom.xml. If you have already downloaded the dependent datasets, you can simply issue
bin/build_framework
Execute tests
In the root directory of your repository, execute the following command to run tests:
bin/run_tests -s Functional/aggregates,Functional/joins -g smoke,regression -x hbase -t 180 -n 2 -d
-s suites (required)
Here `Functional/aggregates,Functional/joins` are directories inside `framework/resources/Functional`. All test suites and sub-suites within a directory are included.
-g groups (required)
Here `smoke,regression` are the categories of tests to execute
-t timeout (optional)
Here `180` seconds is the max time for a query to execute
-d data generation (required on the first run and after pulling in new changes; optional otherwise)
Here it turns on data copy and data generation
-n concurrency (optional)
Here `2` queries can execute concurrently
-x exclude dependencies (optional)
Here any test suites within the specified directories that depend on `hbase` are excluded
-h help (optional)
Use this option to print the usage of the command, which includes additional options
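For example, per the note in the setup section, Hive and HBase tests can both be excluded by listing them as excluded dependencies (this assumes `-x` accepts a comma-separated list, as `-s` and `-g` do):

bin/run_tests -s Functional -g smoke,regression -x hive,hbase -t 180 -d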
Special handling for Drill version in regex based baseline verification
The Drill version in a regex-based baseline file can be substituted with "{DRILL_VERSION}".
At runtime, the placeholder is replaced with the version of the running Drill instance before the baseline is verified.
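As a hypothetical illustration, a regex baseline line for a query whose output contains the version string could read:

.*Drill {DRILL_VERSION}.*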
Special handling for order-by tests
The Test Framework is able to verify the order in which rows are returned for
some SQL statements that have an order-by clause. If the columns in the order-by
clause are also projected (present) in the output, then the Test Framework may
be able to check the order of the rows that are returned. Here are some
requirements for a query to be validated by the Test Framework:
All columns/fields in the order-by clause must appear in the projection list
Expressions cannot be used in the order-by clause; for example, "order by
column.field[2]" contains the expression [2], which references the third
element in the field array.
Referencing a field within a JSON string in a column is more complicated.
Most such cases are supported, but some may not work.
If a query references more than one table, then use aliases for each column
in the projection list, and reference these aliases in the order-by clause.
Using aliases is a good practice in general when verifying an order-by clause.
The order-by clause cannot be followed by another SQL operation except for
limit. If the order-by clause is followed by an offset or collate, for
example, it might not work.
Here are some queries that can be validated:
select id, gbyi from complex.json order by id limit 10;
This query can be validated because the results are ordered by the "id" column,
and the "id" column is projected in the output. The Test Framework can
examine the output and verify that the rows are in order.
select * from
(select d.uid uid, flatten(d.events) evnts from data.json d order by d.uid) s
order by s.evnts.event_time, s.evnts.campaign_id;
This query can be validated because the results are ordered by the data in the
"evnts" column, and the "evnts" column is projected in the output. The Test
Framework can parse the JSON string in the "evnts" column and examine the
event_time and campaign_id values.
These queries cannot be validated:
select t.gbyt, t.id, t.ooa[0].in zeroin, t.ooa[1].fl.f1 flf1, t.ooa[1].fl.f2 flf2, t.ooa[1].in onein, t.ooa[2].a.aa.aaa, t.ooa[2].b.bb.bbb, t.ooa[2].c.cc.ccc from complex.json t where t.ooa[2].b.bb.bbb is not null order by t.ooa[2].c.cc.ccc limit 10;
This query cannot be validated because the order-by clause contains an
expression, "t.ooa[2].c.cc.ccc". The Test Framework cannot evaluate the array
reference "ooa[2]".
select id from complex.json order by gbyi limit 10;
This query cannot be validated because the "gbyi" column in the order-by
clause does not appear in the projection list.