Test Clutch is a system for tracking and analyzing automated regression test results over multiple continuous integration services. It was born from the need in the curl project to make sense of the more than 300,000 tests run every day over its six CI services. It does not run tests itself, but rather collects statistics from one or more test runners.
Test Clutch was born out of the curl project, which was spreading its CI load over the free tiers of five different CI providers (plus user-submitted builds run on even more build farms), so detailed results could not be viewed in a single location. The summary provided by GitHub for PRs showed only failed builds without much detail, and it entirely ignored automated builds run by users on the daily tarballs provided by the project. There was also no test summary of the master branch beyond the simple binary failed/succeeded status shown in badges. Test Clutch brings information about millions of test runs into one place and attempts to give the user a useful view of them.
Test Clutch is built around a database of test results. Test results enter the database in the ingestion phase, after which they can be summarized or queried. The database holds information down to the level of which test numbers were run in a given run and whether each succeeded or failed. Arbitrary metadata can be attached to each run to help in later analysis, ranging from the git commit of the test source and the CI service being used, down to the hostname and OS version of the test runner. SQLite is used as the database, which means the program does not need an external database server running.
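As a quick illustration (not part of Test Clutch itself), the resulting SQLite file can be inspected with Python's standard sqlite3 module; the database path below is only an assumption, and the tables present will depend on the Test Clutch version:

    import sqlite3

    # Path is an assumption; point this at your actual Test Clutch database file
    conn = sqlite3.connect('testclutch.db')
    # List the tables without assuming anything about their schema
    for (name,) in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"):
        print(name)
    conn.close()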
A periodic job polls the relevant CI services for new test results that are stored in the database. The polling interval should be set depending on how fresh you want the data to be versus how much load you want or are allowed to put on the CI system.
Some useful data may not be available from the CI service. Test data augmentation can add data to the test runs in the database to fill in the gaps. For example, a git hash might only be available in its short form; an augmentation job can be run to expand the hash to its full value by looking it up in a relevant git repository, making queries by hash behave consistently.
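To illustrate what the short-hash augmentation does, the lookup is equivalent to asking git for the full object name of an abbreviated hash (the hash below is just a placeholder, and the command must be run inside a clone of the relevant repository):

    # Prints the full 40-character hash if the abbreviated one is unambiguous
    git rev-parse 3f2a9c1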
Once ingestion is complete, analysis of the runs can be performed. The main analysis currently available is a summary of the most recent test runs showing their overall success or failure, along with information on which tests have been flaky recently. Values collected from tests can also be exported in OpenMetrics format to be analyzed and graphed by tools such as Prometheus and Grafana.
A tool is also available to query the database to search for matching tests or jobs manually.
Test Clutch can periodically look at test results generated by PRs and add comments to them regarding any test failures they caused. Specifically, it mentions when failed tests are known to be flaky (based on their history), whether failures of the same test occurred in multiple test jobs, and in each case provides a link to the test logs for quick access.
The latest source code can be obtained from https://github.com/dfandrich/testclutch/
The code is written entirely in Python. Build and install the latest code from GitHub with:
python -m pip install https://github.com/dfandrich/testclutch/archive/refs/heads/master.tar.gz
The regression test suite can be run with the command:
pytest
or
python -m unittest
Test Clutch does not require git for its basic functions, but the programs directly
involved in querying git (like tcgitcommitinfo) require that the git command-line
tool be installed (version 1.8.3 and newer are known to work).
Create a file ~/.config/testclutchrc in the same format as configdef.py
(you can use the file examples/testclutchrc as a template).
Most items have sane defaults, but check_repo must be set to a URL for the
repository of your source code. This is used as an identifier in the database
and can be used by augmentation jobs. Configuration options can be overridden
on the command line with --set NAME=VALUE. Note that values must be
specified using the same syntax as in the configuration file, which in
particular means that strings must be surrounded by quotation marks; these
will normally need to be escaped to get them through the shell.
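For example, a minimal ~/.config/testclutchrc might look like this (the repository URL is a placeholder; check_repo is the only setting that must be present, and other entries use the same Python-style syntax shown in configdef.py):

    # Required: identifies the source repository in the database
    check_repo = 'https://github.com/example/project'

The same value could be overridden for a single invocation with something like --set 'check_repo="https://github.com/example/project"', where the outer single quotes protect the inner quotation marks from the shell.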
You must configure the ingestion jobs for the CI services you are using. Test Clutch supports these CI services:
- Appveyor
- Azure
- Circle CI
- Cirrus
- GitHub Actions
- curl autobuild (specific to the curl project)
Some of these may require credentials to access the log files. These are
configured according to each service's needs using the --account and
--authfile command-line options. Currently, only GitHub Actions needs an
authentication token, stored in the file given by the --authfile argument.
Create one by going to https://github.com/settings/tokens and choosing
fine-grained tokens. Create a new token with these characteristics:
- Only select repositories (selecting the source repository or repositories)
- "Metadata" repository permissions (read)
- "Pull requests" repository permissions (read and write)
Alternatively, a classic token works as well, with the public_repo scope
enabled. Copy the token contents from the web browser and store it in a file
in a protected location on your local machine, with permissions that do not
allow other users to access it. Tokens set up this way will work with public
repositories.
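One way to create such a protected file is shown below; the file name is only an example, and the token itself must be pasted in by hand:

    # Create an empty file readable only by you, then paste the token into it
    install -m 600 /dev/null ~/.config/testclutch-github-token
    ${EDITOR:-vi} ~/.config/testclutch-github-token

The file can then be passed to the GitHub Actions ingestion job with --authfile ~/.config/testclutch-github-token.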
Test Clutch has built-in support for these test log formats:
- curl runtests (specific to the curl project)
- GNU Automake
- Pytest (short, verbose, color, and with xdist and pytest-astropy-header)
- Python unittest (with -v -b flags)
The logs parsed are those that are displayed as the tests are being run. If your test runner uses a different format for reporting on test results, you will need to create a parser to ingest them.
The most interesting report currently available is obtained by running
tcanalysissum. Another program, tcanalyzepr, downloads test results relating
to a GitHub PR, summarizes the tests that were run, and indicates which
failing tests have recently been flaky.
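For example, a simple invocation might redirect the summary report to a file (this assumes the report is written to stdout; see the program's --help output for the available format options):

    tcanalysissum > test-summary.txt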
The file examples/daily-update is an example that you can use to create a
custom periodic update script for your own use case. It can be installed as a
cron job to generate up-to-date test reports.
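A crontab entry for such a script might look like this (the path and schedule are only examples):

    # Run the update script once a day at 04:30
    30 4 * * * $HOME/testclutch/examples/daily-update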
Test Clutch can export test times in OpenMetrics format which can be imported
into Prometheus for analysis and graphing. The file
prometheus/data/user/testclutch.html is a static page containing links to
some useful graphs that can be hosted on a Prometheus instance by providing
the --web.user-assets=user option (when prometheus/data/ is the Prometheus
base directory). That page will be available at
http://localhost:9090/user/testclutch.html (or whatever origin your instance
is using).
The default Prometheus retention time is quite short for Test Clutch data,
which is generally most interesting when viewed over weeks rather than hours
or even days. To keep data from being automatically deleted too early, supply
an option like --storage.tsdb.retention.time=26w to keep half a year of data.
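Putting these options together, a Prometheus instance serving the static page and keeping half a year of data might be started like this (assuming prometheus/data/ is the Prometheus base directory, as above):

    prometheus --web.user-assets=user --storage.tsdb.retention.time=26w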
You can import about one month's worth of data into Prometheus by running:
tcquerytests --since 720 --format openmetrics >metrics.om
promtool tsdb create-blocks-from openmetrics metrics.om
These are the main entry points to Test Clutch. Most of them accept --help to
show information on how to use them.
Analyze ingested data looking for patterns of failure in the tests and generate a report in text or HTML formats.
Analyze logs from a GitHub PR for patterns of failure in the tests and generate a report in text or HTML formats. Or, check if the CI jobs associated with a PR have run to completion. Note that --only-failed-prs may not be reliable if used with options other than --ci-status.
Adds git commit hashes & summary to autobuilds built from curl's daily tarballs.
OBSOLETE since curl started storing the commit hash in daily tarballs on 2024-08-07.
Adds full-length git commit hashes & summary to runs that only have a short hash. This is currently only applicable to curl autobuild logs.
Finds test job runs in which a particular test failed or succeeded.
Reads information about git commits into the database.
Reads test results from log files from CI services and ingests them into the database.
Create reports summarizing the metadata and statistics about recent test logs.
These are additional entry points that can be useful for debugging.
Show information about specific ingested jobs based on their metadata. This
can output in text or OpenMetrics formats.
A single query term may be given, which restricts output to those runs whose
metadata matches. A query term is of the form field<op>value,
where <op> is one of =, <>, !=, <=, >=, <, >, % or !%. The
operators have their expected meanings, except that % and !% mean the SQL LIKE
and NOT LIKE operators, respectively.
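For example, using tcquerytests (the field name here is only illustrative; the available metadata fields depend on what has been ingested):

    # Show runs from roughly the last month whose "os" metadata contains "linux"
    tcquerytests --since 720 'os%linux'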
Perform low-level manipulations of the database such as deleting a run or checking a commit chain.
Parse a single log file supplied on stdin and view the parsed data on stdout using the configured log parsers.
Read the specified curl daily tarball and dump its metadata.
Dump the raw form of the analyzepr PR status cache.
Modules that parse different test log formats.
Modules that ingest test logs from various CI services.
Modules that augment test metadata with additional data potentially from elsewhere.
Modules for regression testing the rest of the code.
Some example configuration and script files are supplied for reference.
You can create plug-ins for ingesting new log formats by writing a Python
module similar to the existing ones and referencing it by name in the
log_parsers configuration entry. Try to use metadata in the same format
as existing parsers, if possible and relevant, to make future analysis tasks
simpler. See metadata for a list of standard mandatory and some
optional metadata types.
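A hypothetical configuration entry enabling such a plug-in might look like this (the module name is made up; check configdef.py for the expected format and the default value of log_parsers):

    # Register a custom log parsing module (hypothetical name)
    log_parsers = ['mylogparser']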
Test Clutch is in rapid development and no guarantees of compatibility with future or previous versions are currently being made. Contact the developers if you would like to propose that an API be stabilized. The first one is likely to be the test log parsing API.
Having a database filled with test run information opens up a range of possibilities for its use. Here are some:
- determining which tests are flaky to prioritize work to fix them
- keeping track of the current success/failure status of tests on the master branch
- notifying developers who are responsible for submitting a change that caused tests to start failing
- notifying PR developers through GitHub PR comments about failing tests that are likely to be flaky and not the fault of the PR
- finding commonalities between failing tests, e.g. tests that fail only on ARM processors, or tests that fail only on a Linux 6.1 kernel, or tests that fail only with clang 11
- identifying when a commit causes test coverage to suddenly drop
- identifying which CI builds have the most/least test coverage
- determining which builds are running a specific test number
- finding builds that match specific build criteria (e.g. compiler version, OS, curl features, curl dependencies) and looking at their test results
- seeing if specific tests started running faster (or slower) after a specific commit
- alerting somebody on specific conditions, such as if no tests have been run in 2 days, or the overall test success rate drops below 95%
Copyright (C) 2023–2025 Daniel Fandrich [email protected]
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see https://www.gnu.org/licenses/.