updated readme & reqs, prep for public, fix code style
Showing 15 changed files with 244 additions and 284 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,88 +1,90 @@ | ||
# WIP: Project title | ||
# JarSift | ||
|
||
## Setup | ||
|
||
This project requires a functioning MariaDB database. Connection details for this database should be provided in a `config.properties` file, located at the root of the project. It's essential that an empty database exists prior to initiating the process (this can be achieved by running the database initialisation procedure). | ||
This project requires a functioning MariaDB database. Connection details for this database should be provided in | ||
a `config.properties` file, located at the root of the project. It's essential that an empty database exists prior to | ||
initiating the process (this can be achieved by running the database initialisation procedure). | ||
|
||
The `config.properties` file should be based on the `config.properties.example` template found in the project root. | ||
|
||
Similarly, rename the `.env.example` file to `.env` and populate it with the necessary values. | ||
Similarly, rename the `.env.example` file to `.env` and populate it with the respective values. | ||
|
||
Lastly, rename the `my-custom.cnf.example` file to `my-custom.cnf` and fill in the appropriate details. | ||
Lastly, rename the `my-custom.cnf.example` file to `my-custom.cnf` and fill in the appropriate details fitting your | ||
environment. | ||
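
The three template files named above can be prepared in one pass. The following is a minimal sketch, not part of the repository: the helper name `copy_template` is hypothetical, and it copies each template instead of renaming it, so the `.example` files stay in place for reference.

```shell
#!/bin/sh
# Hypothetical helper (not part of the project): copy a template to its
# target name, but never overwrite a file that already exists.
copy_template() {
  src=$1
  dst=$2
  if [ ! -f "$src" ]; then
    echo "missing template: $src" >&2
    return 1
  fi
  if [ -f "$dst" ]; then
    echo "keeping existing $dst"
    return 0
  fi
  cp "$src" "$dst" && echo "created $dst"
}

# Run from the project root; the file names are the ones the README mentions.
for name in config.properties .env my-custom.cnf; do
  copy_template "$name.example" "$name" || true
done
```

The copied files still need to be edited by hand with the values for your environment.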
|
||
## Execution | ||
|
||
There are two key processes in the execution of the project: Corpus Creation and Inference. | ||
|
||
|
||
### Corpus Creation | ||
|
||
Follow the steps below for the corpus creation: | ||
|
||
1. Run the command `docker compose up db` or `docker-compose up db` depending on your docker version. | ||
2. Wait for the internal database initialisation to complete. | ||
3. Once completed, you can terminate the comtainer. | ||
4. Proceed by running either `docker-compose up` or `docker compose up` depending on your docker version. | ||
|
||
Use the following command to create the paths files, which are used to seed the database: | ||
|
||
It's crucial to follow this sequence. Prematurely running `docker-compose up` may result in the application failing due to an unprepared database connection. | ||
```bash | ||
find /path/to/your/local/.m2/repo \( -name "*.jar" -fprint jar_files.txt \) -o \( -name "*.pom" -fprint pom_files.txt \) | ||
``` | ||
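
Before seeding, it can help to confirm that the paths files are non-empty and that the listed paths still exist. This is a minimal sketch under stated assumptions: `count_missing` is a hypothetical helper, and the files are assumed to hold one path per line in the current directory, as the `find` command above writes them.

```shell
#!/bin/sh
# Hypothetical helper: print how many paths listed in a file (one path per
# line) do not exist on disk.
count_missing() {
  missing=0
  while IFS= read -r path || [ -n "$path" ]; do
    [ -e "$path" ] || missing=$((missing + 1))
  done < "$1"
  echo "$missing"
}

# Report on the two files produced by the find command.
for list in jar_files.txt pom_files.txt; do
  if [ -s "$list" ]; then
    echo "$list: $(wc -l < "$list") entries, $(count_missing "$list") missing"
  else
    echo "$list: not found or empty"
  fi
done
```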
|
||
### Inference | ||
When executing the inference segment, ensure: | ||
After the paths files have been created, follow the steps below to seed the database: | ||
|
||
1. The database is operational. | ||
2. Appropriate connection credentials are set in `config.properties`. | ||
1. Run `docker compose up db`. | ||
2. Wait for the internal database initialisation to complete. | ||
3. Once completed, you can terminate the container. | ||
4. Fill in the `PATHS_FILE` environment variable in the `docker-compose.yml` file or the `.env` file with the path to | ||
the `jar_files.txt` file created earlier. | ||
5. Proceed by running `docker compose up`. | ||
|
||
Poor verification, execute the following command from the project root: | ||
```bash | ||
sh run_inference.sh <path_to_uber_jar> | ||
``` | ||
It's crucial to follow this sequence. Prematurely running `docker compose up` may result in the application failing due | ||
to an unprepared database connection. | ||
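
One way to avoid starting the application too early is to poll until the database answers before running the final step. The sketch below is an illustration only: `retry` is a hypothetical helper, and the readiness probe in the commented usage lines (`mariadb-admin ping` inside the `db` service) is an assumption about the database image, not something the project documents.

```shell
#!/bin/sh
# Hypothetical helper: run a command until it succeeds, at most $1 times,
# sleeping $2 seconds between attempts. Returns non-zero if it never succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -ge "$attempts" ]; then
      return 1
    fi
    i=$((i + 1))
    sleep "$delay"
  done
}

# Assumed usage (the probe command and credentials depend on your setup):
#   docker compose up -d db
#   retry 30 2 docker compose exec -T db mariadb-admin ping --silent \
#     && docker compose up
```

Unlike the manual steps, this variant leaves the `db` container running in the background instead of stopping it before the full `docker compose up`.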
|
||
Used to create the paths file | ||
```bash | ||
find /home/dan/.m2/repository \( -name "*.jar" -fprint jar_files.txt \) -o \( -name "*.pom" -fprint pom_files.txt \) | ||
``` | ||
### Inference | ||
|
||
The inference segment requires a running MongoDB instance seeded with the necessary data, which can be found in the | ||
`data` directory. To seed the MongoDB database: | ||
|
||
```bash | ||
# Create the MongoDB container | ||
docker compose up mongodb | ||
|
||
# You may use the existing all.zip file, or retrieve the latest data by running the following command (ensure you have gsutil installed) | ||
gsutil cp gs://osv-vulnerabilities/Maven/all.zip . | ||
|
||
# preferably in a venv | ||
cd util | ||
pip install -r requirements.txt | ||
python inport.py all.zip extracted | ||
python import.py all.zip extracted | ||
``` | ||
|
||
To export the SQL file for usage in SQLite: | ||
When executing the inference segment, ensure: | ||
|
||
1. The corpus database is operational and seeded with the necessary data. | ||
2. The MongoDB instance is operational and accessible and has been seeded with the necessary data. | ||
3. Appropriate connection credentials are set in `config.properties`. | ||
|
||
For verification, execute the following command from the project root: | ||
|
||
```bash | ||
mysqldump \ | ||
--host 127.0.0.1 \ | ||
--user=root --password \ | ||
--skip-create-options \ | ||
--compatible=ansi \ | ||
--skip-extended-insert \ | ||
--compact \ | ||
--single-transaction \ | ||
--no-create-db \ | ||
--no-create-info \ | ||
--hex-blob \ | ||
--skip-quote-names corpus \ | ||
| grep -a "^INSERT INTO" | grep -a -v "__diesel_schema_migrations" \ | ||
| sed 's#\\"#"#gm' \ | ||
| sed -sE "s#,0x([^,]*)#,X'\L\1'#gm" \ | ||
> mysql-to-sqlite.sql | ||
sh run_inference.sh <path_to_jar> | ||
``` | ||
|
||
To import the SQL file into SQLite: | ||
## Evaluation | ||
For the evaluation segment, you must ensure that the corpus database is operational and seeded with the necessary data. | ||
|
||
To generate the evaluation data, execute the following command from the project root: | ||
|
||
```bash | ||
sh run_generator.sh <jars per config> <max dependencies per jar> | ||
``` | ||
|
||
This will generate the Uber JARs and their respective metadata. This will also run the evaluation process and output the | ||
results to the `evaluation` directory. | ||
|
||
If you have already generated the evaluation data and wish to re-run the evaluation process, execute the following | ||
command from the project root: | ||
|
||
```bash | ||
sqlite3 corpus.db | ||
> CREATE TABLE IF NOT EXISTS libraries (id INTEGER PRIMARY KEY AUTOINCREMENT, group_id TEXT NOT NULL, artifact_id TEXT NOT NULL, version TEXT NOT NULL, jar_hash INTEGER NOT NULL, jar_crc INTEGER NOT NULL, is_uber_jar INTEGER NOT NULL, disk_size INTEGER NOT NULL, total_class_files INTEGER NOT NULL, unique_signatures INTEGER NOT NULL); | ||
> CREATE TABLE IF NOT EXISTS signatures (id INTEGER PRIMARY KEY AUTOINCREMENT, library_id INTEGER NOT NULL, class_hash TEXT NOT NULL, class_crc INTEGER NOT NULL); | ||
> PRAGMA synchronous = OFF; | ||
> PRAGMA journal_mode = MEMORY; | ||
> PRAGMA auto_vacuum=OFF; | ||
> PRAGMA index_journal=OFF; | ||
> PRAGMA temp_store=MEMORY; | ||
> PRAGMA cache_siz=-256000; | ||
sh run_evaluation.sh <evaluation data directory> | ||
``` |