diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml new file mode 100644 index 0000000..d64b642 --- /dev/null +++ b/.github/workflows/docs.yml @@ -0,0 +1,36 @@ +name: Documentation + +on: + push: + branches: + - main + tags: "v**" + paths: + - 'docs/**' + - '.github/workflows/docs.yml' + workflow_dispatch: + +jobs: + docs: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + with: + token: ${{ secrets.SDQ_TOKEN }} + - uses: actions/checkout@v4 + with: + repository: ${{ github.repository }}.wiki + path: wiki + token: ${{ secrets.SDQ_TOKEN }} + + - name: Remove contents in Wiki + working-directory: wiki + run: ls -A1 | grep -v '.git' | xargs rm -r + + - name: Copy Wiki from Docs folder + run: cp -r ./docs/. ./wiki + + - name: Deploy 🚀 + uses: stefanzweifel/git-auto-commit-action@v5 + with: + repository: wiki diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml new file mode 100644 index 0000000..c1a853c --- /dev/null +++ b/.github/workflows/publish.yml @@ -0,0 +1,27 @@ +name: Deploy to GitHub +on: + workflow_dispatch: + release: + types: [created, published] +jobs: + publish-release-artifact: + runs-on: ubuntu-latest + steps: + - uses: actions/checkout@v4 + - uses: joshlong/java-version-export-github-action@v28 + id: jve + + - name: Java without Cache + uses: actions/setup-java@v4 + with: + java-version: ${{ steps.jve.outputs.java_major_version }} + distribution: 'temurin' + + - name: Build Metrics + run: mvn -U -B clean package + + - name: Attach CLI to Release on GitHub + uses: softprops/action-gh-release@v2 + with: + files: cli/target/metrics-cli.jar + fail_on_unmatched_files: true diff --git a/README.md b/README.md index c4ebc06..504f626 100644 --- a/README.md +++ b/README.md @@ -1,9 +1,27 @@ -# Metrics -This repository contains tools to calculate several metrics. +# ArDoCo: Metrics Calculator +Welcome to the **ArDoCo Metrics Calculator** project! This tool provides functionality to calculate and aggregate **classification** and **rank metrics** for various machine learning and ranking tasks. -## Metrics Module +The [Wiki](https://github.com/ArDoCo/Metrics/wiki) contains all the necessary information to use the **ArDoCo Metrics Calculator** via multiple interfaces, including a library, REST API, and command-line interface (CLI). 
-## CLI +## Quickstart -## REST +To use this project as a Maven dependency, you need to include the following dependency in your `pom.xml` file: +```xml + + io.github.ardoco + metrics + ${revision} + +``` + +To use the CLI run the following command: + +```shell +java -jar metrics-cli.jar -h +``` + +To use the REST API via Docker, start the server with the following command: +```shell +docker run -it -p 8080:8080 ghcr.io/ardoco/metrics:latest +``` diff --git a/cli/pom.xml b/cli/pom.xml index 309aed4..8d87452 100644 --- a/cli/pom.xml +++ b/cli/pom.xml @@ -61,9 +61,10 @@ edu.kit.kastel.mcse.ardoco.metrics.cli.AppKt - - jar-with-dependencies - + + src/assembly/src.xml + + metrics diff --git a/cli/src/assembly/src.xml b/cli/src/assembly/src.xml new file mode 100644 index 0000000..b5fde4f --- /dev/null +++ b/cli/src/assembly/src.xml @@ -0,0 +1,17 @@ + + cli + + jar + + false + + + / + true + true + runtime + + + diff --git a/docs/Aggregation-of-Metrics.md b/docs/Aggregation-of-Metrics.md new file mode 100644 index 0000000..e3cf0a1 --- /dev/null +++ b/docs/Aggregation-of-Metrics.md @@ -0,0 +1,43 @@ +In addition to calculating individual metrics for classification and ranking tasks, the system supports the **aggregation** of results across multiple classifications or rank-based results. Aggregation methods allow users to compute overall metrics that represent the combined performance of several tasks. + +## Aggregation Types + +The following **Aggregation Types** are supported for both classification and rank metrics: + +1. **Macro Average**: This type of aggregation computes the average of the metrics for each class or query, giving equal weight to each. + - **Use Case**: Useful when all classes or queries are equally important, regardless of how many instances belong to each class. + +2. **Micro Average**: This method aggregates by counting the total true positives, false positives, and false negatives across all classes or queries, then computes the metrics globally. + - **Use Case**: Useful when classes or queries have an uneven number of instances, and you want to prioritize overall accuracy over individual class performance. + +3. **Weighted Average**: In this method, the average is computed with weights, typically proportional to the number of instances in each class or query. + - **Use Case**: Useful when certain classes or queries are more important and should contribute more to the overall metrics. + +## Aggregation for Classification Metrics + +The **AggregatedClassificationResult** class aggregates results from multiple classification tasks. It combines metrics like precision, recall, and F1-score across multiple classification results and calculates an overall score using one of the aggregation methods mentioned above. + +Key Metrics Aggregated: +- **Precision** +- **Recall** +- **F1-Score** +- **Accuracy (if available)** +- **Specificity (if available)** +- **Phi Coefficient (if available)** +- **Phi Coefficient Max (if available)** +- **Phi Over Phi Max (if available)** + +**Example:** +If you perform multiple classification tasks and want a single precision or recall score, the **macro average** would treat each classification equally, while the **weighted average** would account for the number of instances in each task. + +## Aggregation for Rank Metrics + +The **AggregatedRankMetricsResult** class aggregates results from multiple ranking tasks. It computes an overall **Mean Average Precision (MAP)**, **LAG**, and **AUC** by combining the results of each individual rank task. 
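+
+As a brief illustration of the weighted variant (a generic weighted-average formula, not code taken from the library), aggregating the MAP values of $n$ individual rank results with weights $w_1, \dots, w_n$ amounts to:
+
+$$\text{MAP}_{\text{weighted}} = \frac{\sum_{i=1}^{n} w_i \cdot \text{MAP}_i}{\sum_{i=1}^{n} w_i}$$
+
+The macro average is the special case with all $w_i = 1$, while the weighted average typically uses the ground truth size of each rank result as its weight.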
+ +Key Metrics Aggregated: +- **Mean Average Precision (MAP)** +- **LAG** +- **AUC (if available)** + +**Example:** +For search or ranking tasks, you might aggregate the **MAP** scores of multiple queries to get a single performance measure for the ranking system across all queries. \ No newline at end of file diff --git a/docs/Classification-Metrics.md b/docs/Classification-Metrics.md new file mode 100644 index 0000000..2d45efe --- /dev/null +++ b/docs/Classification-Metrics.md @@ -0,0 +1,53 @@ +The classification metrics calculator is responsible for computing various classification performance metrics based on input classifications and ground truth data. + +## Input + +1. **Classification**: A set of classified elements. +2. **Ground Truth**: A set representing the actual classification labels for comparison. +3. **String Provider Function (optional)**: A function that converts classification and ground truth elements into string representations for comparison purposes. +4. **Confusion Matrix Sum (optional)**: The sum of the confusion matrix values (true positives, false positives, etc.). Some metrics may not be calculated if this is not provided. + +:warning: Classification result entries have to match entries in the ground truth (equals) + +## Supported Metrics + +The system calculates a variety of standard classification metrics: + +1. **Precision**: Measures the accuracy of the positive predictions. + + $$\text{Precision} = \frac{TP}{TP + FP}$$ + + Where: + - \( TP \) is the number of true positives. + - \( FP \) is the number of false positives. + +2. **Recall**: Also known as sensitivity, recall measures the ability to find all positive instances. + + $$\text{Recall} = \frac{TP}{TP + FN}$$ + + Where: + - \( FN \) is the number of false negatives. + +3. **F1-Score**: A harmonic mean of precision and recall, providing a single score that balances both concerns. + + $$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$ + +4. **Accuracy (optional)**: Measures the proportion of correctly predicted instances (if true negatives are provided). + + $$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$ + +5. **Specificity (optional)**: Also called true negative rate, it measures the proportion of actual negatives that are correctly identified. + + $$\text{Specificity} = \frac{TN}{TN + FP}$$ + +6. **Phi Coefficient (optional)**: A measure of the degree of association between two binary variables. + + $$\Phi = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$ + +7. **Phi Coefficient Max (optional)**: The maximum possible value for the phi coefficient. + +8. **Phi Over Phi Max (optional)**: The ratio of the phi coefficient to its maximum possible value. + +Each result includes a human-readable format that logs the computed metrics for ease of debugging and verification. + + diff --git a/docs/Home.md b/docs/Home.md new file mode 100644 index 0000000..56436d5 --- /dev/null +++ b/docs/Home.md @@ -0,0 +1,51 @@ +Welcome to the **ArDoCo Metrics Calculator** project! This tool provides functionality to calculate and aggregate **classification** and **rank metrics** for various machine learning and ranking tasks. + +This Wiki contains all the necessary information to use the **ArDoCo Metrics Calculator** via multiple interfaces, including a library, REST API, and command-line interface (CLI). + + +## 1. 
Classification Metrics + +This section provides detailed information about how to calculate **classification metrics** such as precision, recall, F1-score, and more. The classification metrics are essential for evaluating the performance of classification models by comparing the predicted results with the ground truth. + +[Read more about Classification Metrics](Classification-Metrics) + + + +## 2. Rank Metrics + +The rank metrics module helps you calculate metrics for ranked results, such as **Mean Average Precision (MAP)**, **LAG**, and **AUC**. These metrics are useful for evaluating ranking systems, search engines, or recommendation systems. + +[Read more about Rank Metrics](Rank-Metrics) + + + +## 3. Aggregation of Metrics + +Aggregation allows you to compute an overall metric from multiple classification or ranking tasks. This can be useful when you want to combine results from several tasks to get a single evaluation score. + +[Read more about Aggregation of Metrics](Aggregation-of-Metrics) + + +## 4. Usage + +### 4.1 Usage via Library + +The **ArDoCo Metrics Calculator** can be integrated into your project as a library. This section provides instructions for adding the project as a Maven dependency and examples of how to calculate metrics programmatically. + +[Read more about Usage via Library](Usage-Via-Library) + + + +### 4.2 Usage via REST API + +The project offers a REST API for calculating metrics. You can send HTTP requests to the API to compute both classification and rank metrics, as well as aggregate results across tasks. Swagger documentation is provided for easy testing and interaction. + +[Read more about Usage via REST API](Usage-Via-REST-API) + + + +### 4.3 Usage via CLI + +For users who prefer using a command-line interface, the project offers CLI commands for calculating and aggregating metrics. This section provides detailed instructions and examples on how to use the CLI for different tasks. + +[Read more about Usage via CLI](Usage-Via-CLI) diff --git a/docs/Rank-Metrics.md b/docs/Rank-Metrics.md new file mode 100644 index 0000000..fb78228 --- /dev/null +++ b/docs/Rank-Metrics.md @@ -0,0 +1,58 @@ +The rank metrics calculator computes performance metrics for systems that provide ranked results, such as search engines or recommendation systems. These metrics are based on the comparison between the provided ranked results and the ground truth data. + +## Input + +1. **Ranked Results**: A list of sorted lists, where each list represents the ranked results for one query or item (with the most relevant items first). +2. **Ground Truth**: A set of items representing the correct or ideal results for the given queries or items. +3. **String Provider Function**: A function that converts the ranked results and ground truth elements into string representations for comparison purposes. +4. **Relevance-Based Input (optional)**: Contains relevance scores associated with each ranked result. This input is used for relevance-based calculations, allowing the ranking system to incorporate degrees of relevance. + +## Supported Metrics + +The rank metrics calculator computes the following key metrics: + +1. **Mean Average Precision (MAP)**: This metric computes the average precision for each query and then averages those precision values over all queries. It provides a single score that summarizes the quality of the ranked results. + + $$\text{MAP} = \frac{1}{N} \sum_{i=1}^{N} \text{AveragePrecision}(i)$$ + + Where: + - $N$ is the number of queries. 
+ - $\text{AveragePrecision}(i)$ is the average of the precision scores at each relevant document for query $i$. It is calculated by considering only the positions where relevant items are retrieved and averaging the precision at those points. + + $$\text{AveragePrecision}(i) = \frac{\sum_{r=1}^{|retrieved_i|} (precision_i(r)\times relevant_i(r))}{|relevantLinks_i|}$$ + + Where: + - $|retrieved|$ is the number of retrieved links for a query + - $r$ is the rank in the produced list + - $precision(r)$ is the *precision* of the list if truncated after rank $r$ + - $relevant(r)$ is a binary function that determines whether the link at rank $r$ is valid (1) or not (0) + - $|relevantLinks|$ is the total number of links that are relevant for this query according to the gold standard + +2. **LAG**: LAG measures the distance (lag) between the position of relevant items in the ranked results and their ideal positions (i.e., as close to the top as possible). It helps assess how well the system ranks relevant items near the top. + + $$\text{LAG} = \frac{1}{N} \sum_{i=1}^{N} \text{Lag}(i)$$ + + Where: + - $\text{Lag}(i)$ is the average lag for query $i$. + + Lag measures how many incorrect links are retrieved above each correct link. For example, if the relevant item should ideally be at position 1 but is ranked at position 3, the lag for that item is 2. The lag is averaged over all relevant documents for query $i$ to compute $\text{Lag}(i)$. + +3. **ROC (Receiver Operating Characteristic) Curve (optional)** + + The **ROC curve** is a graphical representation of a classification model’s performance across different decision thresholds. It plots: + + - **True Positive Rate (TPR)**, or **Recall**, on the **y-axis**: $\text{TPR} = \frac{TP}{TP + FN}$ + where $TP$ is the number of true positives, and $ FN $ is the number of false negatives. + + - **False Positive Rate (FPR)** on the **x-axis**: $\text{FPR} = \frac{FP}{FP + TN}$ + where $FP$ is the number of false positives, and $ TN $ is the number of true negatives. + + Each point on the ROC curve corresponds to a different threshold used by the classifier to distinguish between positive and negative predictions. By adjusting the threshold, the TPR and FPR values change, and the ROC curve shows how well the classifier separates the positive from the negative class. + + +3. **Area Under Curve (AUC) of the Receiver Operating Characteristic (ROC) (optional)**: AUC measures the ability of the system to discriminate between relevant and non-relevant items. The AUC value ranges from 0 to 1, where 1 indicates perfect discrimination. + + $$\text{AUC} = \int_0^1 \text{TPR}(FPR)\ dFPR$$ + + Where $TPR$ is the true positive rate and $ FPR $ is the false positive rate. + diff --git a/docs/Usage-Via-CLI.md b/docs/Usage-Via-CLI.md new file mode 100644 index 0000000..91973d3 --- /dev/null +++ b/docs/Usage-Via-CLI.md @@ -0,0 +1,102 @@ +The metrics calculator provides a command-line interface (CLI) that allows users to calculate both classification and rank metrics, as well as aggregate the results from multiple inputs. + +## Running the CLI + +To run the CLI, use the following command: + +```bash +java -jar metrics-cli.jar [options] +``` + +Each command has specific options and input files required to perform the desired calculation. + +## Commands + +### 1. **Classification Metrics Command** + +This command calculates classification metrics for a single classification task. 
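+
+Both the classification file and the ground truth file are read as plain text. The exact entry format is defined by the CLI itself; as a purely hypothetical illustration, a ground truth file could list one entry per line, preceded by a header line only when the `--header` option described below is used:
+
+```
+element-1
+element-2
+element-3
+```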
+ +**Command:** +``` +classification +``` + +**Options:** +- `-c, --classification `: The file containing the classified items. +- `-g, --ground-truth `: The file containing the ground truth items. +- `--header`: (Optional) Indicates that the input files have a header. +- `-s, --sum `: (Optional) The sum of the confusion matrix. +- `-o, --output `: (Optional) The output file to store the results. + + +**Example Usage:** +```bash +java -jar metrics-cli.jar classification -c classified.txt -g ground_truth.txt --header -s 5 -o result.json +``` + +This command reads the classification and ground truth data, computes metrics like precision, recall, F1-score, and saves the result to `result.json` if specified. + +### 2. **Rank Metrics Command** + +This command calculates rank metrics such as Mean Average Precision (MAP), LAG, and AUC for ranking tasks. + +**Command:** +``` +rank +``` + +**Options:** +- `-r, --ranked-list-directory `: The directory containing ranked list result files for each query. +- `-g, --ground-truth `: The file containing the ground truth items. +- `--header`: (Optional) Indicates that the input files have a header. +- `-rrl, --ranked-relevance-list-Directory `: (Optional) The directory with ranked relevance score lists. +- `-b, --bigger-is-more-similar `: (Optional) Specifies if higher relevance scores indicate more relevance. +- `-o, --output `: (Optional) The output file to store the results. + + +**Example Usage:** +```bash +java -jar metrics-cli.jar rank -r ranked_results/ -g ground_truth.txt -rrl relevance_scores/ -b true --header -o result.json +``` + +This command computes the rank metrics based on the provided ranked results, ground truth, and optional relevance scores, and saves the output if specified. + +### 3. **Aggregation of Classification Metrics** + +This command aggregates multiple classification results into one, calculating an overall precision, recall, and F1-score. + +**Command:** +``` +aggCl +``` + +**Options:** +- `-d `: The directory with the result json files. +- `-o, --output `: (Optional) The output file to store the results. + +**Example Usage:** +```bash +java -jar metrics-cli.jar aggCl -d classifiedDir/ -o aggregated_result.json +``` + +This command aggregates classification results from multiple files and calculates the average performance. + +### 4. **Aggregation of Rank Metrics** + +This command aggregates multiple rank metrics, calculating the average performance across multiple ranking tasks. + +**Command:** +``` +aggRnk +``` + +**Options:** +- `-d `: The directory with the result json files. +- `-o, --output `: (Optional) The output file to store the results. + +**Example Usage:** +```bash +java -jar metrics-cli.jar aggRnk -d classifiedDir/ -o aggregated_result.json +``` + +This command aggregates rank metrics from multiple directories and computes an overall performance score. diff --git a/docs/Usage-Via-Library.md b/docs/Usage-Via-Library.md new file mode 100644 index 0000000..4efc8e1 --- /dev/null +++ b/docs/Usage-Via-Library.md @@ -0,0 +1,156 @@ +The project can be integrated into your own system as a library to calculate both classification and rank metrics. Follow the steps below to use the metrics calculator within your code. + +## 1. Adding the Dependency + +To use this project as a Maven dependency, you need to include the following dependency in your `pom.xml` file: + +```xml + + io.github.ardoco + metrics + ${revision} + +``` + +Make sure to replace `${revision}` with the appropriate version number of the library. 
You can find the version from the repository or Maven Central. + +### Optional: Snapshot Repository + +If you are using a snapshot version of the library (like `0.1.1-SNAPSHOT`), you will need to include the **snapshot repository** configuration in your `pom.xml` file. This enables Maven to fetch the latest snapshot build: + +```xml + + + mavenSnapshot + https://s01.oss.sonatype.org/content/repositories/snapshots + + true + + + +``` + +## 2. Importing the Metrics Calculator + +Once the library is included, you can import and use the **ClassificationMetricsCalculator** and **RankMetricsCalculator** in your project. + +### Example for Classification Metrics: + +```kotlin +import edu.kit.kastel.mcse.ardoco.metrics.ClassificationMetricsCalculator +import edu.kit.kastel.mcse.ardoco.metrics.result.SingleClassificationResult + +fun main() { + val classification = setOf("A", "B", "C") + val groundTruth = setOf("A", "C", "D") + + // Use the ClassificationMetricsCalculator to calculate metrics + val calculator = ClassificationMetricsCalculator.Instance + val result: SingleClassificationResult = calculator.calculateMetrics( + classification = classification, + groundTruth = groundTruth, + confusionMatrixSum = null + ) + + result.prettyPrint() // Logs precision, recall, F1 score, etc. +} +``` + +### Example for Rank Metrics: + +```kotlin +import edu.kit.kastel.mcse.ardoco.metrics.RankMetricsCalculator +import edu.kit.kastel.mcse.ardoco.metrics.result.SingleRankMetricsResult + +fun main() { + val rankedResults = listOf( + listOf("A", "B", "C"), // Ranked results for query 1 + listOf("B", "A", "D") // Ranked results for query 2 + ) + val groundTruth = setOf("A", "B") + + // Use the RankMetricsCalculator to calculate metrics + val calculator = RankMetricsCalculator.Instance + val result: SingleRankMetricsResult = calculator.calculateMetrics( + rankedResults = rankedResults, + groundTruth = groundTruth, + relevanceBasedInput = null + ) + + result.prettyPrint() // Logs MAP, LAG, AUC, etc. +} +``` + +## 3. Customizing the Calculations + +Both calculators (classification and rank metrics) provide customizable inputs like: +- **String Provider**: Allows you to specify how the elements in your classification or ranking are converted to strings. +- **Relevance-Based Input (optional for rank metrics)**: Allows you to input additional relevance scores if needed for calculating metrics like AUC. + +### Example for Using `RelevanceBasedInput` with Rank Metrics + +The **`RelevanceBasedInput`** class allows you to pass additional relevance scores for ranked results when calculating rank metrics like **AUC**. This relevance-based information gives more context to the ranking system, allowing it to factor in how relevant each item is. 
+ +#### Code Example: + +```kotlin +import edu.kit.kastel.mcse.ardoco.metrics.RankMetricsCalculator +import edu.kit.kastel.mcse.ardoco.metrics.result.SingleRankMetricsResult +import edu.kit.kastel.mcse.ardoco.metrics.internal.RelevanceBasedInput + +fun main() { + // Example ranked results for two queries + val rankedResults = listOf( + listOf("A", "B", "C"), // Ranked results for query 1 + listOf("D", "A", "B") // Ranked results for query 2 + ) + + // Ground truth for relevance (the most relevant results) + val groundTruth = setOf("A", "B") + + // Relevance scores associated with the ranked results + val rankedRelevances = listOf( + listOf(0.9, 0.8, 0.4), // Relevance scores for query 1 + listOf(0.7, 0.6, 0.5) // Relevance scores for query 2 + ) + + // Creating the RelevanceBasedInput object + val relevanceInput = RelevanceBasedInput( + rankedRelevances = rankedRelevances, // Relevance scores for ranked results + doubleProvider = { it }, // Function to provide the relevance value (identity function in this case) + biggerIsMoreSimilar = true // Whether higher values mean more relevance + ) + + // Use the RankMetricsCalculator to calculate metrics + val calculator = RankMetricsCalculator.Instance + val result: SingleRankMetricsResult = calculator.calculateMetrics( + rankedResults = rankedResults, + groundTruth = groundTruth, + relevanceBasedInput = relevanceInput + ) + + // Print the calculated rank metrics (MAP, LAG, AUC, etc.) + result.prettyPrint() +} +``` + +#### Explanation: +1. **Ranked Results**: A list of lists where each list represents ranked items for a query. + - Query 1: $ ["A", "B", "C"] $ + - Query 2: $ ["D", "A", "B"] $ + +2. **Ground Truth**: The correct (most relevant) items, which are $ ["A", "B"] $. + +3. **Relevance Scores**: The relevance values for the ranked results: + - Query 1: $ [0.9, 0.8, 0.4] $ – Higher scores indicate more relevant items. + - Query 2: $ [0.7, 0.6, 0.5] $. + +4. **Relevance-Based Input**: This structure provides the calculator with the relevance scores and indicates that higher values represent more relevance (i.e., `biggerIsMoreSimilar = true`). + +#### Customization: +- You can modify the `doubleProvider` function to convert any complex structure (such as custom objects) into a numeric relevance score. +- The `biggerIsMoreSimilar` flag can be set to `false` if lower values indicate more relevance (e.g., in a ranking where 1st place is more relevant than 10th place). + +## 4. Aggregation of Results + +To aggregate multiple classification or ranking results, you can utilize the respective aggregation methods provided by the library. For more details, refer to the [Aggregation](Aggregation-of-Metrics) section. \ No newline at end of file diff --git a/docs/Usage-Via-REST-API.md b/docs/Usage-Via-REST-API.md new file mode 100644 index 0000000..49b25f5 --- /dev/null +++ b/docs/Usage-Via-REST-API.md @@ -0,0 +1,242 @@ +The metrics calculator provides a REST API that allows users to calculate classification and rank metrics by sending requests with their data. The API is built using Spring Boot and offers endpoints for both **classification** and **rank** metrics, as well as aggregation features. + +## Base URL + +By default, the API runs on port **8080**, and all endpoints are accessible under the base URL: + +``` +http://localhost:8080/api +``` + +## API Documentation via Swagger + +The REST API provides a **Swagger UI** that allows you to easily explore and test the API endpoints. 
Swagger generates interactive API documentation and can be accessed from a web browser. + +**Swagger URL:** +``` +http://localhost:8080/swagger-ui/index.html +``` + +Through Swagger, you can: +- View all available endpoints. +- See detailed descriptions of the request and response formats. +- Test API calls directly from the browser. + +## Endpoints + +### 1. Classification Metrics + +You can calculate classification metrics by sending data to the classification API. + +**Endpoint:** +``` +POST /classification-metrics +``` + +**Request Body Example:** + +```json +{ + "classification": [ + "string" + ], + "groundTruth": [ + "string" + ], + "confusionMatrixSum": 0 +} +``` + +**Response Example:** + +```json +{ + "truePositives": [ + "string" + ], + "falsePositives": [ + "string" + ], + "falseNegatives": [ + "string" + ], + "trueNegatives": 0, + "precision": 0, + "recall": 0, + "f1": 0, + "accuracy": 0, + "specificity": 0, + "phiCoefficient": 0, + "phiCoefficientMax": 0, + "phiOverPhiMax": 0 +} +``` + +#### Aggregation of Classification Metrics + +You can also aggregate multiple classification results into one by sending the following request: + +**Endpoint:** +``` +POST /classification-metrics/average +``` + +**Request Body Example:** + +```json +{ + "classificationRequests": [ + { + "classification": ["A", "B"], + "groundTruth": ["A", "C"] + }, + { + "classification": ["B", "C"], + "groundTruth": ["C", "D"] + } + ], + "weights": [2, 1] +} +``` + +**Response Example:** + +```json +{ + "classificationResults": [ + { + "type": "MACRO_AVERAGE", + "precision": 0, + "recall": 0, + "f1": 0, + "accuracy": 0, + "specificity": 0, + "phiCoefficient": 0, + "phiCoefficientMax": 0, + "phiOverPhiMax": 0, + "originalSingleClassificationResults": [ + { + "truePositives": [ + "string" + ], + "falsePositives": [ + "string" + ], + "falseNegatives": [ + "string" + ], + "trueNegatives": 0, + "precision": 0, + "recall": 0, + "f1": 0, + "accuracy": 0, + "specificity": 0, + "phiCoefficient": 0, + "phiCoefficientMax": 0, + "phiOverPhiMax": 0 + } + ], + "weights": [ + 2, 1 + ] + } + ] +} +``` + +### 2. Rank Metrics + +To calculate rank metrics, send your ranked results and ground truth to the rank metrics API. 
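+
+For instance, with `curl` and the base URL from above (a sketch that assumes the server is reachable locally on port 8080 under the `/api` base path; the payload mirrors the request body example documented below):
+
+```bash
+curl -X POST "http://localhost:8080/api/rank-metrics" \
+  -H "Content-Type: application/json" \
+  -d '{
+        "rankedResults": [["A", "B", "C"], ["B", "A", "D"]],
+        "groundTruth": ["A", "B"],
+        "rankedRelevances": [[0.9, 0.8, 0.4], [0.7, 0.6, 0.5]],
+        "biggerIsMoreSimilar": true
+      }'
+```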
+ +**Endpoint:** +``` +POST /rank-metrics +``` + +**Request Body Example:** + +```json +{ + "rankedResults": [ + ["A", "B", "C"], + ["B", "A", "D"] + ], + "groundTruth": ["A", "B"], + "rankedRelevances": [[0.9, 0.8, 0.4], [0.7, 0.6, 0.5]], + "biggerIsMoreSimilar": true +} +``` + +**Response Example:** + +```json +{ + "map": 0.85, + "lag": 13, + "auc": 0.92, + "groundTruthSize": 2 +} +``` + +#### Aggregation of Rank Metrics + +You can also aggregate multiple rank metrics using the following endpoint: + +**Endpoint:** +``` +POST /rank-metrics/average +``` + +**Request Body Example:** + +```json +{ + "rankMetricsRequests": [ + { + "rankedResults": [["A", "B", "C"], ["B", "A", "D"]], + "groundTruth": ["A", "B"], + "rankedRelevances": [[0.9, 0.8, 0.4], [0.7, 0.6, 0.5]], + "biggerIsMoreSimilar": true + }, + { + "rankedResults": [["D", "E", "F"], ["B", "A", "D"]], + "groundTruth": ["D", "E"], + "rankedRelevances": [[0.4, 0.8, 0.9], [0.5, 0.6, 0.7]], + "biggerIsMoreSimilar": false + } + ], + "weights": [2, 1] +} +``` + +**Response Example:** + +```json +{ + "rankResults": [ + { + "type": "WEIGHTED_AVERAGE", + "map": 0.8, + "lag": 12, + "auc": 0.48, + "originalRankResults": [ + { + "map": 0.85, + "lag": 13, + "auc": 0.92, + "groundTruthSize": 2 + }, + { + "map": 0.55, + "lag": 25, + "auc": 0.52, + "groundTruthSize": 2 + } + ], + "weights": [ + 2, 1 + ] + } + ] +} +``` \ No newline at end of file