# Subtask 1

In subtask 1 the goal is to predict labels for each text in a dataset where the labels are derived from the original
labels assigned by several human annotators.

The human annotators assigned (according to the [annotation guidelines](guidelines.md))
the strength of misogyny/sexism present in the given text via the following labels:

* `0-Kein`: no sexism/misogyny present
* `1-Gering`: mild sexism/misogyny
* `2-Vorhanden`: sexism/misogyny present
* `3-Stark`: strong sexism/misogyny
* `4-Extrem`: extreme sexism/misogyny

While the annotation guidelines define what kind of sexism/misogyny should be annotated, no attempt has been made to
give rules for deciding on the strength. For this reason, if an annotator decided that sexism/misogyny is present in a text,
the strength assigned is a matter of personal judgement.

The labels to predict in subtask 1 reflect different strategies for how multiple labels from annotators can be used to derive a final
target label (see the sketch after the list):

* `bin_maj`: predict `1` if a majority of annotators assigned a label other than `0-Kein`, predict `0` if a majority of annotators assigned the label
  `0-Kein`. If there is no majority, both `1` and `0` count as correct in the evaluation.
* `bin_one`: predict `1` if at least one annotator assigned a label other than `0-Kein`, `0` otherwise
* `bin_all`: predict `1` if all annotators assigned labels other than `0-Kein`, `0` otherwise
* `multi_maj`: predict the majority label if there is one; if there is no majority label, any of the assigned labels counts as a correct prediction in the evaluation
* `disagree_bin`: predict `1` if there is disagreement between annotators on `0-Kein` versus all other labels, `0` otherwise
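
A minimal sketch of how these targets could be derived from the `annotations` array of a labeled example (see the Data
section below). The function is illustrative; in particular, reading "majority" for `multi_maj` as "more than half of
the annotators" is an assumption, not a rule stated here.

```python
from collections import Counter

def derive_targets(annotations):
    """Derive the five targets from a list of {"user": ..., "label": ...} dicts."""
    labels = [a["label"] for a in annotations]
    n = len(labels)
    n_sexist = sum(1 for lab in labels if lab != "0-Kein")

    # bin_maj: strict majority for / against "0-Kein"; on a tie, either
    # prediction is accepted by the evaluation, so None marks "free choice".
    if n_sexist * 2 > n:
        bin_maj = 1
    elif n_sexist * 2 < n:
        bin_maj = 0
    else:
        bin_maj = None

    bin_one = 1 if n_sexist >= 1 else 0
    bin_all = 1 if n_sexist == n else 0

    # multi_maj: assumed here to mean a label assigned by more than half of
    # the annotators; if no such label exists, any assigned label is accepted.
    top_label, top_count = Counter(labels).most_common(1)[0]
    multi_maj = top_label if top_count * 2 > n else None

    # disagree_bin: some but not all annotators chose a label other than "0-Kein".
    disagree_bin = 1 if 0 < n_sexist < n else 0

    return {"bin_maj": bin_maj, "bin_one": bin_one, "bin_all": bin_all,
            "multi_maj": multi_maj, "disagree_bin": disagree_bin}
```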

## Data

For the development phase of subtask 1, we provide all participants with the following data:

* the labeled training set containing 'id', 'text', and 'annotations' (annotator ids and the label assigned by them)
* the unlabeled dev set containing 'id', 'text', and 'annotators' (annotator ids)

Both files are in JSONL format (one JSON-serialized object per line) where each object is a dictionary with the
following fields (see the loading sketch after the list):

* `id`: a hash that identifies the example
* `text`: the text to classify. The text can contain arbitrary Unicode and new lines
* `annotations` (only in the labeled dataset): an array of dictionaries which contain the following key/value pairs:
  * `user`: a string in the form "A003" which is an anonymized id for the annotator who assigned the label
  * `label`: the label assigned by the annotator
  * Note that the number of annotations and the specific annotators who assigned labels vary between examples
* `annotators` (only in the unlabeled dataset): an array of annotator ids who labeled the example
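
A minimal loading sketch; the file names are placeholders for whatever the downloaded files are called. Note that new
lines inside `text` are escaped in the JSON serialization, so reading one object per line is safe:

```python
import json

def read_jsonl(path):
    """Read a JSONL file: one JSON-serialized dictionary per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]

# Placeholder file names; use the names of the files you downloaded.
train = read_jsonl("subtask1_train.jsonl")  # fields: id, text, annotations
dev = read_jsonl("subtask1_dev.jsonl")      # fields: id, text, annotators

example = train[0]
print(example["id"], example["text"][:40])
for ann in example["annotations"]:
    print(ann["user"], ann["label"])
```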

You can [download](download.md) the labeled and unlabeled data for the development phase and for the competition phase.


## Submission

Your submission must be a file in TSV (tab-separated values) format which contains the following columns, in any order
(see the writing sketch after the list):

* `id`: the id of the example in the unlabeled dataset for which the predictions are submitted
* `bin_maj`: prediction of `0` or `1`
* `bin_one`: prediction of `0` or `1`
* `bin_all`: prediction of `0` or `1`
* `multi_maj`: prediction of one of `0-Kein`, `1-Gering`, `2-Vorhanden`, `3-Stark`, `4-Extrem`
* `disagree_bin`: prediction of `1` or `0`
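
A minimal sketch of writing such a file with Python's `csv` module. The header row and the example values are
assumptions; since the columns may appear in any order, a header naming them is the natural way to identify them:

```python
import csv

# Hypothetical predictions: one dictionary per dev-set example,
# keyed by the submission column names.
predictions = [
    {"id": "abc123", "bin_maj": 1, "bin_one": 1, "bin_all": 0,
     "multi_maj": "1-Gering", "disagree_bin": 1},
]

columns = ["id", "bin_maj", "bin_one", "bin_all", "multi_maj", "disagree_bin"]
with open("predictions.tsv", "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t")
    writer.writeheader()
    writer.writerows(predictions)
```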

Note that how you derive these labels is up to you (as long as the rules for the closed or open tracks are followed):

* you can train several models or a single model to get the predictions
* you can derive the training set for each prediction target in any way from the labeled training data
* you can use the information of which annotator assigned the label, or ignore it

To submit your predictions to the competition:

* the file MUST have the file name extension `.tsv`
* the TSV file must be compressed into a ZIP file with the extension `.zip` (see the packaging sketch after this list)
* the ZIP file should then be uploaded as a submission to the correct competition
* please make sure you submit to the competition that corresponds to the correct subtask (1 or 2) and the correct track (Open or Closed)!
* under "My Submissions", fill out the submission form and:
  * enter the name of your team which has been registered for the competition
  * give a name to your method
  * confirm that you have checked that you are indeed submitting to the correct competition for the desired subtask and track
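
Packaging can be done with the standard library, for example (file names are illustrative; only the `.tsv` and `.zip`
extensions are required):

```python
import zipfile

# Compress the predictions file into the ZIP archive to be uploaded.
with zipfile.ZipFile("submission.zip", "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write("predictions.tsv")
```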

## Phases

* For the Development Phase, multiple submissions are allowed; they serve the purpose of developing and improving the model(s).
* For the Test Phase, participants may only submit a limited number of times. Please note that only the latest valid submission determines the final task ranking.

## Evaluation
