
Commit d33883c

committed
WIP
1 parent fafa166 commit d33883c


5 files changed, +107 -34 lines changed


site/announcement.md

-4
@@ -1,6 +1,2 @@
-* Text to use for announcing the shared task by email or in social media.
-* may contain a copy of [overview](overview.md) text
-
-
 ## Text
 

site/closed-track.md

+15-1
@@ -1 +1,15 @@
-In the closed tracks, participants agree to use only the annotated data provided within this task to develop their model. No (i) additional data labelled for sexism or misogyn or (ii) additional models trained on data labelled for sexism or misogyny are allowed. Participants having made at least one submission in a closed track during the Test Phase will be invited to submit a paper for the Shared Task at KONVENS 2024 describing their system. If participants also made at least one submission for an open track, they can also include a comparison of their approaches in the paper.
+# Closed Track Competition
+
+There is a _Closed Track_ competition for each of the two subtasks. Please note the following:
+
+* The closed track competitions are the main competitions: in order to submit a paper that describes your approach, you have to submit to the closed track of one or both of subtask 1 and subtask 2.
+* If you have submitted to the closed track of one or both subtasks, you can also include information about your open track approach in your paper submission.
+* IMPORTANT: In the closed tracks, participants agree to use **only** the annotated data provided within this task to develop their model. More specifically:
+    * the use of additional data labelled for sexism or misogyny is not allowed
+    * the use of pretrained models or embeddings trained on data labelled for sexism or misogyny is not allowed
+    * the use of other models, ontologies, knowledge bases or similar resources that contain specific knowledge about sexism/misogyny is not allowed
+    * pretrained models like BERT or embeddings are allowed as long as they have not been specifically pre-trained or fine-tuned on sexism/misogyny-specific data other than the data shared for this competition
+* If in doubt whether your approach is compatible with the closed track requirements, please ask in the competition forum or send an email to the organizers. An email to the organizers can include information which you might not want to share in the forum; the organizers will keep it confidential.

site/index.md

+10-7
@@ -43,24 +43,27 @@ The shared task is divided into two subtasks:
 
 ## Closed and open tracks
 
-Each of the [subtask 1](subtask1.md) and [subtask 2](subtask2.md) competitions
+Each of the [subtask 1](subtask1.html) and [subtask 2](subtask2.md) competitions
 are organized into two different tracks:
 
 * [Closed Track](closed-track.md): in this track, models can only be trained with the provided training set. Models are limited as to what kind of data for pretraining is allowed. Only the closed track counts towards the competition of the shared task and a closed track submission is required for the submission of a paper. See the linked document for details.
-* [Open Track](open-track.md): in this track, anything goes really: you can use language models, use your own training data (but you have to share it with the community) or use other interesting approaches. The open track does NOT count towards the competition ranking but has been added to allow for the exploration of interesting strategies which may be hard to reproduce.
+* [Open Track](open-track.md): in this track, anything goes: you can use language models, use your own training data (but you have to share it with the community) or use other interesting approaches. The open track does NOT count towards the competition ranking but has its own leaderboard and has been added to allow for the exploration of interesting strategies which may be hard to reproduce.
 
-
-
 ## Timeline
 
-* **Development phase**: April 14 - May 17, 2024
-* **Testing phase**: May 18 - June 12, 2024
-* **Evaluation phase**: June 13 - June 25, 2024
+* **Development phase**: April 14 - June 12, 2024
+    * During the development phase, a labeled training set and an unlabeled development set are made available. You can upload the labels for the development set to the competition site and will see the ranking of your submission on the leaderboard.
+* **Competition phase**: June 13 - June 25, 2024
+    * During the competition phase, the labeled training and development sets are released and an unlabeled test set is made available. You can upload the labels for the test set to the competition site; your last submission will be the one used for ranking. During this phase, the leaderboard is not shown. The final leaderboard/ranking is shown after the end of the competition phase.
 * **Paper submission due**: July 1, 2024
 * **Camera ready due**: July 20, 2024
 * **Shared Task @KONVENS**: 9 September, 2024
 
 ## Organizers
+
 The task is organized by the [**Austrian Research Institute for Artificial Intelligence (OFAI)**](https://ofai.at). The organizing team are:
 
 * [Brigitte Krenn](https://www.ofai.at/~brigitte.krenn/) (brigitte.krenn (AT) ofai.at)

site/open-track.md

+11-1
@@ -1 +1,11 @@
-In the open tracks, participants are encouraged to use additional data or models trained on labelled data. These additional labelled data, embeddings or models need to be open source and provided by the participants upon request and if possible via the submission page. Participants submitting in open tracks are only invited to submit a paper for the Shared Task at KONVENS 2024 describing their system, if they also made a submission in a closed track during the Test Phase and compare their approaches in the paper. Due to reproducibility issues, e.g. when including generative LLM such as GPT 3.5, we do not accept papers who solely present approaches for the open tracks.
+# Open Track Competition
+
+There is an _Open Track_ competition for each of the two subtasks. Please note the following:
+
+* In the open tracks, participants are encouraged to use whatever approach they prefer.
+* Additional labelled data, or models or embeddings trained on labelled data, are allowed.
+* HOWEVER: additional labelled data, embeddings or models must be publicly available as open source or under a Creative Commons license.
+* IMPORTANT: Participants submitting in open tracks are only invited to submit a paper for the Shared Task at KONVENS 2024 describing their system if they also made a submission in a closed track during the Competition Phase.
+* Due to reproducibility issues, e.g. when including results from commercial or closed-source models, we do not accept papers which solely present approaches for the open tracks.
+* However, we look forward to finding out how the results in the open tracks will compare to the closed track results.

site/subtask1.md

+71-21
@@ -1,34 +1,84 @@
-# Submission Instructions
-## How to participate
+# Subtask 1
+
+In subtask 1 the goal is to predict labels for each text in a dataset, where the labels are derived from the original labels assigned by several human annotators.
+
+The human annotators assigned (according to the [annotation guidelines](guidelines.md)) the strength of misogyny/sexism present in the given text via the following labels:
+
+* `0-Kein`: no sexism/misogyny present
+* `1-Gering`: mild sexism/misogyny
+* `2-Vorhanden`: sexism/misogyny present
+* `3-Stark`: strong sexism/misogyny
+* `4-Extrem`: extreme sexism/misogyny
+
+While the annotation guidelines define what kind of sexism/misogyny should get annotated, no attempt has been made to give rules for how to decide on the strength. For this reason, if an annotator decided that sexism/misogyny is present in a text, the strength assigned is a matter of personal judgement.
+
+The labels to predict in subtask 1 reflect different strategies for how multiple labels from annotators can be used to derive a final target label:
+
+* `bin_maj`: predict `1` if a majority of annotators assigned a label other than `0-Kein`, predict `0` if a majority of annotators assigned the label `0-Kein`. If there is no majority, both `1` and `0` count as correct in the evaluation.
+* `bin_one`: predict `1` if at least one annotator assigned a label other than `0-Kein`, `0` otherwise
+* `bin_all`: predict `1` if all annotators assigned labels other than `0-Kein`, `0` otherwise
+* `multi_maj`: predict the majority label if there is one; if there is no majority label, any of the assigned labels counts as a correct prediction in the evaluation
+* `disagree_bin`: predict `1` if there is disagreement between annotators on `0-Kein` versus all other labels, `0` otherwise
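The derivation strategies above can be sketched in code. This is an illustrative helper (the function name and return convention are ours, not part of the task); where several predictions would count as correct, it returns a set of acceptable labels, and it assumes "majority" means a strict majority:

```python
from collections import Counter

def derive_targets(labels):
    """Derive the subtask 1 target labels from a list of annotator labels.

    labels: e.g. ["0-Kein", "2-Vorhanden", "2-Vorhanden"]
    Returns a dict mapping target name to a single label or, where
    several predictions count as correct, a set of acceptable labels.
    """
    n = len(labels)
    counts = Counter(labels)
    n_zero = counts.get("0-Kein", 0)
    n_nonzero = n - n_zero

    # bin_maj: majority non-`0-Kein` -> 1, majority `0-Kein` -> 0,
    # no majority -> both 1 and 0 count as correct
    if n_nonzero > n / 2:
        bin_maj = {"1"}
    elif n_zero > n / 2:
        bin_maj = {"0"}
    else:
        bin_maj = {"0", "1"}

    bin_one = "1" if n_nonzero >= 1 else "0"
    bin_all = "1" if n_nonzero == n else "0"

    # multi_maj: strict majority label assumed here; otherwise any
    # assigned label counts as correct
    top_label, top_count = counts.most_common(1)[0]
    multi_maj = {top_label} if top_count > n / 2 else set(labels)

    # disagree_bin: some but not all annotators chose a non-`0-Kein` label
    disagree_bin = "1" if 0 < n_nonzero < n else "0"

    return {"bin_maj": bin_maj, "bin_one": bin_one, "bin_all": bin_all,
            "multi_maj": multi_maj, "disagree_bin": disagree_bin}
```
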
+## Data
 
 For the development phase of subtask 1, we provide all participants with the following data:
-* the labeled training set containing 'id', 'text', and 'annotations'
-* the unlabeled dev set containing 'id' and 'annotations'
+
+* the labeled training set containing 'id', 'text', and 'annotations' (annotator ids and the labels assigned by them)
+* the unlabeled dev set containing 'id', 'text' and 'annotators' (annotator ids)
+
+Both files are in JSONL format (one JSON-serialized object per line) where each object is a dictionary with the following fields:
+
+* `id`: a hash that identifies the example
+* `text`: the text to classify. The text can contain arbitrary Unicode and new lines
+* `annotations` (only in the labeled dataset): an array of dictionaries with the following key/value pairs:
+    * `user`: a string of the form "A003", an anonymized id for the annotator who assigned the label
+    * `label`: the label assigned by the annotator
+    * Note that the number of annotations and the specific annotators who assigned labels vary between examples
+* `annotators` (only in the unlabeled dataset): an array of annotator ids who labeled the example
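Since both files follow this JSONL layout, they can be loaded with the standard library alone. A minimal sketch (the file name in the usage comment is hypothetical, not prescribed by the task):

```python
import json

def read_jsonl(path):
    """Read a JSONL file: one JSON-serialized object per line."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip empty lines
                examples.append(json.loads(line))
    return examples

# Example usage (hypothetical file name):
# train = read_jsonl("subtask1_train.jsonl")
# labels = [a["label"] for a in train[0]["annotations"]]
```
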
+
+You can [download](download.md) the labeled and unlabeled data for the development phase and for the competition phase.
+
+## Submission
 
-You can download the data [add-link](link-tbd)
+Your submission must be a file in TSV (tab separated values) format which contains the following columns, in any order:
 
-**note**: do we provide example submissions?
+* `id`: the id of the example in the unlabeled dataset for which the predictions are submitted
+* `bin_maj`: prediction of `0` or `1`
+* `bin_one`: prediction of `0` or `1`
+* `bin_all`: prediction of `0` or `1`
+* `multi_maj`: prediction of one of `0-Kein`, `1-Gering`, `2-Vorhanden`, `3-Stark`, `4-Extrem`
+* `disagree_bin`: prediction of `1` or `0`
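Writing such a TSV file can be sketched with Python's csv module. This assumes a header row naming the columns; the row values shown are dummy data:

```python
import csv

# column names as described for the submission format
FIELDS = ["id", "bin_maj", "bin_one", "bin_all", "multi_maj", "disagree_bin"]

def write_predictions(path, rows):
    """Write prediction rows (dicts keyed by the column names) as TSV."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)

# Example with a dummy row:
# write_predictions("predictions.tsv", [
#     {"id": "x1", "bin_maj": "1", "bin_one": "1", "bin_all": "0",
#      "multi_maj": "2-Vorhanden", "disagree_bin": "1"},
# ])
```
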

-**Goal** of subtask 1 is to solve 4 binary classification tasks on the data and to predict the majority label.
+Note that how you derive those labels is up to you (as long as the rules for the closed or open tracks are followed):
 
-For each submission:
-* save your predictions to a separate csv file. The file needs to contain the following columns:
-  * 'id': the unique ID of each text, as specified in the dev/test data
-  * 'bin_maj': predict 1 if a majority of annotators assigned non-0 (scores 1 - 4), predict 0 if a majority of annotators assigned 0
-  * 'bin_one': predict 1 if at least one annotator assigned non-0, 0 otherwise
-  * 'bin_all': predict 1 if all annotators assigned non-0
-  * 'multi_maj': predict the majority label if there is one
-  * 'disagree_bin': predict 1 if there is disagreement between annotators on 0 vs non-0
-* compress this csv file into a zip file.
-* under My Submissions, fill out the submission form and submit the zip file.
+* you can train several models or a single model to get the predictions
+* you can derive the model-specific training set in any way from the labeled training data
+* you can use the information of which annotator assigned the label, or ignore it
 
-**note**: do we want the data in .csv format?
+To submit your predictions to the competition:
 
-For the Development Phase, multiple submissions are allowed and they serve the purpose of developing the model.
+* the file MUST have the file name extension `.tsv`
+* the TSV file must get compressed into a ZIP file with extension `.zip`
+* the ZIP file should then get uploaded as a submission to the correct competition.
+* !! Please make sure you submit to the competition that corresponds to the correct subtask (1 or 2) and the correct track (Open or Closed)!
+* under "My Submissions", make sure to fill out the form and:
+    * enter the name of your team which has been registered for the competition
+    * give a name to your method
+    * confirm that you have checked that you are indeed submitting to the correct competition for the desired subtask and track
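The packaging steps above can be sketched with Python's standard library (the file names here are placeholders, not prescribed by the task):

```python
import os
import zipfile

def package_submission(tsv_path, zip_path="submission.zip"):
    """Compress a .tsv prediction file into a .zip file for upload."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # store only the file name inside the archive, not the full path
        zf.write(tsv_path, arcname=os.path.basename(tsv_path))
    return zip_path
```
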

-For the Test Phase, participants may only submit two times, to allow for a mistake in the first submission. Please note that only the latest valid submission determines the final task ranking.
+## Phases
 
-**note**: for EDOS, they restricted the submission in the test phase to 2. Do we want that as well?
+* For the Development Phase, multiple submissions are allowed; they serve the purpose of developing and improving the model(s).
+* For the Test Phase, participants may only submit a limited number of times. Please note that only the latest valid submission determines the final task ranking.
 
 ## Evaluation
 
