The Soft Cosine Measure system developed for the ARQMath-3 shared task evaluation of math information retrieval systems

Witiko/scm-at-arqmath3

Soft Cosine Measure at ARQMath-3

This repository contains our math information retrieval (MIR) system for the ARQMath-3 competition. The system is based on the soft cosine measure (SCM). The repository also contains the paper that describes the system.
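
As a minimal illustration of the soft cosine measure itself (not code taken from this repository), the sketch below computes it in plain Python for bag-of-words vectors. The three-term vocabulary, the similarity values, and the function name `soft_cosine` are assumptions made up for the example.

```python
import math


def soft_cosine(x, y, s):
    """Soft cosine measure between two bag-of-words vectors x and y.

    s is a square term similarity matrix: s[i][j] is the similarity
    between term i and term j, with s[i][i] expected to be 1.
    """
    def bilinear(a, b):
        # Computes a^T * S * b
        return sum(a[i] * s[i][j] * b[j]
                   for i in range(len(a))
                   for j in range(len(b)))

    norm = math.sqrt(bilinear(x, x)) * math.sqrt(bilinear(y, y))
    return bilinear(x, y) / norm if norm > 0 else 0.0


# Hypothetical vocabulary: ["derivative", "gradient", "cat"]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]              # no term relatedness
related = [[1.0, 0.8, 0.0], [0.8, 1.0, 0.0], [0.0, 0.0, 1.0]]

query = [1, 0, 0]      # contains only "derivative"
document = [0, 1, 0]   # contains only "gradient"

print(soft_cosine(query, document, identity))  # 0.0, same as the plain cosine
print(soft_cosine(query, document, related))   # 0.8, related terms now match
```

With the identity matrix, the measure reduces to the ordinary cosine similarity, so documents that share no terms score zero. A non-identity matrix lets related but distinct terms contribute to the score.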

Research goals

  1. Compare the soft vector space model against sparse information retrieval baselines.
  2. Compare the performance of text, text + LaTeX, and text + Tangent-L as math representations.
  3. Compare the performance of non-positional word2vec and positional word2vec embeddings.
  4. Compare the performance of word2vec embeddings and decontextualized roberta-base embeddings.
  5. Compare the performance of decontextualized embeddings of roberta-base and tuned roberta-base.
  6. Compare the performance of interpolated and joint SCM models for text and math.
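
To make the last goal more concrete, here is one plausible reading of an interpolated model, sketched under the assumption that separate text and math similarity scores are combined linearly. The function name `interpolate` and the weight `alpha` are hypothetical and not taken from the repository. A joint model would instead compute a single score over one combined text-and-math vocabulary.

```python
def interpolate(text_score, math_score, alpha):
    """Linearly interpolate a text score and a math score.

    alpha is a hypothetical weight in [0, 1]: alpha = 1 uses only the
    text score and alpha = 0 uses only the math score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * text_score + (1.0 - alpha) * math_score


# Made-up scores for a single query-answer pair
print(interpolate(text_score=0.5, math_score=1.0, alpha=0.5))  # 0.75
```

In such a setup, `alpha` would typically be chosen on held-out relevance judgements rather than fixed by hand.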

Jupyter notebooks

  1. Prepare dataset
  2. Train tokenizer
  3. Tune roberta-base model
  4. Train word2vec models
  5. Produce decontextualized word embeddings
  6. Produce dictionaries
  7. Produce term similarity matrices
  8. Produce ARQMath runs
  9. Optimize soft vector space similarity matrices

Code pearls

Artefacts

Future work

Citing

Text

Vít Novotný and Michal Štefánik. “Combining Sparse and Dense Information Retrieval. Soft Vector Space Model and MathBERTa at ARQMath-3 Task 1 (Answer Retrieval)”. In: Proceedings of the Working Notes of CLEF 2022. Ed. by Guglielmo Faggioli, Nicola Ferro, Allan Hanbury, and Martin Potthast. CEUR-WS, 2022, pp. 104–118. URL: http://ceur-ws.org/Vol-3180/paper-06.pdf (visited on 08/12/2022).

Bib(La)TeX

@inproceedings{novotny2022combining,
  booktitle = {Proceedings of the Working Notes of {CLEF} 2022},
  editor = {Faggioli, Guglielmo and Ferro, Nicola and Hanbury, Allan and Potthast, Martin},
  issn = {1613-0073},
  title = {Combining Sparse and Dense Information Retrieval},
  subtitle = {Soft Vector Space Model and MathBERTa at ARQMath-3 Task 1 (Answer Retrieval)},
  author = {Novotný, Vít and Štefánik, Michal},
  publisher = {{CEUR-WS}},
  year = {2022},
  pages = {104--118},
  numpages = {15},
  url = {http://ceur-ws.org/Vol-3180/paper-06.pdf},
  urldate = {2022-08-12},
}
