The conference management tool CMT, hosted by Microsoft Research, is a popular tool for managing submissions, the reviewing process, and camera-ready copy preparation for scientific conferences. CMT provides various CSV and XLS export capabilities for data about papers, reviewers, authors, etc.
The CMT Statistics Tool is a Python application and PostgreSQL database for importing such CMT data, deriving and plotting various statistics, and running utility queries. By harmonizing CMT exports into a common schema, it enables deep analysis of conference submissions. Common use cases, such as viewing submissions by date, country, or affiliation, or finding differences in acceptance rates over time, are included (see example images below). The easiest way to get started using the CMT Statistics Tool for your conference is by forking this repository and going through the Setup below (getting started).
This repository is based on initial work by Anna and Magda Balazinska for PVLDB volume 13 and VLDB 2020. It was extended and refined at HPI for PVLDB volume 14 and VLDB 2021. Its current form is intended to be as general as possible, but as each conference is different, slight adaptations will be necessary.
- Fork and clone this repository
- Setup your environment
- Export the required data from CMT
- Customize your insert logic and table schema
- Import the required data
- Run the statistics or utilities you are interested in
The project is structured into `tables`, `insert`, `statistics`, and `utility`.

- The `tables` files are an ORM definition of the database tables.
- The `insert` files contain the logic of importing the CMT exports.
- The `statistics` files derive statistics as tables and plots from the data.
- The `utility` files contain other helpful queries.
This project uses Python version `3.9.1`. We recommend using pyenv to manage your Python versions.

```shell
pyenv install 3.9.1
cd to-this-repo
pyenv local 3.9.1
exec "$SHELL"
python --version
# Python 3.9.1
python -m pip install --upgrade pip
```
For managing the Python environment, we use poetry. Install it as follows:

```shell
$ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/install-poetry.py | python -
```

Initialize your Poetry environment and install the required Python packages:

```shell
$ poetry install
```

Optionally install the pre-commit hooks:

```shell
$ poetry run pre-commit install
```
Additionally, we use Black as a formatter (nb-black for notebooks), Flake8 as a linter, and Mypy for static typing. Please note that, depending on whether you use Jupyter Notebook or JupyterLab, nb-black requires either `%load_ext nb_black` or `%load_ext lab_black`.
For the PostgreSQL instance, you are free to connect any instance you already have by adjusting the connection string in `tables/__init__.py`. If you wish to set up PostgreSQL from scratch, we recommend a Docker setup.
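For example, a local instance matching the tool's default connection string could be started like this (container name and credentials are only an example; adjust them to your needs):

```shell
# Start a local PostgreSQL container whose credentials match the default
# postgres:root@localhost/cmt_statistics_tool connection string.
docker run --name cmt-statistics-db \
    -e POSTGRES_PASSWORD=root \
    -e POSTGRES_DB=cmt_statistics_tool \
    -p 5432:5432 \
    -d postgres
```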
Please export the following Excel data from CMT:

- `people`: A single file containing all people participating
- `papers`: A single file containing all research tracks and revisions
- `reviews`: A single file containing all reviews on all tracks and revisions
- `metareviews`: A single file containing all metareviews on all tracks and revisions (if applicable)
- `seniormetareviews`: A single file containing all seniormetareviews on all tracks and revisions (if applicable)
- `mapping`: A single file containing special mappings from original submissions to revisions (if applicable)
The schemata of these exports are described below. Note that CMT exports `.xls` files. Please convert them to `.xlsx` beforehand, for example by using Excel's "Save As ..." function.
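If you have many exports, the conversion can also be scripted. The following is a sketch, assuming pandas together with xlrd (for reading `.xls`) and openpyxl (for writing `.xlsx`) are installed; it is not part of this repository:

```python
# Sketch: convert a CMT .xls export to .xlsx, preserving the
# one-sheet-per-track layout. Assumes pandas plus xlrd (reads .xls)
# and openpyxl (writes .xlsx) are available.
import pandas as pd

def xls_to_xlsx(src: str, dst: str) -> None:
    # sheet_name=None reads all sheets into a dict: sheet name -> DataFrame
    sheets = pd.read_excel(src, sheet_name=None)
    with pd.ExcelWriter(dst, engine="openpyxl") as writer:
        for name, frame in sheets.items():
            frame.to_excel(writer, sheet_name=name, index=False)
```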
If your export columns are named differently than the schema, you must change the insert logic. If you have additional columns or are missing some columns in your export, you must change the insert logic as well as the table schema. For each of the following files, a script handling the insert is in the `insert` directory. Currently, the schema of the required data is as follows:
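Such a change could be as simple as a rename step before insertion. The mapping below is purely illustrative, not the repository's actual logic:

```python
# Hypothetical example: map differing CMT export headers onto the column
# names the insert logic expects. The header names here are examples only.
import pandas as pd

def normalize_columns(frame: pd.DataFrame) -> pd.DataFrame:
    return frame.rename(columns={
        "Email": "E-mail",             # example export header -> schema header
        "Affiliation": "Organization",
    })
```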
- People: TSV file (we identify persons by their First Name, Middle Initial, Last Name, and E-mail)
  - `First Name`: str
  - `Middle Initial (optional)`: str
  - `Last Name`: str
  - `E-mail`: str
  - `Organization`: str
  - `Country`: str
  - `Google Scholar`: str
  - `URL`: str
  - `Semantic Scholar URL`: str
  - `DBLP URL`: str
  - `Domain Conflicts`: str
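As an illustration of how such an export schema maps onto the ORM definitions in the `tables` files, here is a sketch assuming SQLAlchemy; the actual class and column names in this repository may differ:

```python
# Illustrative sketch only: a People-style table as a SQLAlchemy ORM class.
# Column names mirror the export schema above; the repository's real
# tables/ definitions may differ.
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Person(Base):
    __tablename__ = "person"

    id = Column(Integer, primary_key=True)
    # The four columns used to identify a person in the export:
    first_name = Column(String, nullable=False)
    middle_initial = Column(String)
    last_name = Column(String, nullable=False)
    email = Column(String, nullable=False, unique=True)
    # Remaining profile columns:
    organization = Column(String)
    country = Column(String)
```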
- Papers: XLSX file with multiple sheets. Sheet names correspond to track names; revision tracks have the suffix "Revision".
  - `Paper ID`: int (primary key)
  - `Created`: str
  - `Last Modified`: str
  - `Paper Title`: str
  - `Abstract`: str
  - `Primary Contact Author Name`: str
  - `Primary Contact Author Email`: str
  - `Authors`: str
  - `Author Names`: str
  - `Author Emails`: str
  - `Track Name`: str
  - `Primary Subject Area`: str
  - `Secondary Subject Areas`: str
  - `Conflicts`: int
  - `Domains`: str
  - `Assigned`: int
  - `Completed`: int
  - `% Completed`: str
  - `Bids`: int
  - `Discussion`: str
  - `Status`: str
  - `Requested For Author Feedback`: str
  - `Author Feedback Submitted?`: str
  - `Requested For Camera Ready`: str
  - `Camera Ready Submitted?`: str
  - `Requested For Presentation`: str
  - `Files`: str
  - `Number of Files`: int
  - `Supplementary Files`: str
  - `Number of Supplementary Files`: int
  - `Reviewers`: str
  - `Reviewer Emails`: str
  - `MetaReviewers`: str
  - `MetaReviewer Emails`: str
  - `SeniorMetaReviewers`: str
  - `SeniorMetaReviewerEmails`: str
  - `Q1 (PVLDB does not allow papers previously rejected from PVLDB to be resubmitted within 12 months of the original submission date.)`: str
  - `Q3 (Conflict)`: str
  - `Q4 (Special category)`: str
  - `Q7 (Authors)`: str
  - `Q8 (Availability and Reproducibility)`: str
- Reviews: XLSX file with multiple sheets. Sheet names correspond to track names; revision tracks have the suffix "Revision".
  - `Paper ID`: int
  - `Paper Title`: str
  - `Reviewer Name`: str
  - `Reviewer Email`: str
  - `Q1 (Overall Rating)`: str
  - `Q1 (Overall Rating - Value)`: int
  - `Q2 (Relevant for PVLDB)`: str
  - `Q3 (Are there specific revisions that could raise your overall rating?)`: str
  - `Q4 (Flavor of Regular Research Paper. Please indicate which flavor or flavors best describe the paper.)`: str
  - `Q5 (Summary of the paper (what is being proposed and in what context) and a brief justification of your overall recommendation. One solid paragraph.)`: str
  - `Q6 (Three (or more) strong points about the paper. Please be precise and explicit; clearly explain the value and nature of the contribution.)`: str
  - `Q7 (Three (or more) weak points about the paper. Please clearly indicate whether the paper has any mistakes, missing related work, or results that cannot be considered a contribution; write it so that the authors can understand what is seen as negative.)`: str
  - `Q8 (Novelty. Please give a high novelty ranking to papers on new topics, opening new fields, or proposing truly new ideas; assign medium ratings to delta papers and papers on well-known topics but still with some valuable contribution.)`: str
  - `Q9 (Significance)`: str
  - `Q10 (Technical Depth and Quality of Content)`: str
  - `Q11 (Experiments)`: str
  - `Q12 (Presentation)`: str
  - `Q13 (Detailed Evaluation (Contribution, Pros/Cons, Errors); please number each point and please provide as constructive feedback as possible.)`: str
  - `Q14 (Reproducibility. If the authors have provided supplemental material (data, code, etc.), is the information likely to be sufficient to understand and to reproduce the experiments? Otherwise, do the authors provide sufficient technical details in the paper to support reproducibility? Note that we do not expect actual reproducibility experiments, but rather a verification that the material is reasonable in scope and content.)`: str
  - `Q15 (Revision. If revision is required, list specific required revisions you seek from the authors. Please number each point.)`: str
  - `Q16 (Rate your confidence in this review.)`: str
  - `Q16 (Rate your confidence in this review. - Value)`: int
  - `Q17 (Confidential comments for the PC Chairs. Please add any information that may help us reach a decision.)`: str
  - `Q18 (Name and affiliation of external expert (!) reviewer (if applicable).)`: str
  - `Q19 (I understand that I am allowed to discuss a paper submission with a trainee for the purpose of teaching them how to review papers. I understand that (a) I am responsible to ensure that there is no COI according to the rules published at PVLDB.org between the trainee and any of the authors of the paper. (b) I have informed the trainee about the confidentiality of the content of the paper. (c) I am solely responsible for the final review. [If the trainee contributed significantly to the paper review, please list them above as external reviewer].)`: str
- Reviews with sheet name suffix "Revision":
  - `Paper ID`: int
  - `Paper Title`: str
  - `Reviewer Name`: str
  - `Reviewer Email`: str
  - `Q1 (Final and Overall Recommendation)`: str
  - `Q1 (Final and Overall Recommendation - Value)`: int
  - `Q3 (Did the authors satisfactorily address the revision requirements identified in the meta-review of the original submission?)`: str
  - `Q3 (Did the authors satisfactorily address the revision requirements identified in the meta-review of the original submission? - Value)`: int
  - `Q5 (Justify your answer to the above question by briefly addressing key revision items.)`: str
  - `Q6 (Additional comments to the authors on the revised version of the paper)`: str
  - `Q18 (Confidential Comments for the PC Chairs. Please add any information that may help us reach a decision.)`: str
- Metareviews: XLSX file with multiple sheets. Sheet names correspond to track names; revision tracks have the suffix "Revision".
  - `Paper ID`: int
  - `Paper Title`: str
  - `Meta-Reviewer Name`: str
  - `Meta-Reviewer Email`: str
  - `Q1 (Overall Rating)`: str
  - `Q1 (Overall Rating - Value)`: int
  - `Q2 (Summary Comments)`: str
  - `Q3 (Revision Items)`: str
- Metareviews with sheet name suffix "Revision":
  - `Paper ID`: int
  - `Paper Title`: str
  - `Meta-Reviewer Name`: str
  - `Meta-Reviewer Email`: str
  - `Q1 (Overall Rating)`: str
  - `Q1 (Overall Rating - Value)`: int
  - `Q2 (Detailed Comments)`: str
- Seniormetareviews: XLSX file with multiple sheets. Sheet names correspond to track names; revision tracks have the suffix "Revision".
  - Currently not used
- Mapping: XLSX file
  - `Revision ID`: str
  - `OriginalSubmission ID`: str
  - `Revision Title`: str
If any of these values can be null, none, n/a, or any other special value, consider replacing them with an empty string in the insert logic scripts (see the function `fillna_strs`). Note that some values are very specific to a conference's workflow, in particular the submission and revision status. If you do not support major and minor revisions, please adjust the list of possible values in the respective table schema.
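The kind of normalization `fillna_strs` performs might look like the following pandas-based sketch; the actual function in this repository may behave differently:

```python
# Hedged sketch of a fillna_strs-style helper: replace missing values and
# common textual null markers with empty strings before insertion.
import pandas as pd

def fillna_strs(frame: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    for column in columns:
        frame[column] = (
            frame[column]
            .fillna("")  # NaN / None -> ""
            .replace({"null": "", "none": "", "n/a": ""})
        )
    return frame
```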
The main entrypoint for building the tables and importing the data is the `main.py` file. Running it will drop all tables, re-create them, and insert all data. There, you can define the names of the files containing the exported data. Your database connection is configured in the `tables/__init__.py` file. Its default of `postgres:root@localhost/cmt_statistics_tool` is intended only for testing purposes, so please change it.
The statistics are run by executing the file containing them. The following statistics are available:
- Reviewers and ratings
- Paper status
- Other