# clone the project
git clone [email protected]:celsofranssa/xCoFormer.git
# change directory to project folder
cd xCoFormer/
# Create a new virtual environment by choosing a Python interpreter
# and making a ./venv directory to hold it:
virtualenv -p python3 ./venv
# activate the virtual environment using a shell-specific command:
source ./venv/bin/activate
# install dependencies
pip install -r requirements.txt
# add the project folder to the Python path
export PYTHONPATH=$PYTHONPATH:<path-to-project-dir>/xCoFormer/
# to exit the virtualenv later (if needed):
deactivate
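While the virtual environment is still active, the setup above can be sanity-checked from Python. The following is a minimal sketch, not part of the project: the helper names `in_virtualenv` and `on_pythonpath` are hypothetical, and it assumes the script is run from the project folder.

```python
import os
import sys
from pathlib import Path


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Recent virtualenv and venv make sys.prefix differ from sys.base_prefix;
    # older virtualenv releases set sys.real_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


def on_pythonpath(project_dir, pythonpath=None) -> bool:
    """Return True when project_dir is one of the PYTHONPATH entries."""
    if pythonpath is None:
        pythonpath = os.environ.get("PYTHONPATH", "")
    target = Path(project_dir).resolve()
    # Empty entries (for example from a leading ':') are ignored.
    entries = [entry for entry in pythonpath.split(os.pathsep) if entry]
    return any(Path(entry).resolve() == target for entry in entries)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("project on PYTHONPATH:", on_pythonpath(os.getcwd()))
```

If either check prints `False`, re-run the activation or `export` step in the same shell session before training.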
Download the datasets from Kaggle:
kaggle datasets download celsofranssa/xcoformer-datasets -p resources/datasets/ --unzip
After downloading, the datasets should be placed inside the resources/datasets/
folder, as shown below:
xCoFormer/
|-- resources
|   |-- datasets
|   |   |-- java_v01
|   |   |   |-- test.jsonl
|   |   |   |-- train.jsonl
|   |   |   `-- val.jsonl
|   |   `-- python_v01
|   |       |-- test.jsonl
|   |       |-- train.jsonl
|   |       `-- val.jsonl
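The layout above can be checked before training. The following is a minimal sketch that only tests for the files listed in the tree; the function name `missing_dataset_files` is hypothetical and the script is not part of the project.

```python
from pathlib import Path

# Dataset folders and split files, copied from the tree above.
EXPECTED_DATASETS = ("java_v01", "python_v01")
EXPECTED_SPLITS = ("train.jsonl", "val.jsonl", "test.jsonl")


def missing_dataset_files(project_root="."):
    """Return the expected dataset files that do not exist under project_root."""
    datasets_dir = Path(project_root) / "resources" / "datasets"
    missing = []
    for dataset in EXPECTED_DATASETS:
        for split in EXPECTED_SPLITS:
            path = datasets_dir / dataset / split
            if not path.is_file():
                missing.append(path)
    return missing


if __name__ == "__main__":
    missing = missing_dataset_files(".")
    if missing:
        print("Missing files:")
        for path in missing:
            print("  ", path)
    else:
        print("All dataset files are in place.")
```

Run it from the xCoFormer/ folder; an empty result means every file from the tree was found.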
The following bash command fits the BERT encoder on the Java dataset using batch_size=128 for a single epoch.
python main.py tasks=[fit] model=BERT data=JAVA data.batch_size=128 trainer.max_epochs=1
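The same command can also be launched from Python, which is convenient for scripted runs and avoids shells that treat the square brackets in `tasks=[fit]` as a glob pattern. The following is a minimal sketch: `build_fit_command` and `run_fit` are hypothetical helpers, and only the argument values shown in the command above (`BERT`, `JAVA`) are known to be valid.

```python
import shlex
import subprocess
import sys


def build_fit_command(model, data, batch_size, max_epochs):
    """Build the argument list for the training command shown above."""
    return [
        sys.executable,  # the interpreter of the active virtual environment
        "main.py",
        "tasks=[fit]",
        f"model={model}",
        f"data={data}",
        f"data.batch_size={batch_size}",
        f"trainer.max_epochs={max_epochs}",
    ]


def run_fit(model, data, batch_size, max_epochs, project_dir="."):
    """Run the training command from project_dir and raise if it fails."""
    command = build_fit_command(model, data, batch_size, max_epochs)
    # Passing a list (no shell=True) hands every argument to main.py unchanged.
    subprocess.run(command, cwd=project_dir, check=True)


if __name__ == "__main__":
    # Print a shell-quoted version of the command instead of running it.
    print(shlex.join(build_fit_command("BERT", "JAVA", 128, 1)))
```

Calling `run_fit("BERT", "JAVA", 128, 1)` from the project folder starts the same training run as the bash command.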
If all goes well, the training command should produce output similar to the following:
GPU available: True, used: True
[2020-12-31 13:44:42,967][lightning][INFO] - GPU available: True, used: True
TPU available: None, using: 0 TPU cores
[2020-12-31 13:44:42,967][lightning][INFO] - TPU available: None, using: 0 TPU cores
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[2020-12-31 13:44:42,967][lightning][INFO] - LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name         | Type        | Params
------------------------------------------------------------
0 | desc_encoder | BERTEncoder | 109 M
1 | code_encoder | BERTEncoder | 109 M
2 | loss_fn      | NPairLoss   | 0
3 | mrr          | MRRMetric   | 0
------------------------------------------------------------
91.0 M Trainable params
Epoch 0: 100%|███████████████████████████████████████████████████████| 5199/5199 [13:06<00:00, 6.61it/s, loss=5.57, v_num=1, val_mrr=0.041, val_loss=5.54]
Testing: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████| 288/288 [00:17<00:00, 16.83it/s]
--------------------------------------------------------------------------------
DATALOADER:0 TEST RESULTS
{'m_test_mrr': tensor(0.0410),
'm_val_mrr': tensor(0.0410),
'test_mrr': tensor(0.0410),
'val_loss': tensor(5.5390, device='cuda:0'),
'val_mrr': tensor(0.0410)}
--------------------------------------------------------------------------------
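To help read the `*_mrr` values in the results, the following sketch computes mean reciprocal rank under its standard definition: the average of 1/rank, where rank is the 1-based position of the correct item in each query's result list. This is an assumption about what the values mean; the project's `MRRMetric` may differ in details, and `mean_reciprocal_rank` is a hypothetical name.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all queries (ranks are 1-based)."""
    ranks = list(ranks)
    if not ranks:
        raise ValueError("ranks must not be empty")
    if any(rank < 1 for rank in ranks):
        raise ValueError("ranks are 1-based and must be >= 1")
    return sum(1.0 / rank for rank in ranks) / len(ranks)


if __name__ == "__main__":
    # Three queries whose correct items were ranked 1st, 2nd and 4th.
    print(mean_reciprocal_rank([1, 2, 4]))
    # If every query ranked its correct item 25th, the score is 1/25.
    print(mean_reciprocal_rank([25, 25, 25]))
```

Under this definition, a score near 0.04 is what results if every correct item sat at about rank 25, which is plausible after a single training epoch.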