merge fix
bw4sz committed Sep 13, 2024
2 parents e9483c0 + 2708783 commit 70572cc
Showing 3 changed files with 97 additions and 15 deletions.
8 changes: 4 additions & 4 deletions docs/datasets.md
@@ -34,7 +34,7 @@ Citation: Amirkolaee, Hamed Amini, Miaojing Shi, and Mark Mulligan. "TreeFormer:

Location: London, England

<img src="public/TreeFormer.jpg" alt="sample_image" style="width:300px; height:auto;">
![sample_image](public/TreeFormer.jpg)

### Ventura et al. 2022

@@ -44,7 +44,7 @@ J. Ventura, C. Pawlak, M. Honsberger, C. Gonsalves, J. Rice, N.L.R. Love, S. Han

Location: Southern California, United States

<img src="public/Ventura.png" alt="sample_image" style="width:300px; height:auto;">
![sample_image](public/Ventura.png)

## Polygons

@@ -74,7 +74,7 @@ Indiana, United States

### Jansen et al. 2022

<img src="public/Jansen.png" alt="sample_image" style="width:300px; height:auto;">
![sample_image](public/Jansen.png)

Location: Northern Australia

@@ -84,7 +84,7 @@ https://zenodo.org/records/7094916

Location: Bamberg, Germany

<img src="public/Troles.png" alt="sample_image" style="width:300px; height:auto;">
![sample_image](public/Troles.png)

### Wagner et al. 2023

54 changes: 43 additions & 11 deletions docs/getting_started.md
@@ -1,29 +1,61 @@
# Getting Started

The MillionTrees package is a collection of tree detection datasets. These datasets are organized by annotation geometry: "TreePointsDataset", "TreeBoxesDataset", and "TreePolygonDataset". Each of these datasets contains images from many source projects.
## Installation

## Download
```
pip install MillionTrees
```

MillionTrees datasets can be downloaded directly from Python.
To recreate the training examples, install the optional packages:

```
dataset = TreePointsDataset(download=True, root_dir=<directory to save data>)
pip install MillionTrees[training]
```

## Visualize
## Load the data

```
from milliontrees.datasets.TreePoints import TreePointsDataset
dataset = TreePointsDataset(download=True, root_dir=<directory>)
for image, label, metadata in dataset:
    plot_points(image, label)
    image.shape == (3, 100, 100)
    label.shape == (2,)
    # Two fine-grained domains and a label of the coarse domain? This is still unclear, see L82 of milliontrees_dataset.py
    assert len(metadata) == 2
    break
```
## Train a model

## Train
```
trainer.fit(model, train_dataloader)
```

*Note* To install the train dependencies, please run `pip install MillionTrees[train]`. These are solely for the reproducible examples.
## Evaluate predictions

```
from torchvision import transforms
from milliontrees.common.data_loaders import get_eval_loader

# Get the test set
test_data = dataset.get_subset(
    "test",
    transform=transforms.Compose(
        [transforms.Resize((224, 224)), transforms.ToTensor()]
    ),
)
# Prepare the data loader
test_loader = get_eval_loader("standard", test_data, batch_size=16)
# Get predictions for the full test set
for x, y_true, metadata in test_loader:
    y_pred = model(x)
    # Accumulate y_true, y_pred, metadata
# Evaluate
dataset.eval(all_y_pred, all_y_true, all_metadata)
# {'recall_macro_all': 0.66, ...}
```
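
The "Accumulate y_true, y_pred, metadata" step above is left as a comment. A minimal sketch of one way to do it follows, using plain Python lists. The names `toy_loader`, `toy_model`, and `collect_predictions` are hypothetical stand-ins introduced here for illustration; they are not part of the MillionTrees package, and real batches would typically be tensors rather than lists.

```python
# Stand-in for a data loader: each batch is (inputs, targets, metadata).
# These toy batches are illustrative only, not real MillionTrees data.
toy_loader = [
    ([0.1, 0.2], [1, 0], ["site_a", "site_a"]),
    ([0.3], [1], ["site_b"]),
]


def toy_model(batch):
    """Hypothetical model: predicts 1 when the input exceeds 0.15."""
    return [1 if value > 0.15 else 0 for value in batch]


def collect_predictions(loader, model):
    """Run the model on every batch and concatenate the per-batch results."""
    all_y_pred, all_y_true, all_metadata = [], [], []
    for x, y_true, metadata in loader:
        all_y_pred.extend(model(x))
        all_y_true.extend(y_true)
        all_metadata.extend(metadata)
    return all_y_pred, all_y_true, all_metadata


all_y_pred, all_y_true, all_metadata = collect_predictions(toy_loader, toy_model)
```

The three accumulated lists have one entry per test example, which is the shape the `dataset.eval(...)` call above appears to expect.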

## Evaluate
## Submit to the leaderboard

## Submit
We accept submissions as .csv files.
50 changes: 50 additions & 0 deletions docs/submission_guidelines.md
@@ -0,0 +1,50 @@
# Submissions
Thank you for submitting to the MillionTrees leaderboards. The format for this benchmark follows the excellent [Wilds Benchmark](https://wilds.stanford.edu/submit/).

We welcome submissions of new algorithms and/or models, and we encourage contributors to test their new methods on as many datasets as applicable. This is valuable even if (or especially if) your method performs well on some datasets but not others.

We also welcome re-implementations of existing methods. On the leaderboards, we distinguish between official submissions (made by the authors of a method) and unofficial submissions (re-implementations by other contributors). Unofficial submissions are equally valuable, especially if the re-implementations achieve better performance than the original implementations because of better tuning or simple tweaks.

All submissions must use the dataset classes and evaluators in the MillionTrees package. In addition, they must report results on at least 3 random seeds.

Submissions fall into two categories: standard submissions and non-standard submissions.

## Standard submissions

Standard submissions must follow these guidelines:

* Results must be reported on at least 3 random seeds.
* The test set must not be used in any form for model training or selection.
* The validation set must be either the official out-of-distribution (OOD) validation set or, if applicable, the official in-distribution (ID) validation set.
* The validation set should only be used for hyperparameter selection. For example, after hyperparameters have been selected, do not combine the validation set with the training set and retrain the model.
* Training and model selection should not use any additional data, labeled or unlabeled, beyond the official training and validation data.
* To avoid unintended adaptation, models should not use batch statistics during evaluation. BatchNorm is accepted in its default mode (where it uses batch statistics during training, and then fixes them during evaluation).
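
The first guideline above asks for results on at least 3 random seeds. One way to summarise such results is sketched below, assuming a single scalar metric per seed. The function name `summarize_seeds` and the scores are hypothetical placeholders, not real benchmark results or part of the MillionTrees package.

```python
from statistics import mean, stdev


def summarize_seeds(scores_by_seed, min_seeds=3):
    """Return the mean and sample standard deviation of a metric across seeds.

    Raises ValueError when fewer than `min_seeds` runs are supplied.
    """
    if len(scores_by_seed) < min_seeds:
        raise ValueError(
            f"need at least {min_seeds} seeds, got {len(scores_by_seed)}"
        )
    scores = list(scores_by_seed.values())
    return {"n_seeds": len(scores), "mean": mean(scores), "std": stdev(scores)}


# Placeholder scores keyed by seed, for illustration only.
summary = summarize_seeds({0: 0.64, 1: 0.66, 2: 0.68})
```

Reporting the spread alongside the mean makes it easier to judge whether a difference between two methods exceeds seed-to-seed noise.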

## Non-standard submissions
Non-standard submissions only need to follow the first two guidelines from above:

* Results must be reported on at least 3 random seeds.
* The test set must not be used in any form for model training or selection.

These submissions will be differentiated from standard submissions in our leaderboards. They are meant for the community to try out different approaches to solving these tasks. Examples of non-standard submissions might include using unlabeled data from external sources or specialized methods for particular datasets or domains.

## Making a submission

Submitting to the leaderboard consists of two steps: first, uploading your predictions in .csv format, and second, filling out our submission form.

### Submission formatting
Please submit your predictions in .csv format for all datasets except GlobalWheat, and .pth format for the GlobalWheat dataset. The example scripts in the examples/ folder will automatically train models and save their predictions in the right format; see the Getting Started page for information on how to use these scripts.

If you are not using the example scripts, see the last section on this page for details on the expected format.
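
For readers writing predictions by hand, the sketch below shows the general mechanics of writing a predictions table with Python's `csv` module. The column names (`image_path`, `x`, `y`, `score`) and the example row are assumptions made purely for illustration; the expected columns are not specified in this document, so check the required format before submitting.

```python
import csv
import io

# Hypothetical column layout; the official format may differ.
FIELDNAMES = ["image_path", "x", "y", "score"]


def write_predictions_csv(rows, handle):
    """Write one prediction per row, with a header, to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Write to an in-memory buffer here; pass an open file to write to disk.
buffer = io.StringIO()
write_predictions_csv(
    [{"image_path": "example.png", "x": 10.5, "y": 20.0, "score": 0.9}],
    buffer,
)
csv_text = buffer.getvalue()
```

When writing to disk instead, open the file with `newline=""` as the `csv` module documentation recommends.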

### Step 1: Uploading your predictions
Upload a .tar.gz or .zip file containing your predictions in the format specified above. Feel free to use any standard host for your file (Google Drive, Dropbox, etc.).

Check that your predictions are valid by running the evaluate.py script on them. To do so, run `python3 examples/evaluate.py [path_to_predictions] [path_to_output_results] --root_dir [path_to_data]`.

Please upload a separate .tar.gz or .zip file per method that you are submitting. For example, if you are submitting algorithm A and algorithm B, both of which are evaluated on 6 different datasets, then you should submit two different .tar.gz or .zip files: one corresponding to algorithm A (and containing predictions for all 6 datasets) and the other corresponding to algorithm B (also containing predictions for all 6 datasets).
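
The one-archive-per-method rule can be scripted with the standard `tarfile` module. The sketch below assumes each method's .csv files sit in their own directory. The helper `package_method`, the directory name `algorithm_A`, and the file names are hypothetical and used only for this demonstration.

```python
import tarfile
import tempfile
from pathlib import Path


def package_method(prediction_dir, archive_path):
    """Bundle every .csv file under prediction_dir into one .tar.gz archive."""
    prediction_dir = Path(prediction_dir)
    with tarfile.open(archive_path, "w:gz") as archive:
        for csv_file in sorted(prediction_dir.rglob("*.csv")):
            # Store paths relative to the parent so the method name is kept.
            arcname = csv_file.relative_to(prediction_dir.parent).as_posix()
            archive.add(csv_file, arcname=arcname)
    return Path(archive_path)


# Demonstration with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    method_dir = Path(tmp) / "algorithm_A"
    method_dir.mkdir()
    for dataset_name in ("TreePoints", "TreeBoxes"):
        (method_dir / f"{dataset_name}.csv").write_text("placeholder\n")
    archive = package_method(method_dir, Path(tmp) / "algorithm_A.tar.gz")
    with tarfile.open(archive) as check:
        archived_names = sorted(check.getnames())
```

Calling the helper once per method directory yields one archive per method, each holding that method's predictions for every dataset.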

### Step 2: Filling out the submission form
Next, fill out the submission form. You will need to fill out one form per .tar.gz/.zip file submitted. The form will ask for the URL to your submission file.

Once these steps have been completed, we will evaluate the predictions using the evaluate.py script and update the leaderboard within a week.
