Commit 0c7f4c9

add caption inference
1 parent fa60b70 commit 0c7f4c9

1,061 files changed: 279,467 additions & 3 deletions (only a subset of the changed files is shown below)


.idea/workspace.xml

Lines changed: 87 additions & 0 deletions
Some generated files are not rendered by default.

README.md

Lines changed: 29 additions & 3 deletions
@@ -9,13 +9,39 @@
 (e.g., image generation, visual grounding, image captioning, image classification, text generation, etc.)
 to a simple sequence-to-sequence learning framework. For more information, please refer to our paper: [Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework](http://arxiv.org/abs/2202.03052).

-We plan to release the code and colab notebooks soon (Feb. 2022).
-

 # Approach
 ![approach](examples/approach.jpg)

-# Examples
+
+# Requirements
+* python 3.7.4
+* pytorch 1.8.1
+
+# Installation
+```bash
+git clone https://github.com/OFA-Sys/OFA
+pip install -r requirements.txt
+```
+
+# Datasets and Checkpoints
+See [datasets.md](datasets.md) and [checkpoints.md](checkpoints.md).
+
+# Pretraining
+To be released soon :)
+
+# Finetuning & Inference
+Below we provide methods for finetuning and inference on different downstream tasks. At the moment we only provide the scripts for inference; the finetuning scripts will be released soon.
+## Caption
+1. Download the data and checkpoint files and put them in the correct directories.
+2. Run the commands below:
+
+```bash
+cd run_scripts/caption
+sh evaluate_caption.sh
+```
+
+# Gallery
 Below we provide examples of OFA in text-to-image generation and open-ended VQA. Also, we demonstrate its performance on an unseen task (Grounded QA) as well as an unseen domain (Visual Grounding on images from unseen domains).

 ## Text-to-Image Generation (normal query)
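Putting the added README sections together, a minimal end-to-end session might look like the sketch below. It assumes the caption data and the finetuned checkpoint have already been placed wherever `evaluate_caption.sh` expects them (this commit does not spell those paths out; a download sketch for the checkpoint follows the checkpoints.md diff below), and the version checks are only a convenience.

```bash
# Sketch of the caption-inference workflow described by the README additions.
# Assumption: the caption data and checkpoint are already in the locations
# run_scripts/caption/evaluate_caption.sh expects.

# Install (README: Requirements / Installation)
git clone https://github.com/OFA-Sys/OFA
cd OFA
pip install -r requirements.txt

# Optional sanity check against the pinned versions (python 3.7.4, pytorch 1.8.1)
python --version
python -c "import torch; print(torch.__version__)"

# Run caption inference (README: Finetuning & Inference > Caption)
cd run_scripts/caption
sh evaluate_caption.sh
```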

checkpoints.md

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+We provide links for you to download our checkpoints. We will release all the checkpoints, including pretrained and finetuned models for different tasks.
+
+* <a href="https://zheluo-mm.oss-cn-beijing.aliyuncs.com/ofa/checkpoints/caption_large_best.pt"> Finetuned checkpoint for Caption on COCO </a>
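As a usage sketch, the linked COCO caption checkpoint can be fetched from the command line. The `checkpoints/` destination directory is an assumption made here for illustration, not a path specified by this commit; move the file to wherever the evaluation script expects it.

```bash
# Fetch the finetuned COCO caption checkpoint listed above.
# Assumption: a local checkpoints/ directory is used as the destination.
mkdir -p checkpoints
wget -P checkpoints https://zheluo-mm.oss-cn-beijing.aliyuncs.com/ofa/checkpoints/caption_large_best.pt
```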

criterions/__init__.py

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+from .scst_loss import ScstRewardCriterion
+from .label_smoothed_cross_entropy import AjustLabelSmoothedCrossEntropyCriterion
