CTRNet++

This repository is the implementation of "CTRNet++: Dual-path Learning with Local-global Context Modeling for Scene Text Removal", published in ACM TOMM 2024. paper

Both the training and inference code are available.

For any questions, please email me at [email protected]. Thank you for your interest.

Environment

The reference environment is as follows:

  • Python 3.8.11
  • PyTorch 1.8.0
  • Polygon
  • shapely
  • scikit-image

Datasets

We use SCUT-EnsText and SCUT-Syn.

All images are resized to 512 × 512. The structure images for the LCG block are generated by the official code of the RTV method. You can generate the data yourself, and we also provide the test data here. data.

After downloading the dataset, place the folders as follows:

data/
--SCUT-ENS
----train
------image/*.jpg
------label/*.jpg
------mask/*.jpg
------gt/*.txt

----test
------image/*.jpg
------label/*.jpg
------mask/*.jpg
------gt/*.txt
...

The masks can be generated from the gt files with OpenCV (cv2.drawContours); an example is shown here.

Training

Create a new directory (./pretrained/) and place in it the pretrained weights for the FFC-based inpainting model (LaMa), VGG-16, and our pretrained structure generator. All of them are available here. You can also retrain the structure generator yourself.

CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 --master_port=8942 --use_env \
    main.py \
    --train_dataset scutens_train \
    --val_dataset scutens_test \
    --data_root ./data/ \
    --output_dir ./checkpoint/ \
    --batch_size 4 \
    --lr 0.0005 \
    --num_workers 8 \
    --code_dir . \
    --epochs 300 \
    --save_interval 10 \
    --warmup_epochs 10 \
    --dataset_file erase \
    --rotate_max_angle 10 \
    --rotate_prob 0.3 \
    --crop_min_ratio 0.7 \
    --crop_max_ratio 1.0 \
    --crop_prob 1.0 \
    --pixel_embed_dim 512 \
    --train     

Testing

To generate the text-removal results, the command is as follows:

CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --nproc_per_node=1 --master_port=8941 --use_env \
    main.py \
    --train_dataset scutens_train \
    --val_dataset scutens_test \
    --data_root ./data/ \
    --output_dir ./checkpoint/ \
    --batch_size 1 \
    --num_workers 0 \
    --code_dir . \
    --dataset_file erase \
    --eval \
    --resume your/checkpoint/file

Acknowledgements

This repository benefits a lot from DETR, LaMa, SPL, Restormer, and CTSDG. Thanks for their excellent work.

Citation

If you find our method or dataset useful for your research, please cite:

@article{CTRNetpp,
        author = {Liu, Chongyu and Peng, Dezhi and Liu, Yuliang and Jin, Lianwen},
        title = {CTRNet++: Dual-path Learning with Local-global Context Modeling for Scene Text Removal},
        year = {2024},
        publisher = {Association for Computing Machinery},
        address = {New York, NY, USA},
        issn = {1551-6857},
        url = {https://doi.org/10.1145/3697837},
        doi = {10.1145/3697837},
        note = {Just Accepted},
        journal = {ACM Trans. Multimedia Comput. Commun. Appl.},
        month = oct,
        keywords = {Scene Text Removal, Context Guidance, Dual-path Learning}
}

Feedback

Suggestions and opinions on our work (both positive and negative) are welcome. Please contact the authors by email: Chongyu Liu ([email protected]). For commercial use, please contact Prof. Lianwen Jin ([email protected]).
