The Complete Library in the Four Branches of Literature of Ming history and Yuan history (CLFBL-MY) Dataset is released for research on Traditional Chinese character recognition and detection. The text images come from the Ming history and Yuan history parts of the Complete Library in the Four Branches of Literature.
Download links:
Baidu netdisk: https://pan.baidu.com/s/1QSNTLHkjLL7Ea5RczDBDHA (password: 2k4b)
Google Drive: https://drive.google.com/file/d/1IYHfmxzI2nmR98_HonO4A4rx33o7Rw2B/view?usp=sharing
The dataset file is organized as follows:
The page folder and page_text file contain images and corresponding page text.
The text_line folder contains text line images cropped from the original page images. All of these images are rotated by 90° to meet our experimental requirements.
The line_text file contains text line labels of all the images in text_line folder.
The page_text file contains the location information of text lines in each page image.
Note: The number of page images does not equal the number of page location records, because we deleted some location information where the image quality was poor.
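Because of this mismatch, it can help to list the page images that have no location information before training a detector. The following is a minimal sketch, not part of the dataset tooling. It assumes that images and location records can be matched by file name stem. The format of the page_text file is not described here, so the sketch takes the annotated names as an already-parsed list. The function names and the example file names are hypothetical.

```python
from pathlib import Path


def pages_without_locations(image_names, annotated_names):
    """Return image file names whose stem has no entry in the location data.

    Assumption: an image and its location record share the same file name
    stem (for example "0001.jpg" and "0001"). Adjust if the dataset differs.
    """
    annotated_stems = {Path(name).stem for name in annotated_names}
    return sorted(
        name for name in image_names if Path(name).stem not in annotated_stems
    )


def list_page_images(page_dir):
    """List file names in the page folder (assumed to hold only image files)."""
    return [p.name for p in Path(page_dir).iterdir() if p.is_file()]


# Example with made-up file names; real names depend on the downloaded archive.
images = ["0001.jpg", "0002.jpg", "0003.jpg"]
annotated = ["0001", "0003"]
print(pages_without_locations(images, annotated))  # ['0002.jpg']
```

In practice, `images` would come from `list_page_images` applied to the extracted page folder, and `annotated` from whatever parser fits the actual layout of the page_text file.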
Here are some page images and text line images in CLFBL-MY Dataset:
If you have any questions about the dataset, please contact: [email protected]