A pytorch implementation of the paper Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement.

clone the repository
git clone https://github.com/dali92002/SSL-OCR
cd SSL-OCRCreate the following environment named vit with Anaconda. Then, Activate it.
conda env create -f environment.yml
conda activate vitTo train the model from scratch, use this command ans specify your configs:
python train.py --data_path ./data/ --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 64 --vit_patch_size 8 Here I specified to use data augmentation for a better training, also I set the image sizes and the vit patch size to be 8x8. You can however use your custom configurations, check Config.py.
During training there will be a validation in each epoch, the best weights will be saved in a folder named ./weights/ and the predictions will be saved in a folder named ./pred_logs/
For the pretraining of Text-DIAE, use this command and specify the data path and the folder to save your models weights --weights_path
python pretrain.py --data_path /data/ --weights_path /your_folder/ --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 48 --vit_patch_size 8For the fine-tuning of Text-DIAE, use this command and specify the data path, the folder to save your models weights --weights_path and the pretrained encoder to be used --pretrained_encoder_path:
python fine_tune.py --data_path /data/users/msouibgui/datasets/vatican/ --pretrained_encoder_path /your_folder/checkpoint-454_epoch77.pt --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 128 --vit_patch_size 8To test the model, run the following command. It will recognize the testing data using the trained model, you should specify which model you want to use by profiding its path --test_model:
python test.py --data_path /data/ --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 128 --vit_patch_size 8 --test_model ./weights/best-seq2seq_htr_Augm_64_256_8.ptAfter rnning the testing you will get the predictions in the folder ./pred_logs/ as well as the CER and WER.
Coming soon !!
If you find this useful for your research, please cite it as follows:
@inproceedings{souibgui2022text,
title={Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement},
author={Souibgui, Mohamed Ali and Biswas, Sanket and Mafla, Andres and Biten, Ali Furkan and Forn{\'e}s, Alicia and Kessentini, Yousri and Llad{\'o}s, Josep and Gomez, Lluis and Karatzas, Dimosthenis},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
year={2023}
}