Text-DIAE

Description

A pytorch implementation of the paper Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement.

Download Code

clone the repository

git clone https://github.com/dali92002/SSL-OCR
cd SSL-OCR

Setup your environment

Create the following environment named vit with Anaconda. Then, Activate it.

conda env create -f environment.yml
conda activate vit

Train the model from scratch

To train the model from scratch, use this command ans specify your configs:

python train.py --data_path ./data/ --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 64 --vit_patch_size 8

Here I specified to use data augmentation for a better training, also I set the image sizes and the vit patch size to be 8x8. You can however use your custom configurations, check Config.py.

During training there will be a validation in each epoch, the best weights will be saved in a folder named ./weights/ and the predictions will be saved in a folder named ./pred_logs/

Pretraining

For the pretraining of Text-DIAE, use this command and specify the data path and the folder to save your models weights --weights_path

python pretrain.py --data_path /data/ --weights_path /your_folder/  --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 48 --vit_patch_size 8

Fine tuning the pretrained model

For the fine-tuning of Text-DIAE, use this command and specify the data path, the folder to save your models weights --weights_path and the pretrained encoder to be used --pretrained_encoder_path:

python  fine_tune.py --data_path /data/users/msouibgui/datasets/vatican/ --pretrained_encoder_path  /your_folder/checkpoint-454_epoch77.pt  --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 128  --vit_patch_size 8

Test the model

To test the model, run the following command. It will recognize the testing data using the trained model, you should specify which model you want to use by profiding its path --test_model:

python  test.py --data_path /data/ --img_height 64 --img_width 256 --train_type htr_Augm --batch_size 128  --vit_patch_size 8  --test_model ./weights/best-seq2seq_htr_Augm_64_256_8.pt

After rnning the testing you will get the predictions in the folder ./pred_logs/ as well as the CER and WER.

Model Zoo

Coming soon !!

Citation

If you find this useful for your research, please cite it as follows:

@inproceedings{souibgui2022text,
  title={Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement},
  author={Souibgui, Mohamed Ali and Biswas, Sanket and Mafla, Andres and Biten, Ali Furkan and Forn{\'e}s, Alicia and Kessentini, Yousri and Llad{\'o}s, Josep and Gomez, Lluis and Karatzas, Dimosthenis},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  year={2023}
}

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
data/backgroundIAM		data/backgroundIAM
imgs		imgs
models		models
Config.py		Config.py
LICENSE		LICENSE
README.md		README.md
environment.yml		environment.yml
fine_tune.py		fine_tune.py
htrAugmentor.py		htrAugmentor.py
loadData.py		loadData.py
loadData_pretrain.py		loadData_pretrain.py
pretrain.py		pretrain.py
test.py		test.py
train.py		train.py
utils.py		utils.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Text-DIAE

Description

Download Code

Setup your environment

Train the model from scratch

Pretraining

Fine tuning the pretrained model

Test the model

Model Zoo

Citation

About

Uh oh!

Releases

Packages

Languages

License

dali92002/SSL-OCR

Folders and files

Latest commit

History

Repository files navigation

Text-DIAE

Description

Download Code

Setup your environment

Train the model from scratch

Pretraining

Fine tuning the pretrained model

Test the model

Model Zoo

Citation

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages