A proof of concept of a small OCR (text recognition) app using Python + tesseract-ocr within Docker
docker@^24.0.3
- Clone the repo by executing `git clone git@github.com:dmmarmol/python-ocr.git`
- Navigate into it with `cd python-ocr`
- Run the command `make build` (Makefile shortcut to build the docker image)
- Run the command `make run`
- Run the command `make shell` to attach a shell to the running container
- Choose between bulk processing images or processing text
This command reads all images inside the images/source directory, extracts the text content from each of them, and puts the results together in a new file inside images/output
- Deposit any `.jpg` or `.jpeg` file inside the `images/source` directory
- Navigate inside the container using an attached shell and, from the `app/` directory, run the command `python3 src/process-images.py`
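The bulk image step described above can be sketched roughly as follows. This is not the repository's actual `src/process-images.py`, only a minimal illustration under assumptions: the names `run_tesseract` and `bulk_extract`, the per-image header line, and the output file name are hypothetical, and the OCR step shells out to the `tesseract` command line tool (using `stdout` as the output base so the recognized text is printed rather than written to a file).

```python
import subprocess
from pathlib import Path
from typing import Callable

# Extensions treated as images, matching the README (.jpg and .jpeg).
IMAGE_SUFFIXES = {".jpg", ".jpeg"}


def run_tesseract(image_path: Path) -> str:
    """OCR one image by calling the tesseract CLI, which prints text to stdout."""
    result = subprocess.run(
        ["tesseract", str(image_path), "stdout"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def bulk_extract(
    source_dir: Path,
    output_file: Path,
    ocr: Callable[[Path], str] = run_tesseract,
) -> int:
    """Write the text of every image in source_dir into output_file.

    Returns the number of images processed. The ocr argument is injectable
    (hypothetical design choice) so the loop can be exercised without tesseract.
    """
    images = sorted(
        p for p in source_dir.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", encoding="utf-8") as out:
        for image in images:
            # Hypothetical separator so each image's text can be told apart.
            out.write(f"--- {image.name} ---\n")
            out.write(ocr(image).strip() + "\n\n")
    return len(images)
```

Inside the container, a call such as `bulk_extract(Path("images/source"), Path("images/output/result.txt"))` would mirror the behaviour described here; the `result.txt` name is an assumption, since the README does not state the output file name.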
This command reads all .txt files inside the text/source directory, normalizes the text content of each file, and puts the results together in a new file inside text/output
- Deposit any `.txt` file inside the `text/source` directory
- Navigate inside the container using an attached shell and, from the `app/` directory, run the command `python3 src/process-text.py`
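The README does not say what "normalize" means for the text step, so the following is only a sketch under stated assumptions, not the repository's `src/process-text.py`. Here normalization is assumed to mean Unicode NFKC folding, collapsing runs of spaces and tabs, trimming each line, and squeezing repeated blank lines. The names `normalize_text` and `bulk_normalize` are hypothetical.

```python
import re
import unicodedata
from pathlib import Path


def normalize_text(raw: str) -> str:
    """Apply NFKC, collapse spaces/tabs, trim lines, and squeeze blank lines."""
    text = unicodedata.normalize("NFKC", raw)
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    # Collapse runs of blank lines into a single blank line.
    collapsed = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return collapsed.strip()


def bulk_normalize(source_dir: Path, output_file: Path) -> int:
    """Normalize every .txt file in source_dir and append it to output_file.

    Returns the number of files processed.
    """
    files = sorted(
        p for p in source_dir.iterdir()
        if p.is_file() and p.suffix.lower() == ".txt"
    )
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with output_file.open("w", encoding="utf-8") as out:
        for path in files:
            out.write(normalize_text(path.read_text(encoding="utf-8")) + "\n\n")
    return len(files)
```

A call such as `bulk_normalize(Path("text/source"), Path("text/output/result.txt"))` would correspond to the step above; the output file name is again an assumption.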
Build the Docker image from the Dockerfile
make build
Run a Docker container instance of the Docker image
make run
make stop
make remove