PaddleOCR is a relatively new OCR system that promises to recognize more than 80 languages while being lightweight and therefore ideal for deployment on mobile, embedded, or IoT devices. It is based on the larger PaddlePaddle deep learning framework and can therefore be executed on both CPUs and GPUs. The latter option promises a significant speed increase compared to OCR engines like Tesseract. In this tutorial, we put that to the test: we set up PaddleOCR 2.0 and process a first image. The last section of this article compares PaddleOCR to Tesseract.
All you need is a PC or laptop, a basic understanding of Python, and about 30 minutes. Since PaddleOCR pulls in quite a lot of dependencies that might break your local Python installation, it is highly recommended to install the framework in a fresh, isolated environment. Our choice is Anaconda, which is available for Linux, Windows, and macOS.
Step 1: Set up a new Anaconda Environment
To set up a new environment, download the latest version of Anaconda for Python 3.8 here and install it following the instructions. The next steps for setting up a new Anaconda environment are for Ubuntu Linux, but the procedure should work similarly on the other systems.
After the installation, you need to initialize Anaconda by calling:
$INSTALL_DIR/anaconda3/bin/conda init bash
where $INSTALL_DIR is the path to your local Anaconda installation.
This command appends some code to your .bashrc that activates Anaconda by default. If you do not want this, simply remove the lines between:
# >>> conda initialize >>>
…
# <<< conda initialize <<<
and execute them manually whenever you plan to use an Anaconda environment.
To move on, simply open a new shell, then create and activate a new environment called “ocr” (or any name you like):
conda create --name ocr python=3.8
conda activate ocr
Step 2: Download the PaddleOCR Engine Example from GitHub
You can find the source code for this PaddleOCR engine example project on GitHub. Either download the main branch as a zip archive from the PaddleOCR Engine Example Git Repository, or clone it directly using SSH:
git clone git@github.com:Converter-App/paddleOCR-engine.git
Step 3: Install all Dependencies using PIP
In the source tree of paddleOCR-engine, you find a file called requirements.txt. Execute the command below to install all dependencies.
pip3 install -r requirements.txt
Congratulations, PaddleOCR is now running on your system! The pre-trained models have to be located in the .paddleocr folder in your home directory. In theory, the framework should download all models automatically the first time you run it. However, during this test, the download failed several times due to connection problems. Alternatively, you can simply copy the .paddleocr folder from the source tree to your home directory. An up-to-date list of PaddleOCR models is available here.
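For reference, here is a minimal sketch of how the underlying paddleocr package loads the models and runs recognition (the example engine wraps this API; parameter names follow the paddleocr 2.0 documentation and may differ slightly in other versions):
from paddleocr import PaddleOCR

# Instantiating PaddleOCR loads the detection and recognition models; if they are
# not found in ~/.paddleocr, the framework tries to download them first.
ocr = PaddleOCR(lang='en')
# Each result entry is a bounding box plus the recognized text and a confidence score.
result = ocr.ocr('/path/to/your/image.jpg')
for box, (text, confidence) in result:
    print(text, confidence)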
Step 4: Run the OCR Engine
In the active “ocr” environment simply type:
python ocr_engine.py --language en --image_path /path/to/your/image.jpg
It will print the output and create two output files:
- result.txt: contains the recognized text of all detected lines
- boxes.txt: contains the bounding box information of all detected lines
The default names of these files can be changed using the --output and --boxes parameters. If you want to visualize the detected words on top of the original image, you can use the --visualize flag.
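The actual implementation lives in ocr_engine.py; purely as an illustration, the two output files could be produced from a raw PaddleOCR result list roughly like this (the helper below is hypothetical, not the repository's code):
# Hypothetical helper (not the repository's code): write the recognized text and
# the bounding boxes of a PaddleOCR result list into the two output files.
def write_outputs(result, text_file='result.txt', boxes_file='boxes.txt'):
    with open(text_file, 'w') as tf, open(boxes_file, 'w') as bf:
        for box, (text, confidence) in result:
            tf.write(text + '\n')
            bf.write(f'{box}\n')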
Step 5: Enable GPU Support
If you complete the setup procedure only up to this point, the OCR engine will use your CPU to process all images. In case you have a fast GPU and want to use it, proceed with the following steps:
The command below will replace the current PaddlePaddle installation with a version supporting GPUs:
python3 -m pip install paddlepaddle-gpu==2.0.0rc1.post101 -f https://paddlepaddle.org.cn/whl/stable.html
For running PaddleOCR on a GPU, you will also need to install CUDA and cuDNN. The easiest way of installing CUDA 10 on Ubuntu is using this installation script for CUDA 10.
Furthermore, you have to make sure that you are using cuDNN version 7.6.0, since PaddlePaddle 2.0 was compiled against this version. Higher versions might work too, but this one is known to work. If you get the error below, you have to download and install cuDNN 7.6.0.
ExternalError: Cudnn error, CUDNN_STATUS_BAD_PARAM (at /paddle/paddle/fluid/operators/batch_norm_op.cu:199)
[Hint: If you need C++ stacktraces for debugging, please set FLAGS_call_stack_level=2.]
[operator < batch_norm > error]
You can download a package called cudnn-10.1-linux-x64-v7.6.0.64.tgz directly from the cuDNN archive at nvidia.com after creating a free account.
When using Ubuntu, just extract it and add it as the first entry to your LD_LIBRARY_PATH variable:
export LD_LIBRARY_PATH=/$DIRTO/cuda/lib64:$LD_LIBRARY_PATH
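To verify that the GPU build is actually picked up, you can run a quick check in Python; the calls below assume PaddlePaddle 2.0's standard helpers:
import paddle

print(paddle.__version__)              # should report the GPU build installed above
print(paddle.is_compiled_with_cuda())  # True if the GPU wheel is active
paddle.utils.run_check()               # runs a small test program and reports whether the GPU is usable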
Benchmark: Comparing PaddleOCR 2.0 to Tesseract 4.0
For the benchmark, PaddleOCR 2.0 running on a laptop CPU and PaddleOCR running on an Nvidia GTX 1080 GPU were compared to Tesseract 4.0 on the same laptop. For each setup, the same 10 images with English texts were processed. In total, the test data set contained about 5,000 words.
Accuracy: In a first direct comparison between PaddleOCR and Tesseract in terms of recognition accuracy, the latter seemed to do a better job. While Tesseract reached an accuracy of over 98%, the PaddleOCR example engine only reached 92%.
At first glance, this result looked pretty disappointing for PaddleOCR. However, a closer look at the types of errors PaddleOCR made showed that most of them could easily be fixed during post-processing.
The first class of errors showed a clear pattern and was especially straightforward to fix: it consisted simply of missing white spaces after punctuation marks, e.g. “experience,leading”. A simple post-processing algorithm can handle all of them and is available in the example OCR engine.
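A minimal sketch of this kind of fix, purely for illustration (the regex and function name below are not taken from the repository), could look like this:
import re

# Insert a missing white space after a punctuation mark that is directly followed
# by a letter, e.g. "experience,leading" -> "experience, leading".
def fix_punctuation_spaces(text):
    return re.sub(r'([,.;:!?])(?=[A-Za-z])', r'\1 ', text)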
The second common class of errors was missing white spaces between two or three words: "Thispaper", "Secondwe", "Ourdecision". Nearly all of these errors were detected by a standard spell-checker, so an automated fix for this problem can certainly also be included in the post-processing algorithm.
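One possible way to automate this fix, shown here purely as an illustration, is to split tokens the spell-checker does not recognize with a dictionary-based word segmenter such as the third-party wordninja package (not used by the example repository):
import wordninja

# Split tokens that the spell-checker does not recognize, e.g. "Thispaper" -> "This paper".
# is_known_word is a placeholder for whatever spell-checker lookup is used.
def fix_missing_spaces(token, is_known_word):
    if is_known_word(token):
        return token
    parts = wordninja.split(token)
    return ' '.join(parts) if len(parts) > 1 else token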
Metric | Tesseract | PaddleOCR | PaddleOCR (Punctuation fixed) | PaddleOCR (White-spaces fixed)
Number of Errors | 95 | 387 | 274 | 164
Accuracy | 0.98 | 0.92 | 0.96 | 0.97
After applying these two corrections, the PaddleOCR engine showed a competitive result for the English sample data. Apparently, most development effort is currently put into the Chinese language version. Most likely, PaddleOCR already outperforms Tesseract for Chinese texts, but that was not the subject of this test.
Speed and Model File Size: In the categories of speed and model size, PaddleOCR with GPU support is already the clear winner.
Metric | Tesseract | PaddleOCR CPU | PaddleOCR GPU
Speed per image | 3.83 s | 4.12 s | 2.07 s
English Model Size | 23 MB | 2 MB | 2 MB
The pre-trained English model has a file size of only about 2 MB, which is less than 10% of Tesseract's 23 MB model. Tesseract took on average 3.83 seconds per image while PaddleOCR with GPU did the job in just 2.07 s per image, a reduction of about 46% ((3.83 - 2.07) / 3.83 ≈ 0.46) thanks to the GPU.