
Pyinstaller And Tesseract Ocr

I am using Tesseract OCR in my program and want to convert it into a single .exe file using PyInstaller. The problem is that, for Tesseract to work, I need to refere…

Solution 1:

Assuming you're on Windows, I ran into this problem and think I solved it by compiling a static version of tesseract (which does not need to be installed) and including its path as a binary in the pyinstaller spec file.

Official compiling instructions here:

https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows

Install MS Visual Studio 15 (with C++) and vcpkg, then run one of the following in a command prompt:

for 64-bit: vcpkg install tesseract:x64-windows-static

for 32-bit: vcpkg install tesseract:x86-windows-static

The tesseract executable will be located a few subfolders deep within the vcpkg folder on your PC. Along with that file, you also need to download a .traineddata file and place it in a folder called 'tessdata' in the same directory as the tesseract exe.
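To find the executable and check the layout described above before bundling, a small helper like the one below could be used. This is a sketch under my own assumptions, not part of the answer: the file is named `tesseract.exe` and sits somewhere below a folder named `tools` (as in the vcpkg path `...\tools\tesseract`), and both function names are hypothetical.

```python
from pathlib import Path


def find_vcpkg_tesseract(vcpkg_root):
    """Return the first tesseract.exe found under vcpkg_root, or None.

    Assumption: the static build puts the executable below a folder
    named "tools"; adjust the check if your vcpkg layout differs.
    """
    for exe in sorted(Path(vcpkg_root).rglob("tesseract.exe")):
        if "tools" in exe.parts:
            return exe
    return None


def missing_tessdata(tesseract_dir):
    """List problems with the expected <dir>/tessdata/*.traineddata layout."""
    tessdata = Path(tesseract_dir) / "tessdata"
    if not tessdata.is_dir():
        return ["tessdata folder not found"]
    # Tesseract needs at least one language model, e.g. eng.traineddata.
    if not any(tessdata.glob("*.traineddata")):
        return ["no .traineddata files in tessdata"]
    return []
```

An empty list from `missing_tessdata(find_vcpkg_tesseract(root).parent)` means the folder is ready to be referenced from the spec file.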

Create a PyInstaller spec file and edit the Analysis(binaries=[]) section to include the folder path where tesseract is located (if you're not using a subfolder for tesseract, I think you'd need to add both tesseract.exe and the tessdata subfolder). I also set include_binaries=True.
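Because a spec file is ordinary Python, the entries can be built with a small helper. A minimal sketch, assuming tesseract.exe goes into `binaries` and the whole tessdata folder goes into `datas` (PyInstaller copies a directory source recursively), with both placed under a `tesseract` subfolder of the bundle. The helper name and the `tesseract` destination folder are hypothetical choices, not from the answer.

```python
import os


def tesseract_bundle_entries(tesseract_dir, dest="tesseract"):
    """Build (source, destination) tuples for Analysis(binaries=..., datas=...).

    Hypothetical helper: tesseract.exe is listed as a binary and the
    tessdata folder as data, so tessdata ends up next to the executable.
    """
    exe = os.path.join(tesseract_dir, "tesseract.exe")
    tessdata = os.path.join(tesseract_dir, "tessdata")
    binaries = [(exe, dest)]
    datas = [(tessdata, os.path.join(dest, "tessdata"))]
    return binaries, datas


# Inside the .spec file this would be used roughly like:
#   binaries, datas = tesseract_bundle_entries(r"C:\path\to\tesseract")
#   a = Analysis(["yourapp.py"], binaries=binaries, datas=datas, ...)
```

The spec is then built by passing it to PyInstaller directly: `pyinstaller yourspecfile.spec`.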

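Run PyInstaller with the spec file as its argument: pyinstaller yourspecfile.spec (the --specpath option only sets the folder where a newly generated spec file is written, so it does not build from an existing spec).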

I haven't yet tried it on a different PC, so I haven't fully tested that it works as intended. I don't know anything about compiling C++, so tesseract may need additional files or links that only happen to be present because I've been testing on the build PC.

Solution 2:

@Zstr33's answer is correct, but it lacked detail. The following instructions have been tested on Windows 10 64-bit. The official compiling instructions are here: https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows.

Steps:

  1. Install Visual Studio. Make sure to install the following workloads: Desktop Development with C++ and Universal Windows Platform Development.

    Then, open the Individual Components tab.

    Select the following: NuGet Package Manager, MSVC v142 - VS 2019 C++ x64/x86 build tools, C++ CMake Tools for Windows, MSVC v142 - VS 2019 C++ ARM64 Build Tools, and NuGet targets and build tasks.

    You can add any other components you want, but these are the ones needed to compile tesseract into a static binary. Also, if your Visual Studio is not in English, open the Language packs tab and add the English language pack; vcpkg needs it.

  2. Follow the quick start guide for installing vcpkg, found here: https://github.com/microsoft/vcpkg#getting-started.

  3. Navigate to the directory where you copied vcpkg (or add it to your PATH). Then run vcpkg install tesseract:x64-windows-static for 64-bit, or vcpkg install tesseract:x86-windows-static for 32-bit.

  4. Go to [place where you put the tesseract directory]\tesseract_x64-windows-static\tools\tesseract for 64-bit, or [place where you put the tesseract directory]\tesseract_x86-windows-static\tools\tesseract for 32-bit.

  5. To use it with PyInstaller, build using --onefile.
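With --onefile, PyInstaller unpacks bundled files into a temporary folder at startup and exposes it as `sys._MEIPASS`, so the script has to point pytesseract at the unpacked executable. A minimal sketch, assuming tesseract.exe and its tessdata folder were bundled into a `tesseract` subfolder (that folder name and the helper names are my own assumptions):

```python
import os
import sys


def bundle_base_dir():
    """Folder that holds bundled files at run time."""
    if getattr(sys, "frozen", False):
        # Frozen by PyInstaller: --onefile unpacks bundled files into sys._MEIPASS.
        return getattr(sys, "_MEIPASS", os.path.dirname(sys.executable))
    # Not frozen (plain `python yourapp.py`): use the script's own folder.
    return os.path.dirname(os.path.abspath(sys.argv[0]))


def bundled_tesseract_cmd(subdir="tesseract"):
    """Path to tesseract.exe, assuming it was bundled into `subdir`."""
    return os.path.join(bundle_base_dir(), subdir, "tesseract.exe")


def configure_pytesseract():
    import pytesseract  # imported lazily; only needed when actually running OCR

    # pytesseract runs this executable as a subprocess for each OCR call.
    pytesseract.pytesseract.tesseract_cmd = bundled_tesseract_cmd()
```

A matching build command might look like `pyinstaller --onefile --add-binary "C:\path\to\tesseract\tesseract.exe;tesseract" --add-data "C:\path\to\tesseract\tessdata;tesseract\tessdata" yourapp.py`, where the paths are placeholders and `;` separates source and destination on Windows.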

Solution 3:

I did get it to run with Pyinstaller after all.

First, I needed to create two hook files, as described here:

https://github.com/jbarlow83/OCRmyPDF/issues/659#issuecomment-714479684

Then, when running the exe, I still got an error about the missing module pikepdf._cpphelpers.

To solve that, just add

from pikepdf import _cpphelpers

in your python file as described here:

How to fix a pyinstaller 'no module named...' error when my script imports the modules pikepdf and pdfminer3?

My PyInstaller call looks like this:

pyinstaller --onefile appname.py --paths="C:\python\anaconda3\envs\appname\Lib\site-packages" --additional-hooks-dir="C:\coding\appname\Hooks"
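If you prefer a build script, PyInstaller can also be driven from Python with `PyInstaller.__main__.run()`, which takes the same arguments as the command line. The sketch below reproduces the call above with placeholder paths. The `--hidden-import` entry is my own suggestion as an alternative to adding the import to the script; I have not verified that it fixes this particular error.

```python
def build_args(script, site_packages, hooks_dir):
    """Same options as the command line above, as an argument list."""
    return [
        "--onefile",
        script,
        "--paths=" + site_packages,
        "--additional-hooks-dir=" + hooks_dir,
        # Assumption: may replace `from pikepdf import _cpphelpers` in the
        # script; untested against this specific error.
        "--hidden-import=pikepdf._cpphelpers",
    ]


def build(script, site_packages, hooks_dir):
    # Imported here so build_args() can be used without PyInstaller installed.
    import PyInstaller.__main__

    PyInstaller.__main__.run(build_args(script, site_packages, hooks_dir))
```

For example: `build("appname.py", r"C:\python\anaconda3\envs\appname\Lib\site-packages", r"C:\coding\appname\Hooks")`.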

Solution 4:

Since bundling everything up with PyInstaller can be a real pain, I did the following:

  1. Imported pytesseract in my script.
  2. Created the exe file with PyInstaller (without defining anything in my spec file).
  3. Bundled the Tesseract-OCR installer and my script.exe with an external installer creator.

So the end user gets both the Tesseract installer and my script's exe. An external installer gives you a lot of freedom, and you can also use it to set the PATH variable.
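With Tesseract installed separately, the script still has to find it at run time. A minimal sketch: look on PATH first, then fall back to known install locations. The default location listed is an assumption (a common default for the Windows installer); replace it with wherever your bundled installer actually puts Tesseract-OCR. The function name is hypothetical.

```python
import os
import shutil

# Assumption: common default location of the Windows installer; adjust as needed.
DEFAULT_CANDIDATES = (r"C:\Program Files\Tesseract-OCR\tesseract.exe",)


def find_tesseract(candidates=DEFAULT_CANDIDATES):
    """Return a usable tesseract executable path, or None if none is found."""
    # 1. Anything already on PATH (e.g. if the installer updated PATH).
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    # 2. Fall back to known install locations.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

If the result is not None, assign it to `pytesseract.pytesseract.tesseract_cmd`; otherwise the script can tell the user to run the Tesseract installer.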

Solution 5:

I tried forever to get PyInstaller working with ocrmypdf and never got it to work. I ended up using Nuitka, which worked right from the start :-)

Use something like:

python -m nuitka --mingw64 --standalone --follow-imports yourapp.py

http://nuitka.net/doc/user-manual.html

There was a similar answer here somewhere already, just could not find it anymore to link to it.
