Pyinstaller And Tesseract Ocr
Solution 1:
Assuming you're on Windows, I ran into this problem and think I solved it by compiling a static version of tesseract (which does not need to be installed) and including its path as a binary in the pyinstaller spec file.
Official compiling instructions here:
https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows
Install MS Visual Studio 15 (with C++) and vcpkg, then run one of the following from a command prompt:
for 64-bit: vcpkg install tesseract:x64-windows-static
for 32-bit: vcpkg install tesseract:x86-windows-static
The tesseract executable will be located a few subfolders deep inside the vcpkg folder on your PC. Along with that file, you also need to download a .traineddata file and place it in a folder called 'tessdata' in the same directory as the tesseract exe.
Create a pyinstaller spec file and edit the Analysis(binaries=[]) section to include the folder path where tesseract is located (if you're not using a subfolder for tesseract, I think you'd need to add both tesseract.exe and the tessdata subfolder). I also changed include_binaries=True.
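A spec file is ordinary Python, so the entries can be built with a small helper. This is only a sketch: the function name, the 'tesseract' destination folder, and the assumption that tesseract.exe sits next to a tessdata folder are mine, not part of the answer above.

```python
from pathlib import Path


def tesseract_bundle_entries(tesseract_dir, dest="tesseract"):
    """Build (source, destination) tuples for a PyInstaller spec file.

    Assumes tesseract_dir contains tesseract.exe and a tessdata folder.
    dest is a hypothetical folder name inside the bundle.
    """
    root = Path(tesseract_dir)
    # The executable is meant for Analysis(binaries=[...]).
    binaries = [(str(root / "tesseract.exe"), dest)]
    # Each .traineddata file is meant for Analysis(datas=[...]).
    datas = [
        (str(f), str(Path(dest) / "tessdata"))
        for f in sorted((root / "tessdata").glob("*.traineddata"))
    ]
    return binaries, datas


# Inside the spec file, the result could be passed on like this:
# binaries, datas = tesseract_bundle_entries(r"C:\path\to\tesseract")
# a = Analysis(["yourapp.py"], binaries=binaries, datas=datas)
```

The path in the usage comment is a placeholder; replace it with wherever vcpkg put the executable on your machine.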
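Run pyinstaller on the spec file, i.e. pyinstaller yourspecfile.spec (the --specpath option only sets the folder where a newly generated spec file is stored, so it does not select a spec file to build from).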
I haven't tried it on a different PC yet, so I haven't fully tested that it works as intended. I don't know anything about compiling C++, so tesseract may need additional files or links that happen to be present on the build PC, which is the only place I've tested.
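At runtime the script still has to find the bundled executable. A minimal sketch, assuming the exe was bundled into a folder named 'tesseract' (a hypothetical name) and that the app is built with PyInstaller, which sets sys._MEIPASS in a frozen app:

```python
import os
import sys
from pathlib import Path


def bundled_tesseract(dest="tesseract"):
    """Return the expected path of tesseract.exe inside the bundle."""
    # In a frozen app, PyInstaller exposes the bundle folder as
    # sys._MEIPASS; when running from source, fall back to the
    # current directory.
    base = Path(getattr(sys, "_MEIPASS", os.path.abspath(".")))
    return base / dest / "tesseract.exe"


# With pytesseract installed, the path could then be handed over like this:
# import pytesseract
# pytesseract.pytesseract.tesseract_cmd = str(bundled_tesseract())
```

Checking bundled_tesseract().is_file() on a second PC would be a quick way to confirm that the exe actually made it into the bundle.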
Solution 2:
@Zstr33's answer is correct, but it lacked detail. The following instructions have been tested on Windows 10 64-bit. Link to the official compiling instructions: https://tesseract-ocr.github.io/tessdoc/Compiling.html#windows.
Steps:
Install Visual Studio. Make sure to install the below items:
Then, click on individual components.
You can add whatever other components you want, but those are the ones needed to compile tesseract into a static binary. Also, if you don't use English, click on the language packs tab and add the English language pack; vcpkg needs it.
Follow the quick start guide for installing vcpkg, found here: https://github.com/microsoft/vcpkg#getting-started.
Navigate to where you copied the vcpkg directory, or add it to path. Then run:
vcpkg install tesseract:x64-windows-static
for 64-bit, or
vcpkg install tesseract:x86-windows-static
for 32-bit.
Go to
<place where you put the tesseract directory>\tesseract_x64-windows-static\tools\tesseract
for 64-bit, and
<place where you put the tesseract directory>\tesseract_x86-windows-static\tools\tesseract
for 32-bit.
- To use it with pyinstaller, use the --onefile option.
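One way to pass the compiled exe to a --onefile build is through pyinstaller's --add-binary and --add-data options. The sketch below only assembles the command line; the function name, the 'tesseract' destination folder, and the presence of a tessdata folder next to the exe are assumptions for illustration.

```python
import os


def pyinstaller_command(script, tesseract_dir, dest="tesseract"):
    """Assemble a pyinstaller call that bundles tesseract.exe and tessdata."""
    exe = os.path.join(tesseract_dir, "tesseract.exe")
    tessdata = os.path.join(tesseract_dir, "tessdata")
    # Source and destination are joined with os.pathsep (";" on Windows),
    # the separator documented for older PyInstaller releases; check
    # the documentation for the version you have installed.
    return [
        "pyinstaller", "--onefile",
        "--add-binary", f"{exe}{os.pathsep}{dest}",
        "--add-data", f"{tessdata}{os.pathsep}{os.path.join(dest, 'tessdata')}",
        script,
    ]
```

The returned list can be passed to subprocess.run, or joined with spaces and pasted into a command prompt.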
Solution 3:
I did get it to run with Pyinstaller after all.
First, I needed to create 2 Hook files as described here:
https://github.com/jbarlow83/OCRmyPDF/issues/659#issuecomment-714479684
Then, when running the exe, I still got an error about a missing pikepdf._cpphelpers module.
To solve that, just add
from pikepdf import _cpphelpers
in your python file as described here:
My Pyinstaller call looks like this:
pyinstaller --onefile appname.py --paths="C:\python\anaconda3\envs\appname\Lib\site-packages" --additional-hooks-dir="C:\coding\appname\Hooks"
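As a possible alternative to the explicit import, the missing module could be declared in a hook file inside the folder passed to --additional-hooks-dir. This is not what I did above; it is a sketch that relies on PyInstaller reading a module-level hiddenimports list from hook files, and the file name hook-pikepdf.py is an assumption.

```python
# hook-pikepdf.py (hypothetical file in the --additional-hooks-dir folder)
# PyInstaller reads the module-level `hiddenimports` list from a hook and
# bundles those modules even if it cannot detect the import on its own.
hiddenimports = ["pikepdf._cpphelpers"]
```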
Solution 4:
Since bundling everything up with pyinstaller can be a real pain, I took the following steps:
- Imported pytesseract in my script
- Created the exe file with pyinstaller (without defining anything in my spec file)
- Bundled the Tesseract-OCR installer and my script.exe with an external installer creator.
So the end user gets both the Tesseract installer and my application. With the external installer you have a lot of freedom, and you can also modify the PATH variable.
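With this approach the script has to locate the installed Tesseract at startup. A minimal sketch; the function name is hypothetical, and the default location below is an assumption about where a Windows installer typically puts the exe, so adjust it to your installer.

```python
import os
import shutil

# Assumption: a typical install location on Windows.
DEFAULT_LOCATIONS = [r"C:\Program Files\Tesseract-OCR\tesseract.exe"]


def find_tesseract(candidates=DEFAULT_LOCATIONS):
    """Return the path to an installed tesseract executable, or None."""
    # Prefer whatever is on PATH, e.g. if the installer added it.
    found = shutil.which("tesseract")
    if found:
        return found
    # Otherwise try the known install locations one by one.
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None


# With pytesseract installed, the result could be used like this:
# import pytesseract
# exe = find_tesseract()
# if exe:
#     pytesseract.pytesseract.tesseract_cmd = exe
```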
Solution 5:
I tried with pyinstaller and ocrmypdf forever and did not get it to work. I ended up using Nuitka. Worked right from the start :-)
Use something like:
python -m nuitka --mingw64 --standalone --follow-imports yourapp.py
http://nuitka.net/doc/user-manual.html
There was a similar answer here somewhere already, just could not find it anymore to link to it.