Is There A Way To Combine A Python Project Codebase That Spans Across Different Files Into One File?

The reason I want to do this is that I want to use the tool pyobfuscate to obfuscate my Python code, but pyobfuscate can only obfuscate one file.

Solution 1:

I've answered your direct question separately, but let me offer a different solution to what I suspect you're actually trying to do:

Instead of shipping obfuscated source, just ship bytecode files. These are the .pyc files that get created, cached, and used automatically, but you can also create them manually by just using the compileall module in the standard library.
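As a minimal sketch of the manual route, assuming Python 3: the helper name compile_tree is made up for illustration, while compileall.compile_dir and its legacy flag are part of the standard library. The legacy=True setting matters here because a .pyc can only be imported without its .py when it sits next to where the source would be, not inside __pycache__.

```python
import compileall


def compile_tree(src_dir):
    """Byte-compile every .py file under src_dir (hypothetical helper).

    legacy=True writes foo.pyc beside foo.py instead of into __pycache__,
    which is the layout needed for importing without the source file.
    """
    return bool(compileall.compile_dir(src_dir, quiet=1, legacy=True))
```

After this runs you would ship only the .pyc files. Keep in mind that they are tied to the Python version that produced them.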

A .pyc file with its .py file missing can be imported just fine. It's not human-readable as-is. It can of course be decompiled into Python source, but the result is… basically the same result you get from running an obfuscator on the original source. So, it's slightly better than what you're trying to do, and a whole lot easier.

You can't compile your top-level script this way, but that's easy to work around. Just write a one-liner wrapper script that does nothing but import the real top-level script. (If you have if __name__ == '__main__': code in there, you'll also need to move that to a function, and the wrapper becomes a two-liner that imports the module and calls the function… but that's as hard as it gets.) Alternatively, you could run pyobfuscate on just the top-level script, but really, there's no reason to do that.

In fact, many of the packager tools can optionally do all of this work for you automatically, except for writing the trivial top-level wrapper. For example, a default py2app build will stick compiled versions of your own modules, along with stdlib and site-packages modules you depend on, into a pythonXY.zip file in the app bundle, and set up the embedded interpreter to use that zipfile as its stdlib.


Solution 2:

There are definitely ways to turn a tree of modules into a single module. But it's not going to be trivial. The simplest thing I can think of is this:

First, you need a list of modules. This is easy to gather with the find command or a simple Python script that does an os.walk.
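A rough sketch of the os.walk variant; the function name find_modules is just an illustrative choice, and it returns paths relative to the project root rather than dotted module names:

```python
import os


def find_modules(root):
    """Return the sorted relative paths of all .py files under root."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(".py"):
                full_path = os.path.join(dirpath, filename)
                paths.append(os.path.relpath(full_path, root))
    return sorted(paths)
```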

Then you need to use grep or Python re to get all of the import statements in each file, and use that to topologically sort the modules. If you only do absolute flat import foo statements at the top level, this is a trivial regex. If you also do absolute package imports, or from foo import bar (or from foo import *), or import at other levels, it's not much trickier. Relative package imports are a bit harder, but not that big of a deal. Of course if you do any dynamic importing, use the imp module, install import hooks, etc., you're out of luck here, but hopefully you don't.
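For the trivial case only (flat, absolute import foo statements at the top level), this step might look something like the sketch below. The name dependency_order and the dict-of-sources input are assumptions made for the example, and graphlib needs Python 3.9 or later; on older versions you would have to write the sort yourself.

```python
import re
from graphlib import TopologicalSorter

# Matches only unindented "import foo" lines; "from x import y", dotted
# names, and "import a, b" are deliberately not handled in this sketch.
IMPORT_RE = re.compile(r"^import[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def dependency_order(sources):
    """Order modules so that each one comes after the modules it imports.

    sources maps a module name to its source text. Imports of modules that
    are not in sources (stdlib, third-party) are ignored.
    """
    graph = {}
    for name, text in sources.items():
        imported = set(IMPORT_RE.findall(text))
        graph[name] = {dep for dep in imported if dep in sources and dep != name}
    # static_order() yields a node only after all of its dependencies.
    return list(TopologicalSorter(graph).static_order())
```

A circular import between your own modules makes static_order() raise graphlib.CycleError, which is probably what you want to find out at this stage anyway.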

Next you need to replace the actual import statements. With the same assumptions as above, this can be done with a simple sed or re.sub, something like import\s+(\w+) with \1 = sys.modules['\1'].
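Under the same flat-import assumption, the substitution could be sketched with re.sub as follows. The function name rewrite_imports and the bundled parameter are hypothetical; restricting the rewrite to your own modules keeps stdlib imports such as import os intact.

```python
import re

IMPORT_RE = re.compile(r"^import[ \t]+(\w+)[ \t]*$", re.MULTILINE)


def rewrite_imports(source, bundled):
    """Replace "import foo" with a sys.modules lookup for bundled modules."""
    def replace(match):
        name = match.group(1)
        if name in bundled:
            return f"{name} = sys.modules['{name}']"
        return match.group(0)  # leave stdlib/third-party imports alone
    return IMPORT_RE.sub(replace, source)
```

Note that the rewritten code refers to sys, so each rewritten module still needs an import sys of its own.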

Now, for the hard part: you need to transform each module into something that creates an equivalent module object dynamically. I think what you want to do is to escape the entire module code so that it can be put into a triple-quoted string, then do this:

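import sys
import types

module_name = 'foo'  # placeholder: the name of the module being embedded
mod = types.ModuleType(module_name)
# Register the module first, then run its code in the module's own namespace
# so that functions defined there share their globals with the module object.
sys.modules[module_name] = mod
exec('''
# escaped version of original module source goes here
''', mod.__dict__)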

Now just concatenate all of those transformed modules together. The result will be almost equivalent to your original code, except that it's doing the equivalent of import foo; del foo for all of your modules (in dependency order) right at the start, so the startup time could be a little slower.
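A sketch of how the wrapping and concatenation could be automated. The names TEMPLATE, wrap_module and bundle are invented for the example, and it uses repr() to produce a safely quoted string literal instead of hand-escaping into a triple-quoted string, which is a substitution on my part rather than what is described above.

```python
import sys
import types

# One stanza per module: create it, register it, run its code inside it.
TEMPLATE = """\
_mod = types.ModuleType({name!r})
sys.modules[{name!r}] = _mod
exec({source!r}, _mod.__dict__)
"""


def wrap_module(name, source):
    """Return code that rebuilds the module `name` from its source text."""
    return TEMPLATE.format(name=name, source=source)


def bundle(order, sources):
    """Concatenate the wrapped modules in dependency order."""
    parts = ["import sys", "import types", ""]
    parts += [wrap_module(name, sources[name]) for name in order]
    return "\n".join(parts)
```

The string returned by bundle() is what you would write out as the single file, with your top-level script's code appended at the end.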


Solution 3:

You can make a tool that:

  • Reads through your source files and puts all identifiers in a set.
  • Subtracts all identifiers from recursively searched standard-library and third-party modules from that set (modules, classes, functions, attributes, parameters).
  • Subtracts some explicitly excluded identifiers from that set as well, as they may be used in getattr/setattr/exec/eval.
  • Replaces the remaining identifiers by gibberish
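The first three steps can be sketched very roughly with the standard tokenize module. This is not how the tool mentioned here works internally (I am not assuming anything about that); the function name is made up, and the builtins are used as a small stand-in for the full recursive scan of standard-library and third-party modules.

```python
import builtins
import io
import keyword
import tokenize


def renamable_identifiers(source, excluded=()):
    """Collect identifiers in source that look safe to rename.

    Only keywords, builtins, and the explicitly excluded names are
    subtracted; a real tool would also subtract every name exposed by
    the libraries the code uses.
    """
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    names = {tok.string for tok in tokens if tok.type == tokenize.NAME}
    return names - set(keyword.kwlist) - set(dir(builtins)) - set(excluded)
```

To keep the renaming consistent across files, you would take the union of these sets over all files and use one shared mapping from old names to new names.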

Or you can use this tool I wrote that does exactly that.

To obfuscate multiple files, use it as follows:

  • For safety, backup your source code and valuable data to an off-line medium.
  • Put a copy of opy_config.txt in the top directory of your project.
  • Adapt it to your needs according to the remarks in opy_config.txt.
  • This file only contains plain Python and is exec’ed, so you can do anything clever in it.
  • Open a command window, go to the top directory of your project and run opy.py from there.
  • If the top directory of your project is e.g. ../work/project1 then the obfuscation result will be in ../work/project1_opy.
  • Further adapt opy_config.txt until you’re satisfied with the result.
  • Type ‘opy ?’ or ‘python opy.py ?’ (without the quotes) on the command line to display a help text.

Solution 4:

I think you can try using the find command with the -exec option.

You can execute all Python scripts in a directory with the following command:

find . -name "*.py" -exec python {} ';'

Hope this helps.

EDIT:

Oh, sorry, I overlooked that if you obfuscate the files separately they may not run properly, because the obfuscator renames functions to different names in different files.
