Add __init__.py files if using pkg_resources


If you’re working on a Python package and you’re using setuptools’s pkg_resources to handle data files in your package, you should add __init__.py files to all your internal sub-directories. In fact, just make your life easier and always add __init__.py files everywhere, as if you were using Python 2, because leaving them out also causes issues with some linters and other tools.

__init__.py files used to be obligatory for packages and sub-packages until PEP 420 (yes, 420), which was implemented in Python 3.3. Now Python itself can import code from any subdirectory as a module without needing an __init__.py file everywhere: a directory without one is treated as an implicit namespace package.
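To see PEP 420 in action, here is a small self-contained sketch that recreates the mod1/mod2 layout used below in a temporary directory, leaves out mod2/__init__.py, and imports mod1.mod2 anyway. The helper name import_without_inner_init is made up for this example. On Python 3.7+ the resulting module is an implicit namespace package, which shows up as a spec with no origin file.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_without_inner_init(root):
    """Recreate mod1/mod2 under root, without mod2/__init__.py, and import mod1.mod2."""
    mod2_dir = root / "mod1" / "mod2"
    mod2_dir.mkdir(parents=True)
    (root / "mod1" / "__init__.py").touch()  # mod1 is a regular package
    (mod2_dir / "some_data_file.txt").write_text("hello\n")  # mod2 holds only data

    # Drop any previously imported mod1 so the fresh copy under root is used.
    for name in [n for n in sys.modules if n == "mod1" or n.startswith("mod1.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module("mod1.mod2")
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mod2 = import_without_inner_init(Path(tmp))
        print(mod2.__name__)         # mod1.mod2: the import works without __init__.py
        print(mod2.__spec__.origin)  # None: it is an implicit namespace package
```

The import succeeds, which is exactly why a missing __init__.py is easy to overlook: nothing breaks until a tool like pkg_resources gets involved.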

That said, certain libraries and tools don’t handle modules without __init__.py files correctly. In particular, pkg_resources behaves weirdly when you try to access data in a module that doesn’t have an __init__.py file.

Suppose you have the directory structure below. If you run resource_exists("mod1.mod2", "some_data_file.txt") (with resource_exists imported from pkg_resources) from the root of that directory tree, it raises a NotImplementedError, which is not the most helpful of errors.

.
└── mod1
    ├── __init__.py
    └── mod2
        └── some_data_file.txt

That’s when you’re working on a package that isn’t installed. If you actually install it, you may instead get an even weirder TypeError along the lines of “expected str, bytes or os.PathLike object, not NoneType”. I’m not sure when you get one or the other, but the point is: if there isn’t an __init__.py file, there’s a good chance you won’t get what you’re looking for.

To fix this, you just have to create an __init__.py file in mod2, ending up with the directory structure below, and then resource_exists("mod1.mod2", "some_data_file.txt") simply returns True.

.
└── mod1
    ├── __init__.py
    └── mod2
        ├── __init__.py
        └── some_data_file.txt
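If you have many sub-directories, creating these files by hand gets old quickly. Here’s a minimal sketch (the helper add_missing_init_files is hypothetical, not part of setuptools) that walks a package directory and creates an empty __init__.py wherever one is missing, skipping hidden and dunder directories such as __pycache__. Point it at the package directory itself (mod1 here), not at the repository root, or it will happily turn docs/ and tests/ into packages too.

```python
from pathlib import Path


def add_missing_init_files(package_root):
    """Create an empty __init__.py in package_root and every subdirectory lacking one.

    Directories whose names start with "." or "__" (e.g. .git, __pycache__)
    are skipped, along with everything below them. Returns the created files.
    """
    root = Path(package_root)
    created = []
    # Materialize the directory list first so creating files doesn't disturb the walk.
    for directory in [root, *sorted(p for p in root.rglob("*") if p.is_dir())]:
        rel_parts = directory.relative_to(root).parts
        if any(part.startswith((".", "__")) for part in rel_parts):
            continue
        init_file = directory / "__init__.py"
        if not init_file.exists():
            init_file.touch()
            created.append(init_file)
    return created
```

Running it twice is harmless: the second call finds nothing to create and returns an empty list.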

The case of pkg_resources.resource_filename is even more fun. Check this out.

When run on a non-installed package without the inner __init__.py, resource_filename("mod1.mod2", "some_data_file.txt") returns "some_data_file.txt". That’s just your second argument: not even "mod1/mod2/some_data_file.txt", just the final filename. Why it does this, I’m not sure.

If you do have the __init__.py then it will return "/absolute/path/to/your/project/mod1/mod2/some_data_file.txt". This is the normal behaviour of this function. You usually want to open the output of resource_filename, so the path must be complete (although it doesn’t need to be absolute).

Note that if you actually pass it a module that doesn’t exist (as opposed to a module that exists but doesn’t have an __init__.py file), it will raise a ModuleNotFoundError.

Since the broken case returns a bare filename, it’s easy to detect, which means something like the following function works even without an __init__.py. Of course, I don’t need to tell you not to use a function like this one: it’s the dirtiest hack around and probably has its own set of bugs and edge cases. I just thought it was funny that something like this worked when I tested it.

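import importlib
import os
import sys

import pkg_resources


def resource_filename(module_path, resource_path):
    pkg_resources_path = pkg_resources.resource_filename(module_path, resource_path)
    # A bare filename (no directory separator) means pkg_resources couldn't
    # locate the package directory, i.e. the package has no __init__.py.
    if os.sep not in pkg_resources_path:
        mod = sys.modules[module_path]
        # __path__ is a list for regular packages and a _NamespacePath otherwise;
        # both are iterable, so list() gets the first directory in either case.
        module_fs_path = list(mod.__path__)[0]
        # Create an empty __init__.py if it doesn't exist yet.
        open(os.path.join(module_fs_path, "__init__.py"), "a").close()
        # Make sure the import system notices the new file, then re-import.
        importlib.invalidate_caches()
        importlib.reload(mod)
        pkg_resources_path = pkg_resources.resource_filename(module_path, resource_path)
    return pkg_resources_path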
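A less destructive option than patching files at run time is to check what kind of package you’re dealing with before handing it to pkg_resources. This sketch uses only importlib.util.find_spec from the standard library; the helper name package_kind and its return values are made up for illustration. It relies on two behaviours: find_spec returns None for a module that doesn’t exist (the case where pkg_resources raises ModuleNotFoundError), and on Python 3.7+ a namespace package’s spec has submodule_search_locations but no origin.

```python
import importlib.util


def package_kind(name):
    """Classify a dotted module name as "missing", "namespace" or "regular".

    "regular" covers everything else: plain modules and packages that
    have an __init__.py.
    """
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # find_spec imports parent packages first; this fires if one is missing.
        return "missing"
    if spec is None:
        # Nothing by that name can be found.
        return "missing"
    if spec.submodule_search_locations is not None and spec.origin is None:
        # A PEP 420 namespace package: a directory with no __init__.py.
        return "namespace"
    return "regular"
```

In a fresh interpreter started from the project root of the first directory tree, package_kind("mod1.mod2") should report "namespace", which is your cue to add the missing __init__.py.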


On a different note, while writing this I also noticed that a module’s __path__ attribute is a List[str] if the module has an __init__.py, and a _NamespacePath (whose _path attribute is a List[str]) if it doesn’t. That’s fine, because you really shouldn’t expect consistent behaviour from __*__ attributes, but I thought it was a weird quirk. More generally, a module without an __init__.py is a namespace package, and it looks different from a “normal” module in several ways.
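The quirk is easy to reproduce. This sketch (the helper import_both is hypothetical) builds the same mod1/mod2 layout in a temporary directory, imports both packages and prints the type of each __path__. Since _NamespacePath is iterable, list(mod.__path__) gives you a plain list of directories for either kind of package, without reaching into the private _path attribute.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_both(root):
    """Create mod1 (with __init__.py) and mod1/mod2 (without) under root; import both."""
    (root / "mod1" / "mod2").mkdir(parents=True)
    (root / "mod1" / "__init__.py").touch()
    # Drop any previously imported mod1 so the fresh copy under root is used.
    for name in [n for n in sys.modules if n == "mod1" or n.startswith("mod1.")]:
        del sys.modules[name]
    sys.path.insert(0, str(root))
    importlib.invalidate_caches()
    try:
        return importlib.import_module("mod1"), importlib.import_module("mod1.mod2")
    finally:
        sys.path.remove(str(root))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        mod1, mod2 = import_both(Path(tmp))
        print(type(mod1.__path__).__name__)  # list
        print(type(mod2.__path__).__name__)  # _NamespacePath
        # Both kinds are iterable, so list() normalizes them to a List[str].
        print(list(mod1.__path__), list(mod2.__path__))
```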

The lesson is that PEP 420 did not save us from __init__.py files. Just use them everywhere and save yourself future problems.

