If you’re working on a Python package and you’re using setuptools’s pkg_resources to handle data files in your package, then you should add __init__.py files to all your internal sub-directories. In fact, just make your life easier and always add __init__.py files everywhere as if you were using Python 2, because not having them also creates issues with some linters and other tools.
__init__.py files used to be obligatory for sub-packages up until PEP 420 (yes, 420), which was implemented in Python 3.3. Now Python itself can handle code in any sub-directory as a module without needing an __init__.py file everywhere.
That said, certain libraries and tools don’t handle modules without __init__.py files correctly. In particular, pkg_resources behaves weirdly when you try to access data in a module that doesn’t have an __init__.py file.
Suppose you have the directory structure below. In this situation, if you run resource_exists("mod1.mod2", "some_data_file.txt") (where resource_exists is imported from pkg_resources) while standing at the root of the directory tree below, it will raise a NotImplementedError, which is not the most helpful of errors.
.
└── mod1
    ├── __init__.py
    └── mod2
        └── some_data_file.txt
This is what happens if you’re not working on an installed package. If you actually install it, you may get a weirder error along the lines of "expected str, bytes or os.PathLike object, not NoneType". I’m not sure when you get one or the other, but the point is: if there isn’t an __init__.py file, there’s a good chance you won’t get what you’re looking for.
To fix this you just have to create an __init__.py file inside mod2. With the directory structure below, resource_exists("mod1.mod2", "some_data_file.txt") just returns True.
.
└── mod1
    ├── __init__.py
    └── mod2
        ├── __init__.py
        └── some_data_file.txt
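If you want to check how your own Python and setuptools versions behave, here is a minimal sketch that builds both layouts in a temporary directory and reports what resource_exists does for each. The helper names make_layout and probe are made up for this example, and the sketch deliberately does not assume any particular outcome, since the behaviour seems to vary between setups.

```python
import importlib
import os
import sys
import tempfile


def make_layout(root, inner_init):
    """Create mod1/mod2/some_data_file.txt under root.

    mod1 always gets an __init__.py; mod2 only gets one if inner_init is True.
    """
    mod2_dir = os.path.join(root, "mod1", "mod2")
    os.makedirs(mod2_dir)
    open(os.path.join(root, "mod1", "__init__.py"), "w").close()
    if inner_init:
        open(os.path.join(mod2_dir, "__init__.py"), "w").close()
    with open(os.path.join(mod2_dir, "some_data_file.txt"), "w") as handle:
        handle.write("hello\n")


def probe(inner_init):
    """Return a string describing what resource_exists does for one layout.

    The result is either the repr of the returned value or the name of the
    exception that was raised. Any failure is reported this way, including
    a missing or broken pkg_resources installation.
    """
    with tempfile.TemporaryDirectory() as root:
        make_layout(root, inner_init)
        sys.path.insert(0, root)
        try:
            import pkg_resources

            importlib.invalidate_caches()
            found = pkg_resources.resource_exists("mod1.mod2", "some_data_file.txt")
            return repr(found)
        except Exception as exc:
            return type(exc).__name__
        finally:
            # Undo the side effects so the two probes don't see each other.
            sys.path.remove(root)
            for name in ("mod1.mod2", "mod1"):
                sys.modules.pop(name, None)


if __name__ == "__main__":
    print("without inner __init__.py:", probe(inner_init=False))
    print("with inner __init__.py:   ", probe(inner_init=True))
```

Running it prints one line per layout, so you can see at a glance whether the missing __init__.py changes the result on your machine.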
The case of pkg_resources.resource_filename is even more fun. Check this out.
When run on a non-installed package without the inner __init__.py, resource_filename("mod1.mod2", "some_data_file.txt") will return "some_data_file.txt". So, just your second argument. Not even "mod1/mod2/some_data_file.txt". Just the final filename. Why it does this, I’m not sure.
If you do have the __init__.py, then it will return "/absolute/path/to/your/project/mod1/mod2/some_data_file.txt". This is the normal behaviour of this function. You usually want to open the output of resource_filename, so the path must be complete (although it doesn’t need to be absolute).
Note that if you actually pass it a module that doesn’t exist (as opposed to a module that exists but doesn’t have an __init__.py file), it will raise a ModuleNotFoundError.
This means something like the following function works even without an __init__.py. Of course, I don’t need to tell you not to use a function like this one, as it is the dirtiest hack around and probably has its own set of bugs and edge cases, but I thought it was funny that something like this worked when I tested it.
import importlib
import os
import sys

import pkg_resources


def resource_filename(module_path, resource_path):
    pkg_resources_path = pkg_resources.resource_filename(module_path, resource_path)
    # A bare filename means pkg_resources could not locate the package directory.
    if os.sep not in pkg_resources_path:
        mod = sys.modules[module_path]
        if isinstance(mod.__path__, list):
            module_fs_path = mod.__path__[0]
        else:
            # Hoping this behaves like a _NamespacePath.
            module_fs_path = mod.__path__._path[0]
        # Create the missing __init__.py (os.mknod is not available on every platform).
        open(os.path.join(module_fs_path, "__init__.py"), "a").close()
        importlib.reload(mod)
        pkg_resources_path = pkg_resources.resource_filename(module_path, resource_path)
    return pkg_resources_path
On a different note, as I was writing this I also noticed that the .__path__ attribute of a module returns a List[str] if the module has an __init__.py, and a _NamespacePath (whose _path attribute is a List[str]) if the module doesn’t. That’s fine because you really shouldn’t expect consistent behaviour from __*__ variables, but I thought it was a weird quirk. More generally, if you don’t have an __init__.py, your module is a namespace package and looks different from a “normal” module in several ways.
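The difference is easy to see with nothing but the standard library. The sketch below creates two throwaway packages in a temporary directory, one with an __init__.py and one without, and compares their __path__ and __file__ attributes. The names demo_regular, demo_namespace and path_types are invented for this example.

```python
import importlib
import os
import sys
import tempfile


def path_types():
    """Import a regular package and a namespace package and describe both."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "demo_regular"))
        open(os.path.join(root, "demo_regular", "__init__.py"), "w").close()
        os.makedirs(os.path.join(root, "demo_namespace"))  # no __init__.py here
        sys.path.insert(0, root)
        try:
            importlib.invalidate_caches()
            regular = importlib.import_module("demo_regular")
            namespace = importlib.import_module("demo_namespace")
            return {
                "regular_is_list": isinstance(regular.__path__, list),
                "namespace_is_list": isinstance(namespace.__path__, list),
                "namespace_type": type(namespace.__path__).__name__,
                # Iterating avoids touching the private _path attribute.
                "namespace_dirs": list(namespace.__path__),
                "namespace_file": getattr(namespace, "__file__", None),
            }
        finally:
            # Clean up so the throwaway packages don't linger in the process.
            sys.path.remove(root)
            sys.modules.pop("demo_regular", None)
            sys.modules.pop("demo_namespace", None)


if __name__ == "__main__":
    for key, value in path_types().items():
        print(key, "=", value)
```

Two things stand out. Wrapping __path__ in list() works for both kinds of package, so there is no real need to reach into _path. And the namespace package has no usable __file__, which is plausibly where an "expected str, not NoneType" style of error comes from in code that tries to take the directory of module.__file__.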
The lesson is that PEP 420 did not save us from __init__.py files. Just use them everywhere and save yourself future problems.
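To follow that advice on an existing project, something like the helper below can list the sub-directories that still lack an __init__.py. This is only a rough sketch: the function name find_dirs_missing_init is made up, and the rule for which directories to skip (caches and hidden directories) is an assumption you may want to adjust.

```python
import os
import tempfile


def find_dirs_missing_init(package_root):
    """Return directories under package_root (relative to it) with no __init__.py."""
    missing = []
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Prune caches and hidden directories in place so os.walk skips them.
        dirnames[:] = [
            name for name in dirnames
            if name != "__pycache__" and not name.startswith(".")
        ]
        if "__init__.py" not in filenames:
            missing.append(os.path.relpath(dirpath, package_root))
    return sorted(missing)


if __name__ == "__main__":
    # Rebuild the example layout from this post and report what is missing.
    with tempfile.TemporaryDirectory() as tmp:
        root = os.path.join(tmp, "mod1")
        os.makedirs(os.path.join(root, "mod2"))
        open(os.path.join(root, "__init__.py"), "w").close()
        open(os.path.join(root, "mod2", "some_data_file.txt"), "w").close()
        print(find_dirs_missing_init(root))
```

Point it at the top-level package directory (mod1 in the example), not at the project root, otherwise the project root itself will be reported as missing an __init__.py.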