If you’re working on a Python package and you’re using pkg_resources to handle data files in it, you should add __init__.py files to all your internal sub-directories. In fact, just make your life easier and always add __init__.py files everywhere, as if you were using Python 2, because not having them also creates issues with some linters and other tools.
__init__.py files used to be obligatory for sub-packages up until PEP 420 (yes, 420), which was implemented in Python 3.3. Now Python itself can treat any subdirectory containing code as a package without needing an __init__.py file everywhere.
That said, certain libraries and tools don’t handle modules without __init__.py files correctly. In particular, pkg_resources behaves weirdly when you try to access data in a module that doesn’t have an __init__.py file.
Suppose you have the directory structure below. In this situation, if you run resource_exists("mod1.mod2", "some_data_file.txt") (where resource_exists is imported from pkg_resources) from the root of the directory tree below, it will raise a NotImplementedError, which is not the most helpful of messages.
.
└── mod1
    ├── __init__.py
    └── mod2
        └── some_data_file.txt
This is what happens when you’re not working on an installed package. If you actually install it, you may get a weirder "expected str, bytes or os.PathLike object, not NoneType" kind of error. I’m not sure when you get one or the other, but the point is: if there isn’t an __init__.py file, there’s a good chance you won’t get what you’re looking for.
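The None in that second error may well come from the module itself: a package imported without an __init__.py has no file of its own, so its __file__ is None on Python 3.7+ (and missing altogether before that). The sketch below rebuilds the layout above in a temporary directory, with and without the inner __init__.py, and compares the two. import_from_fresh_tree is a hypothetical helper written for this post, and the link to the pkg_resources error is a guess, not something traced through its source.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_fresh_tree(with_inner_init):
    """Build the mod1/mod2 layout in a temp dir and import mod1.mod2 from it.

    Hypothetical helper for illustration; the temp dir is left for the OS to clean up.
    """
    root = Path(tempfile.mkdtemp())
    inner = root / "mod1" / "mod2"
    inner.mkdir(parents=True)
    (root / "mod1" / "__init__.py").touch()
    (inner / "some_data_file.txt").write_text("hello\n")
    if with_inner_init:
        (inner / "__init__.py").touch()

    # Drop cached copies so that each call imports from its own tree.
    for name in ("mod1", "mod1.mod2"):
        sys.modules.pop(name, None)
    sys.path.insert(0, str(root))
    try:
        importlib.invalidate_caches()
        return importlib.import_module("mod1.mod2")
    finally:
        sys.path.remove(str(root))


namespace_mod = import_from_fresh_tree(with_inner_init=False)
regular_mod = import_from_fresh_tree(with_inner_init=True)

# No file to point at: None on Python 3.7+, attribute missing on older versions.
print(getattr(namespace_mod, "__file__", None))
# A real path ending in mod1/mod2/__init__.py.
print(regular_mod.__file__)
```

Any code that builds a path from the module’s __file__ has nothing to work with in the first case, which fits the symptoms here.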
To fix this you just have to create an __init__.py file inside mod2. You end up with the following directory structure, and then resource_exists("mod1.mod2", "some_data_file.txt") just returns True.
.
└── mod1
    ├── __init__.py
    └── mod2
        ├── __init__.py
        └── some_data_file.txt
The case of pkg_resources.resource_filename is even more fun. Check this out.
When run on a non-installed package, and if you don’t have the inner __init__.py, resource_filename("mod1.mod2", "some_data_file.txt") will return "some_data_file.txt". So, just your second argument. Not even "mod1/mod2/some_data_file.txt". Just the final filename. Why it does this, I’m not sure.
If you do have the __init__.py then it will return "/absolute/path/to/your/project/mod1/mod2/some_data_file.txt". This is the normal behaviour of this function. You usually want to open the output of resource_filename, so the path must be complete (although it doesn’t need to be absolute).
Note that if you actually pass it a module that doesn’t exist (as opposed to a module that exists but doesn’t have an __init__.py file), it will raise a ModuleNotFoundError.
This means something like the following function works even without an __init__.py. Of course, I don’t need to tell you not to use a function like this one, as it is the dirtiest hack around and probably has its own set of bugs and edge-cases, but I thought it was funny that something like this worked when I tested it.
import importlib
import os
import sys

import pkg_resources


def resource_filename(module_path, resource_path):
    pkg_resources_path = pkg_resources.resource_filename(module_path, resource_path)
    # A bare filename means pkg_resources could not locate the package directory.
    if "/" not in pkg_resources_path:
        mod = sys.modules[module_path]
        if isinstance(mod.__path__, list):
            module_fs_paths = mod.__path__
        else:
            # Hoping this behaves like a _NamespacePath.
            module_fs_paths = mod.__path__._path
        # Both branches give a list of directories; use the first one.
        os.mknod(os.path.join(module_fs_paths[0], "__init__.py"))
        importlib.reload(mod)
        pkg_resources_path = pkg_resources.resource_filename(module_path, resource_path)
    return pkg_resources_path
On a different note, as I was writing this I also noticed that the .__path__ attribute of a module returns a List[str] if the module has an __init__.py, and a _NamespacePath (whose attribute _path is a List[str]) if the module doesn’t. That’s fine because you really shouldn’t expect consistent behaviour from __*__ variables, but I thought it was a weird quirk. More generally, if you don’t have an __init__.py, your module is called a namespace package and looks different from a “normal” module in several ways.
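Here is a minimal sketch of that quirk, using nothing but the standard library. The package names regular_pkg and namespace_pkg are made up for the example, and _NamespacePath is a private CPython class, so its name could change between versions.

```python
import importlib
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
# Hypothetical names: regular_pkg has an __init__.py, namespace_pkg does not.
(root / "regular_pkg").mkdir()
(root / "regular_pkg" / "__init__.py").touch()
(root / "namespace_pkg").mkdir()

sys.path.insert(0, str(root))
try:
    importlib.invalidate_caches()
    regular = importlib.import_module("regular_pkg")
    namespace = importlib.import_module("namespace_pkg")

    print(type(regular.__path__).__name__)    # list
    print(type(namespace.__path__).__name__)  # a private class, _NamespacePath on CPython

    # Both are iterable, so list() gives the same shape without touching _path.
    regular_dirs = list(regular.__path__)
    namespace_dirs = list(namespace.__path__)
finally:
    sys.path.remove(str(root))

print(regular_dirs)
print(namespace_dirs)
```

Calling list() on __path__ is a gentler way to get at the directories than reaching into _path, since it works the same for both kinds of package.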
The lesson is that PEP 420 did not save us from __init__.py files. Just use them everywhere and save yourself future problems.
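If you want to check an existing project, something along these lines can list the directories that are still missing one. It’s a rough sketch: dirs_missing_init is a hypothetical helper, and it assumes every directory under the package root is meant to be a package, which won’t be true if you keep pure data folders in there.

```python
import os


def dirs_missing_init(package_root):
    """Yield package_root and any directory below it that has no __init__.py."""
    for dirpath, dirnames, filenames in os.walk(package_root):
        # Skip caches and hidden directories, and walk the rest in a stable order.
        dirnames[:] = sorted(
            d for d in dirnames if d != "__pycache__" and not d.startswith(".")
        )
        if "__init__.py" not in filenames:
            yield dirpath


# Run from the project root; "mod1" is the top-level package from the example above.
for path in dirs_missing_init("mod1"):
    print(path)
```

On the first directory tree in this post it would report mod1/mod2, and on the fixed one it would print nothing.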