Edit package installed by pip

I'm trying to edit a package that I installed via pip, called py_mysql2pgsql. (I hit an error when converting my database from MySQL to PostgreSQL.)

However, when I go to the folder /usr/local/lib/python2.7/dist-packages/py_mysql2pgsql-0.1.5.egg-info, I cannot find the source code for the package. I only find PKG-INFO and text files.

How can I find the actual source code for a package (or in particular, this package)?

Thanks



Solution 1:[1]

TL;DR:

Modifying in place is dangerous. Modify the source and then install it from your modified version.

Details

pip is a tool for managing the installation of packages. You should not modify files created during package installation. At best, doing so means pip will believe that a particular version of the package is installed when it isn't. This would not interact well with the upgrade function. I suspect pip would just overwrite your customizations, discarding them forever, but I haven't confirmed that. The other possibility is that it checks whether files have changed and throws an error if so. (I don't think that's likely.) Editing in place also misleads other users of the system. They see that you have a package installed, but you don't actually have the version indicated; you have a customized version. This is likely to cause confusion if they try to install the unmodified version somewhere else or if they expect some particular behavior from the installed version.
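If you suspect someone has already edited an installed package in place, you can sometimes detect it. The sketch below is only an illustration and rests on an assumption: the package was installed from a wheel, so its metadata contains a RECORD file with per-file hashes. Older egg-info installs, like the one in the question, carry no hashes, so nothing would be reported for them. The helper names record_hash and modified_files are my own, not part of pip.

```python
import base64
import hashlib
from importlib import metadata


def record_hash(data, mode="sha256"):
    """Encode a digest the way RECORD does: urlsafe base64 without padding."""
    digest = hashlib.new(mode, data).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def modified_files(dist_name):
    """List installed files whose contents no longer match the recorded hash."""
    changed = []
    for path in metadata.files(dist_name) or []:
        if path.hash is None:  # RECORD itself, .pyc files, or egg-info installs
            continue
        target = path.locate()
        if not target.is_file():
            continue
        if record_hash(target.read_bytes(), path.hash.mode) != path.hash.value:
            changed.append(str(path))
    return changed
```

Calling modified_files with the name of an installed distribution returns the relative paths of any files that differ from what was installed; an empty list means no recorded file has changed.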

If you want to modify the source code, the right thing to do is modify the source code and either build a new, custom package or just install from source. py-mysql2pgsql provides instructions for performing a source install:

> git clone git://github.com/philipsoutham/py-mysql2pgsql.git
> cd py-mysql2pgsql
> python setup.py install

You can clone the source, modify it, and then install without using pip. Alternatively, you could build your own customized version of the package if you need to redistribute it internally. This project uses setuptools for building its packages, so you only need to familiarize yourself with setuptools to make use of its setup.py file. Make sure that installing it this way doesn't create any misleading entries in pip's package list. If it does, either find a way to make the entry clearer or find an alternative install method.

Since you've discovered a bug in the software, I also highly recommend forking it on GitHub and submitting a pull request once you have it fixed. If you do so, you can use the above installation instructions just by changing the repository URL to your fork. If you don't fork it, at least file an issue and describe the changes that fix it.

Alternatives:

  • You could copy all the source code into your project, modify it there, and then distribute the modified version with the rest of your code. (Make sure you don't violate the license if you do so.)
  • You might be able to solve your problem at runtime. Monkey-patching the module is a little risky if other people on your team might not expect the change in behavior, but it can be used to modify the module's behavior globally. You could also write some additional code that wraps the buggy code: it takes the input, calls the buggy code, and either prevents or handles the bug (e.g., by modifying the input to make it work, or by catching an exception and handling it).
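To make the runtime option concrete, here is a minimal sketch of both the wrapper and the monkey-patch. Everything in it is hypothetical: buggy_lib stands in for the third-party module and convert for the function with the bug; neither is taken from py_mysql2pgsql.

```python
import functools
import types

# Hypothetical stand-in for a third-party module that contains a bug.
buggy_lib = types.SimpleNamespace()


def _convert(value):
    # The "bug": this raises AttributeError when value is None.
    return value.strip().lower()


buggy_lib.convert = _convert


def safe_convert(value):
    """Wrapper: handle the input the library cannot, then delegate."""
    if value is None:
        return None
    return buggy_lib.convert(value)


# Monkey-patch: rebind the attribute so every caller gets the guarded version.
_original = buggy_lib.convert


@functools.wraps(_original)
def _patched(value):
    if value is None:
        return None
    return _original(value)


buggy_lib.convert = _patched
```

The wrapper only affects code that calls safe_convert explicitly, which keeps the change visible. The monkey-patch changes behavior for every caller of buggy_lib.convert in the process, which is exactly why it can surprise teammates.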

Solution 2:[2]

Just print the __file__ attribute of the module:

>>> import numpy
>>> numpy.__file__
'/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/numpy/__init__.py'

Obviously the path and the specific package will be different for you, but this is a pretty foolproof way of tracking down the source file of any module in Python.
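Importing a module runs its code, which you may want to avoid if the package is the thing that is broken. As a rough alternative, the standard library's importlib.util.find_spec can locate a top-level module without executing it. The function name find_module_source is my own, and json is used only because it ships with Python.

```python
import importlib.util


def find_module_source(module_name):
    """Return the file a module would be loaded from, or None if unknown."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or not spec.has_location:
        # Not installed, or a built-in/frozen module with no file on disk.
        return None
    return spec.origin


print(find_module_source("json"))
```

For a package this points at its __init__.py, so the rest of the source sits in the same directory.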

Solution 3:[3]

The py_mysql2pgsql package is hosted on PyPI: https://pypi.python.org/pypi/py-mysql2pgsql

If you want the code for that specific version, just download the source tarball from PyPI (py-mysql2pgsql-0.1.5.tar.gz).

Development is hosted on GitHub: https://github.com/philipsoutham/py-mysql2pgsql

Solution 4:[4]

You can patch pip packages quite easily with the patch command.

When you do this, it's important that you specify exact version numbers for the packages that you patch.

I recommend using Pipenv. It creates a lock file in which the versions of all dependencies and sub-dependencies are locked, so the same versions of the packages are always installed. It also manages your virtual env and makes it convenient to use the method described here.

The first argument to the patch command is the file you want to patch. So that should be the module of the pip package, which is probably inside a virtualenv.

If you use Pipenv, you can get the virtual env path with pipenv --venv, so then you could patch the requests package like this:

patch $(pipenv --venv)/lib/python3.6/site-packages/requests/api.py < requests-api.patch

The requests-api.patch file is a diff file, which could look like:

--- requests/api.py     2022-05-03 21:55:06.712305946 +0200
+++ requests/api_new.py 2022-05-03 21:54:57.002368710 +0200
@@ -54,6 +54,8 @@
       <Response [200]>
     """

+    print(f"Executing {method} request at {url}")
+
     # By using the 'with' statement we are sure the session is closed, thus we
     # avoid leaving sockets open which can trigger a ResourceWarning in some
     # cases, and look like a memory leak in others.

You can make the patch file like this:

diff -u requests/api.py requests/api_new.py > requests-api.patch

Where requests/api_new.py would be the new, updated version of requests/api.py.

The -u flag to the diff command gives a unified diff format, which can be used to patch files later with the patch command.
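If the diff command isn't available (on Windows, for example), a unified diff can also be produced from Python with the standard library's difflib. This is only a sketch; make_unified_diff is a name I made up, and unlike diff -u it leaves out the timestamps in the header lines.

```python
import difflib


def make_unified_diff(old_text, new_text, old_name, new_name):
    """Return a unified diff between two versions of a file as one string."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(
        old_lines, new_lines, fromfile=old_name, tofile=new_name
    )
    return "".join(diff)
```

You would read requests/api.py and requests/api_new.py into strings, pass them in along with the two file names, and write the result to requests-api.patch.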

So this method can be used in an automated process. Just make sure that you have specified an exact version number for the module that you patch. You don't want the module to upgrade unexpectedly, because you might then have to update the patch file. Also keep in mind that if you ever upgrade the module manually, you need to check whether the patch file has to be recreated. That is only necessary when the file you are patching has changed in the new version of the package.
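In an automated setup it may be worth failing early when the installed version is not the one the patch was written for. A small guard along these lines could run before the patch command. The function ensure_pinned and its get_version parameter are my own invention; only importlib.metadata.version comes from the standard library.

```python
from importlib import metadata


def ensure_pinned(dist_name, expected_version, get_version=metadata.version):
    """Raise if the installed version differs from the one the patch targets."""
    installed = get_version(dist_name)
    if installed != expected_version:
        raise RuntimeError(
            f"{dist_name} is {installed}, "
            f"but the patch was made for {expected_version}"
        )
    return installed
```

You would call it with the package name and whatever exact version your lock file pins, and only apply the patch if it returns without raising.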

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Tadhg McDonald-Jensen
Solution 3 Corey Goldberg
Solution 4 gitaarik