Python package development
There are lots of features that go into packaging up Python code for other users. How can we make a project that ships easily to users and takes advantage of our normal development tools? We’ll discuss:
- Poetry: for easily making and publishing a package
- Sphinx: for making documentation
- Readthedocs: free, professional-looking documentation hosting and formatting
- PyCharm: a popular Python IDE (students can get the Professional edition)
- PyPI: the Python Package Index, where you publish packages that people can install with pip.
Basics
Read the Poetry docs to learn how to install it. It’s good documentation; you should skim the Installation and Basic Usage sections first. For zsh users, make sure Poetry was added to your PATH in ~/.zshrc. For bash users, the installer handles this automatically.
Let’s make a new project with Poetry.
- Create a Python project. (See Choosing a project name below this list.)
poetry new myproject
Change to this directory:
cd myproject
- Start the poetry virtual environment.
poetry shell
This creates (if needed) and activates a virtual environment: an isolated, fresh Python installation that forces us to be explicit about our package’s dependencies.
- Install new packages as needed.
poetry add python_package
Here python_package stands for a real package name such as numpy. Poetry installs the package into the virtual environment and adds it to the pyproject.toml file.
- Take a look at the pyproject.toml file. All your package settings are here, and any added packages appear automatically. There is also a file not meant for human editing called poetry.lock, which records the exact version of every installed package so the environment can be rebuilt exactly. It can be a good idea to commit poetry.lock to version control so that anyone working with our package knows exactly which package versions we were using.
- To update all packages added with poetry add (within the version constraints in pyproject.toml) and refresh poetry.lock, run poetry update. To install the project together with the dependency versions pinned in poetry.lock (creating the lock file if it does not exist yet), run poetry install.
Choosing a project name
A new Poetry project "myproject" has a specific directory structure:
myproject
|-- pyproject.toml
|-- README.rst
|-- myproject
|   |-- __init__.py
|-- tests
    |-- __init__.py
    |-- test_myproject.py
The project name is used for the top-level directory of the project. It is also the name of the GitHub repository and of the PyPI project, so users install it with pip install myprojectname. It should be unique. Project names on PyPI should NOT use dashes (https://stackoverflow.com/questions/8350853/). Underscores are allowed but discouraged. The package (or module) name is the inner directory containing __init__.py. This is what users import in their code, as in import my_project_name. It does not have to be unique, and it can use underscores. Note that Poetry defaults to matching project and package names. This is also the Python style guideline (PEP 423).
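The two names follow different rules, which a short sketch can make concrete. pip and PyPI compare project names after normalizing them as described in PEP 503: runs of "-", "_" and "." collapse to a single "-", and case is ignored, so My_Project and my-project refer to the same PyPI project. The import name, by contrast, must be a valid Python identifier, which is why it cannot contain dashes. The helper names below are illustrative.

```python
import keyword
import re


def normalize_project_name(name: str) -> str:
    """Normalize a PyPI project name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def is_valid_import_name(name: str) -> bool:
    """True if `name` can appear in `import name` (a non-keyword identifier)."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(normalize_project_name("My_Project"))  # my-project
print(normalize_project_name("my.project"))  # my-project
print(is_valid_import_name("my_project"))    # True
print(is_valid_import_name("my-project"))    # False
```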
Version control
Now we start tracking our new package with version control. Initialize a git repository in the project directory in your usual way (e.g., hosted on GitHub). The splash page for your package will be README.rst, so make it pretty.
PyCharm
Let’s use a modern IDE. Open up the project in PyCharm.
- Get the location of the interpreter for this virtual environment by running this command:
poetry run which python
- Make the virtual environment the default for PyCharm.
  - Go to Settings → Project → Python Interpreter.
  - Click the gear icon and select Add.
  - Choose the option Existing environment and enter the path to the Poetry virtual environment’s interpreter. Apply the changes.
Now PyCharm will flag imports of packages that aren’t installed in this environment, and it will offer quick-fix actions to install missing libraries. However, be sure to add Python libraries with Poetry, not PyCharm, so that they are recorded in pyproject.toml and poetry.lock.
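To cross-check the path from poetry run which python, this small sketch prints the active interpreter and whether it is running inside a virtual environment. Running it with poetry run python check_env.py is one option (the file name is only an example). It relies on the standard venv convention that sys.prefix points at the environment while sys.base_prefix still points at the base Python installation.

```python
import sys


def in_virtualenv() -> bool:
    """True when running inside a venv/virtualenv-style environment.

    Inside such an environment sys.prefix points at the environment,
    while sys.base_prefix points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)  # the path to give PyCharm
print("In a virtual environment:", in_virtualenv())
```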
Add documentation with Sphinx
Let’s set up Sphinx.
- Start by creating the docs folder, adding Sphinx, and running the quickstart:
mkdir docs
poetry add sphinx
cd docs
sphinx-quickstart
The command line will prompt you with a few questions. Use the default settings, but enter any project-specific information as needed.
- All Sphinx settings are in conf.py. The first setting to edit is the path. Uncomment the lines:
import os
import sys
sys.path.insert(0, os.path.abspath('.'))
and change the '.' to '..' so that Sphinx, which runs from the docs folder, can import your package from the project root.
- Make sure that Sphinx knows that the main file is index.rst by adding the lines
# Assign the master document
master_doc = 'index'
to conf.py. (Sphinx versions before 2.0 assumed the main file was contents.rst.)
- Test that your docs compile by running the command
make html
inside the docs folder, then open _build/html/index.html in your web browser.
- Add the docs to readthedocs.
  - Go to readthedocs, log in, then find and click Import.
  - Paste the link to the GitHub repo and create the project.
  Readthedocs will find the conf.py file and build the documentation.
  - Check that GitHub notifies readthedocs when the documentation is updated: in the repository settings, confirm that the Webhooks tab includes readthedocs.
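Taken together, the conf.py edits in the steps above leave the top of the file looking roughly like this minimal sketch. It assumes the default sphinx-quickstart layout, with conf.py directly inside docs and the package one directory up. project is whatever you typed at the quickstart prompt and is only a placeholder here.

```python
# docs/conf.py (excerpt)
import os
import sys

# Sphinx executes conf.py from the docs directory, so ".." is the project
# root, which is where the importable `myproject` package lives.
sys.path.insert(0, os.path.abspath(".."))

project = "myproject"  # placeholder: use your own project name

# Assign the master document (the root of the toctree): docs/index.rst
master_doc = "index"
```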
Note: Oddly, readthedocs does not support the default Poetry config section for development tools, [tool.poetry.dev-dependencies], which lets users of your package avoid installing tools like Sphinx (the packages listed there won’t be installed). The alternative is to declare such packages as optional in [tool.poetry.dependencies], e.g.
sphinx = {version="^3.0.2", optional = true}
and to group them into an extra in the [tool.poetry.extras] section, e.g. docs = ["sphinx"]. To add a package to Poetry as optional, you can run poetry add sphinx --optional to fill in this format. In your .readthedocs.yaml file, you can make sure these packages are installed by listing the extra’s name under extra_requirements, e.g.:
python:
  version: 3.7
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs
Publish the package on PyPI
This is as easy as running poetry publish (add --build to build the package first)! First, though, we’ll have to set up our PyPI credentials, which we can do by following the Poetry documentation.
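Because the project name must be unique on PyPI, it can be worth checking availability before the first publish. PyPI’s JSON API serves https://pypi.org/pypi/<name>/json for existing projects and returns 404 for unknown names. The sketch below relies on that behavior, and its helper names are hypothetical. A 404 only means that no project with that exact name exists; PyPI may still reject a new name that is too similar to an existing one.

```python
import urllib.error
import urllib.request


def pypi_json_url(project: str) -> str:
    """URL of PyPI's JSON metadata endpoint for a project."""
    return f"https://pypi.org/pypi/{project}/json"


def name_is_taken(project: str) -> bool:
    """True if a project with this name already exists on PyPI (needs network)."""
    try:
        with urllib.request.urlopen(pypi_json_url(project), timeout=10):
            return True  # 200: PyPI has metadata for this name
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # no project with this name
        raise  # any other HTTP error: don't guess
```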
Additional tasks
Sphinx can automatically generate documentation for modules, classes, and functions that have properly formatted docstrings. There are two main docstring styles: NumPy and Google. I use Google’s docstring format because it takes up less vertical space. The essential Sphinx extensions are autodoc (which pulls docstrings into the documentation) and napoleon (which converts NumPy- and Google-style docstrings into reStructuredText that Sphinx understands). Both should be added to the extensions list in the Sphinx conf.py file:
extensions = ['sphinx.ext.autodoc','sphinx.ext.napoleon']
No installation by Poetry is necessary because both are part of the base installation of Sphinx.
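For reference, here is a small function documented in the Google style that napoleon understands. autodoc would render its Args, Returns, and Raises sections as formatted fields. The function itself is only an illustration.

```python
def rectangle_area(width: float, height: float) -> float:
    """Compute the area of a rectangle.

    Args:
        width: Length of the horizontal side. Must be non-negative.
        height: Length of the vertical side. Must be non-negative.

    Returns:
        The product ``width * height``.

    Raises:
        ValueError: If either side is negative.

    Example:
        >>> rectangle_area(2.0, 3.5)
        7.0
    """
    if width < 0 or height < 0:
        raise ValueError("side lengths must be non-negative")
    return width * height
```

With both extensions enabled, an .. automodule:: myproject directive with the :members: option in one of your .rst files would pull docstrings like this one into the rendered docs.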
A common troubleshooting point is that the readthedocs servers do not have your desired library installed. You will need to go to Advanced Settings on readthedocs and make sure you select both of these options:
- Install Project: Install your project inside a venv using setup.py install
- Use system packages: Give the venv access to the global site-packages dir
Technically, you may only need the second option to get, e.g., numpy, which readthedocs has installed on its servers for you. But if you want a library like sklearn that isn’t on the default servers, you’ll need to install the project. This means you need one more file at the top level of your project, called .readthedocs.yaml, which looks something like
version: 2

build:
  image: latest

python:
  version: 3.7
  install:
    - method: pip
      path: .
      extra_requirements:
        - docs

sphinx:
  configuration: docs/conf.py
This file tells readthedocs to install the project with pip (method: pip), which builds it through the build backend that Poetry declares in pyproject.toml instead of the old setup.py route, and to install the docs extra along with it. Python packaging standards have been moving in Poetry’s direction, but readthedocs’ default settings did not support them yet. Hence, this extra file.
Adding Jupyter notebooks to the docs
The key tool here is nbsphinx, which will need to be installed with Poetry. An IPython kernel and a Jupyter client will also need to be installed for readthedocs to run the notebooks (explicitly, poetry add ipykernel and poetry add jupyter_client). You can make these optional. You will also need the .readthedocs.yaml file, so see the note at the end of the autodoc section.
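On the Sphinx side, nbsphinx is enabled like any other extension in conf.py. The excerpt below is a sketch that extends the extension list from the autodoc section. nbsphinx_execute is an nbsphinx option; its default, "auto", executes only notebooks saved without outputs, which is when readthedocs needs the kernel packages mentioned above. Excluding .ipynb_checkpoints keeps Jupyter’s autosave copies out of the build.

```python
# docs/conf.py (excerpt)
extensions = [
    "sphinx.ext.autodoc",
    "sphinx.ext.napoleon",
    "nbsphinx",  # renders .ipynb files that are listed in a toctree
]

# Run notebooks only if they were saved without outputs ("auto" is the
# default; "never" uses the stored outputs, "always" re-executes everything).
nbsphinx_execute = "auto"

# The quickstart defaults plus Jupyter's autosave copies.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store", "**.ipynb_checkpoints"]
```

Notebooks are then listed in a toctree by name, like any .rst file.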
Adding a LICENSE
- Create a file in docs called license.rst and give the file a header like
License
=======
...
- In index.rst, look for the toctree directive:
.. toctree::
   :maxdepth: 2
   :caption: Contents:

   license
where we have added license to link the license file to the main documentation page (the name of the link will reflect the headings/subheadings in the file license.rst).
Cython Development
I think the most effective Cython tutorial is this Cython documentation example. You’ll eventually be introduced to this very basic example:
from setuptools import setup
from Cython.Build import cythonize
setup(ext_modules=cythonize("rect.pyx"))
This example is good because it shows the essential features. However, if you have multiple C++ files that you want to compile together, you will need more: eventually you’ll want to pass more elaborate Extension objects to cythonize/ext_modules. This will introduce you to distutils (now deprecated in favor of setuptools, which provides a compatible Extension class).
The main change needed for Cython code is to add a build.py file. This file uses the Python library distutils to compile and link all the C++ files and call Cython. I have an example build.py on my GitHub. To get Poetry to use your build.py, include the line build = 'build.py' under the [tool.poetry] section in your pyproject.toml.
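The build.py referenced above is not reproduced here. As a rough sketch of the shape Poetry expects, the script defines a build(setup_kwargs) function that adds ext_modules to the keyword arguments that are later passed to setup(). The module path myproject.rect and the source files rect.pyx and Rectangle.cpp are assumptions borrowed from the Cython C++ example. This sketch uses setuptools’ Extension rather than distutils.

```python
# build.py (sketch): compile a Cython wrapper together with its C++ sources.
from typing import Any


def build(setup_kwargs: dict[str, Any]) -> None:
    """Hook that fills in the keyword arguments later passed to setup()."""
    # Imported here so this file can be imported even where Cython isn't installed.
    from Cython.Build import cythonize
    from setuptools import Extension

    extensions = [
        Extension(
            "myproject.rect",  # hypothetical module path
            sources=["myproject/rect.pyx", "myproject/Rectangle.cpp"],
            language="c++",  # compile and link as C++
        )
    ]
    setup_kwargs.update({"ext_modules": cythonize(extensions)})
```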
For readthedocs, it seems that a wrapper around this build.py script is needed. A short setup.py script can be written to do this. You might want to look at additional tasks for some context on parts of this, but here’s an example:
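# Wrapper over build.py for readthedocs
from setuptools import setup  # distutils is deprecated and removed in Python 3.12

from build import build  # the build(setup_kwargs) hook defined in build.py

# build() fills in ext_modules (and anything else it needs) for setup()
setup_kwargs = {}
build(setup_kwargs)
setup(**setup_kwargs)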
Another issue to address with readthedocs is making sure autodoc works for Cython code. A fix is described at https://stackoverflow.com/questions/13238736.
Using Docker
The goal of this section is to build and distribute a C++ shared library called example_package inside a Python wheel using Poetry and auditwheel. For this, I followed https://github.com/riddell-stan/poetry-install-shared-lib-demo.
The wheel created using these instructions conforms to the manylinux2014 standard and should be usable on most Linux systems. That repository’s README also includes notes that may interest developers seeking to understand how the auditwheel repair command works. You’ll need to install Docker (so we can use PyPA’s manylinux2014 build image).
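After auditwheel repair, the wheel’s file name carries a manylinux platform tag. Wheel file names follow the {distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl convention from PEP 427, so one quick check of the result is to parse that last field, as in this sketch. The example file name is made up. Recent auditwheel versions typically write both the PEP 600 alias (manylinux_2_17) and the manylinux2014 tag, joined by a dot. The helper names are hypothetical.

```python
def platform_tags(wheel_filename: str) -> list[str]:
    """Return a wheel's platform tags, per the PEP 427 file-name convention."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel file name")
    # The platform tag is the last dash-separated field; a dotted value such
    # as "a.b" lists several platforms the wheel claims to support.
    platform_field = wheel_filename[: -len(".whl")].split("-")[-1]
    return platform_field.split(".")


def is_manylinux2014(wheel_filename: str) -> bool:
    """True if the wheel claims manylinux2014 (alias manylinux_2_17) compatibility."""
    return any(
        tag.startswith(("manylinux2014_", "manylinux_2_17_"))
        for tag in platform_tags(wheel_filename)
    )


name = "example_package-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
print(platform_tags(name))     # ['manylinux_2_17_x86_64', 'manylinux2014_x86_64']
print(is_manylinux2014(name))  # True
```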