Debian packaging for Python 2 and 3
Written by Barry Warsaw in technology on Wed 18 January 2012. Tags: python, debian,
Time for another installment of my ongoing mission to convert the world to Python 3! This time, a little Debian packaging-fu for modifying an existing Python 2 package to include support for Python 3 from the same source package.
Today, I added a python3-feedparser package to Ubuntu Precise. What's interesting about this is that, despite various reported problems, upstream feedparser 5.1 claims to support Python 3, via 2to3 conversion. And indeed it does (although the test suite does not).
Before today, Ubuntu had feedparser 5.0.1 in its archive, and while some work has been done to update the Debian package to 5.1, this has not been released. The uninteresting precursor to Python 3 packaging was to upgrade the Ubuntu version of the python-feedparser source package to 5.1. I'll spare you the boring details about missing data files in the upstream tarball, and other problems, since they don't really relate to the Python 3 effort.
The first step was to verify that feedparser 5.1 works with Python 3.2 in a virtualenv, and indeed it does. This is good news because it means that the setup.py does the right thing, which is always the best way to start supporting Python 3. I've found that it's much easier to build a solid Debian package if you have a solid setup.py in upstream to begin with.
Now, what I'd like to do is to give you a recipe for modifying your existing debian/ directory files to add Python 3 support to a package that already exists for Python 2. This is a little trickier for feedparser because it used an older debhelper standard, and carried some crufty old stuff in its rules file. My first step was to update this to debhelper compatibility level 8 and greatly simplify the debian/rules file. Here's what it might have looked like with just Python 2 support, so let's start there:
#!/usr/bin/make -f
export DH_VERBOSE=1
%:
	dh $@ --with python2
override_dh_auto_clean:
	dh_auto_clean
	rm -rf build *.egg-info
override_dh_auto_test:
ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
	cd feedparser && python ./feedparsertest.py
else
	@echo "nocheck set, not running tests"
endif
override_dh_installdocs:
	dh_installdocs -Xtests
This is all pretty standard stuff. dh_python2 is used (the --with python2 option to dh), and we just provide a couple of overrides for idiosyncrasies in the feedparser package. We clean a couple of extra things that aren't cleaned automatically, and we run the test suite in the slightly non-standard way that upstream requires. Also, we override the installation of a huge amount of test files that would otherwise get installed as documentation (they aren't docs).
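If the ifeq line in the test override looks cryptic, here is a rough Python rendering of the same decision. The function name is made up for illustration; the only thing taken from the rules file is the logic itself: DEB_BUILD_OPTIONS is a space-separated list of words, and the tests run unless one of those words is exactly nocheck.

```python
import os


def should_run_tests(deb_build_options):
    """Return True unless 'nocheck' appears as a whole word in the options."""
    # DEB_BUILD_OPTIONS looks like "nocheck parallel=4"; make's $(filter ...)
    # compares whole words, so we split on whitespace instead of substring matching.
    return "nocheck" not in deb_build_options.split()


if __name__ == "__main__":
    options = os.environ.get("DEB_BUILD_OPTIONS", "")
    if should_run_tests(options):
        print("would run the test suite")
    else:
        print("nocheck set, not running tests")
```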
So far so good. What do we have to do to add support for Python 3?
First, we need to make a few modifications to the debian/control file. The current convention with dh_python2 is to use an X-Python-Version header in the source package stanza, so we just need to add this header to the same stanza for Python 3:
X-Python3-Version: >= 3.2
This just says we support any Python 3 version from 3.2 onwards. You also need to add a few additional packages to the Build-Depends. In the feedparser case, I added the following build dependencies: python3, python3-chardet, python3-setuptools. Even though for Python 2 there are a couple of other build dependencies (e.g. python-libxml2 and python-utidylib) these aren't available for Python 3, but lucky for us, they are optional anyway.
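To make the meaning of that version header concrete, here is a small Python sketch of how a constraint like ">= 3.2" narrows a list of supported versions. This is only an illustration, not how the Debian tools are implemented: the function names are invented, and it only understands the >= and << operators, optionally combined with commas.

```python
def parse_version(text):
    """Turn '3.2' into (3, 2) so that versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def satisfies(version, constraint):
    """Check a version against a constraint such as '>= 3.2' or '>= 3.2, << 3.4'."""
    for clause in constraint.split(","):
        operator, bound = clause.split()
        if operator == ">=":
            if parse_version(version) < parse_version(bound):
                return False
        elif operator == "<<":
            if parse_version(version) >= parse_version(bound):
                return False
        else:
            # Anything else is outside the scope of this sketch.
            raise ValueError("unsupported operator: " + operator)
    return True


def requested_versions(supported, constraint):
    """Keep only the supported versions that satisfy the constraint."""
    return [version for version in supported if satisfies(version, constraint)]


if __name__ == "__main__":
    # The list of supported versions here is hypothetical.
    print(requested_versions(["3.1", "3.2", "3.3"], ">= 3.2"))
```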
Next, you need to add a new binary package stanza. There was already a python-feedparser binary package stanza for Python 2 support. In Debian, Python 3 is provided as a separate stack, meaning packages for Python 3 will always start with the python3- prefix. Thus, it is pretty easy to just copy the python-feedparser stanza and paste it at the bottom of debian/control, changing the package name to python3-feedparser. You have to update the Depends line to use ${python3:Depends}, and I updated the Recommends line to name python3-chardet, and that was about it. Here's what the new stanza looks like:
Package: python3-feedparser
Architecture: all
Depends: ${misc:Depends}, ${python3:Depends}
Recommends: python3-chardet
Description: Universal Feed Parser for Python
 Python module for downloading and parsing syndicated feeds. It can
 handle RSS 0.90, Netscape RSS 0.91, Userland RSS 0.91, RSS 0.92, RSS
 0.93, RSS 0.94, RSS 1.0, RSS 2.0, Atom, and CDF feeds.
 .
 It provides the same API to all formats, and sanitizes URIs and HTML.
 .
 This is the Python 3 version of the package.
Again, so far so good. Now let's look at the debian/rules file.
The first thing to do is to add support for dh_python3, which is analogous to dh_python2, and is the only accepted helper for Python 3. The rules line then becomes:
%:
	dh $@ --with python2,python3
Now, one problem with debhelper is that it doesn't have any built-in support for Python 3 like it does for Python 2. This means dh will not automatically build or install any Python 3 packages, so you have to do this manually. Eventually, this will be fixed, and fortunately with a solid setup.py file, you don't have to do too much, but it's something to be aware of. In the feedparser case, we need to add overrides for dh_auto_build and dh_auto_install. Here's what these rules look like:
override_dh_auto_build:
	dh_auto_build
	set -ex; for python in $(shell py3versions -r); do \
		$$python setup.py build; \
	done;
override_dh_auto_install:
	dh_auto_install
	set -ex; for python in $(shell py3versions -r); do \
		$$python setup.py install --root=$(CURDIR)/debian/tmp --install-layout=deb; \
	done;
	cp feedparser/sgmllib3.py $(CURDIR)/debian/tmp/usr/lib/python3/dist-packages/feedparser_sgmllib3.py
Not too bad, eh? You'll notice that the first thing these rules do is call the standard dh_auto_build and dh_auto_install respectively. This preserves the Python 2 support. Then we just loop over the Python 3 versions the package requests (py3versions -r works this out from the X-Python3-Version header in debian/control), doing a fairly normal equivalent of setup.py install (split into a build step and an install step). The install rule looks a little odd, but should be familiar to Debian Python hackers. It just installs the package into the proper Debian locations, and will pretty much be the same for any Python 3 package you build.
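If the shell loops are hard to read through all the make escaping, this Python sketch spells out the commands they expand to, without running anything. It assumes that py3versions -r prints interpreter names such as python3.2; the function name and the example values are made up for illustration.

```python
def python3_commands(interpreters, destdir):
    """List the setup.py invocations made by the two overrides, in order."""
    commands = []
    # override_dh_auto_build: one build per requested interpreter.
    for python in interpreters:
        commands.append([python, "setup.py", "build"])
    # override_dh_auto_install: install into the staging directory using the
    # Debian layout, so files end up under dist-packages.
    for python in interpreters:
        commands.append(
            [python, "setup.py", "install", "--root=" + destdir, "--install-layout=deb"]
        )
    return commands


if __name__ == "__main__":
    # Hypothetical values: a single requested interpreter, staged into debian/tmp.
    for command in python3_commands(["python3.2"], "debian/tmp"):
        print(" ".join(command))
```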
The one odd bit is the last line in the override_dh_auto_install rule. This is there just to work around a peculiarity in the feedparser 5.1 upstream package, where it depends on sgmllib.py, but that is no longer in the Python standard library in Python 3. Upstream provides an already 2to3 converted version of it, and recommends you install the module as sgmllib.py somewhere on your Python 3 sys.path. Well, I don't like the namespace pollution that would cause, so I install the file as feedparser_sgmllib3.py and add a quilt patch to the package to try an import of that module if importing sgmllib fails (as it will on Python 3).
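The quilt patch itself isn't reproduced here, but the idea is the usual import fallback. The sketch below is a generic version of that pattern, not the actual patch: the import_first helper is invented for illustration, and only the two module names come from the text above.

```python
import importlib


def import_first(*names):
    """Return the first module in names that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    raise ImportError("none of these modules could be imported: " + ", ".join(names))


if __name__ == "__main__":
    try:
        # Python 2 finds sgmllib in the standard library; on Python 3 the
        # packaged feedparser_sgmllib3 module would be picked up instead.
        sgmllib = import_first("sgmllib", "feedparser_sgmllib3")
        print("using", sgmllib.__name__)
    except ImportError:
        print("neither module is installed here")
```

In a single-purpose patch, a plain try/except ImportError around the two import statements does the same job with less machinery.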
An aside: If you look in the debian/rules file for what I actually uploaded, you'll see some additional modifications to override_dh_auto_test. This just works around the upstream bug where some test suite data files were accidentally omitted from the release tarball. You can pretty much ignore those lines for the purposes of this article.
We're almost done. The last thing we need to do is make sure that debhelper installs the right files into the right binary packages. We want the python-feedparser binary package to include only the Python 2 files, and the python3-feedparser binary package to only include the Python 3 files. Keep in mind that when a source package builds only a single binary package (as was the case before I added Python 3 support), debhelper will include everything under the build directory's debian/tmp subdirectory in the single binary package. That's why you see things get installed into $(CURDIR)/debian/tmp. But when a source package builds multiple binary packages, as is now the case here, we have to tell debhelper which files go into which binary packages. We do this by adding two new files to the debian directory: python-feedparser.install and python3-feedparser.install.
Reading the manpage for dh_install will explain the reasons for this, and describe the format of the file contents. In our case, we're really lucky, because for Python 2, everything gets installed under usr/lib/python2.* and in Python 3, everything gets installed under usr/lib/python3 (relative to $(CURDIR)/debian/tmp). You'll notice a few things here. Because we could be building for multiple versions of Python 2, we have to wildcard the actual directory under usr/lib, e.g. it might be python2.6 or python2.7. But because we have PEP 3147 and PEP 3149 in Python 3.2, there's only one directory for all supported versions of Python 3, so we don't need to wildcard the subdirectory. Also, if you look at the actual .install files in the package, you'll see a few other trailing path components, so the actual contents of the files are:
usr/lib/python2.*/*-packages/*
and:
usr/lib/python3/*-packages/*
for the python-feedparser.install and python3-feedparser.install files respectively. The trailing bits just wildcard what on a Debian system will always be dist-packages, just for safety (cargo culting FTW!).
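To see how those two patterns split the staged files, here is a rough Python approximation using fnmatch. It is not how dh_install works internally (in fnmatch a * also matches slashes, which a shell glob would not), and the example paths are hypothetical, but it shows which binary package each file under debian/tmp would land in.

```python
from fnmatch import fnmatchcase

# Patterns copied from the two .install files described above.
INSTALL_PATTERNS = {
    "python-feedparser": ["usr/lib/python2.*/*-packages/*"],
    "python3-feedparser": ["usr/lib/python3/*-packages/*"],
}


def package_for(path):
    """Return the binary package whose pattern matches path, or None."""
    for package, patterns in INSTALL_PATTERNS.items():
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            return package
    return None


if __name__ == "__main__":
    # Hypothetical paths, relative to debian/tmp.
    for path in [
        "usr/lib/python2.7/dist-packages/feedparser.py",
        "usr/lib/python3/dist-packages/feedparser.py",
        "usr/lib/python3/dist-packages/feedparser_sgmllib3.py",
    ]:
        print(path, "->", package_for(path))
```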
And that really is it! Of course, things could be a little more complicated if you have extension modules, but maybe not that much more so, and if the package you're adding Python 3 support to isn't setuptools-based, you may have even more work to do. The feedparser package has a few other oddities that are really unrelated to adding Python 3 support, so I'm ignoring them here, but feel free to ask for additional details in the comments, in IRC, or in email.
Hopefully this gives you some insight into how to extend an existing Python 2 Debian package into including Python 3 support, given that your upstream already supports Python 3. Now, go forth and hack!
Addendum: my colleague Colin Watson just today packaged up Benjamin Peterson's very fine Python package called six. This is a nice package that provides some excellent Python 2 and 3 compatibility utilities. You may find this helpful if you're trying to support both Python 2 and Python 3 in a single code base, especially if you have to support back to Python 2.4 (poor you :). This will be available in Ubuntu Precise, although if you're submitting patches back upstream, you may have to convince the upstream author to accept the additional dependency. It's worth it to add a little more Python 3 love to the world.