Cython cheatsheet

Cython is a great tool for speeding up the computation-heavy parts of a Python program, but its documentation is not always clear. Below are some recipes I have used and problems I have stumbled upon. Python 3.4 is used for the code examples.

Extension files

  • .pyx: source file containing the Cython code
  • .pxd: declaration file used to link to C libraries (.so), to export the contents of a .pyx, or to declare types for a .py file
  • .pyxbld: build file used to compile Cython on the fly when not using a setup.py

Basic setup

The example below is deliberately trivial; its only purpose is to show how Python and Cython interact.

test.py
from ctest import square
import numpy as np

x = np.arange(10, dtype=float)
y = square(x)
print(x)
print(y)

ctest.pyx
import numpy as np
cimport numpy as np  # Cython imports special compile time information about numpy

def square(np.ndarray[np.double_t, ndim=1] x):
    cdef long size = len(x)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(size, dtype=float)

    for i in range(size):
        result[i] = x[i] * x[i]
    return result

Compilation (described in detail here)

$ cython -3 ctest.pyx
$ gcc -shared -pthread -fPIC -fwrapv -O2 -Wall -fno-strict-aliasing \
    -I/usr/include/python3.4 -o ctest.so ctest.c

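Cython generates a Python C extension, ctest.c, from ctest.pyx; it is then compiled into a shared library, ctest.so, that Python can import. Because ctest.pyx cimports numpy, the NumPy headers must be on the include path: if they are not found under the Python include directory, add a -I option with the directory returned by numpy.get_include().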
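The include directory in the gcc command above is tied to one Python version. Here is a minimal sketch of building the same command for whichever interpreter is running, using the standard sysconfig module. The helper name build_gcc_command and its parameters are mine, for illustration only; the compiler flags are copied from the command above.

```python
import shlex
import sysconfig


def build_gcc_command(module="ctest", extra_include_dirs=()):
    """Return the gcc argument list that compiles <module>.c into <module>.so.

    Hypothetical helper: the flags mirror the command shown above, and the
    Python include directory is looked up instead of being hard-coded.
    """
    include_dirs = [sysconfig.get_paths()["include"], *extra_include_dirs]
    cmd = ["gcc", "-shared", "-pthread", "-fPIC", "-fwrapv", "-O2", "-Wall",
           "-fno-strict-aliasing"]
    cmd += ["-I" + directory for directory in include_dirs]
    cmd += ["-o", module + ".so", module + ".c"]
    return cmd


if __name__ == "__main__":
    # Print a copy-pastable command line for the current interpreter.
    print(shlex.join(build_gcc_command()))
```

If the NumPy headers live elsewhere, pass that directory through extra_include_dirs.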

Then just run:

$ python3 test.py
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]

Automatic compilation

I use pyximport to compile the Cython code automatically at import time.

test.py
import pyximport
pyximport.install()
from ctest import square
import numpy as np

x = np.arange(10, dtype=float)
y = square(x)
print(x)
print(y)

I usually put the pyximport code inside the __init__.py at the root of the library or project, so that it is loaded automatically when the package is imported.
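As a rough sketch, such an __init__.py could look like the following. The function name enable_pyximport and the ImportError guard are my own additions, meant to keep the package importable on machines where Cython is not installed; only pyximport.install() comes from the example above.

```python
# __init__.py at the package root (illustrative layout)
try:
    import pyximport
except ImportError:  # Cython is not installed on this machine
    pyximport = None


def enable_pyximport():
    """Install the pyximport import hook if Cython is available.

    Returns True when the hook was installed, False otherwise.
    """
    if pyximport is None:
        return False
    pyximport.install()
    return True


# Run at package import time so that .pyx modules compile on first import.
PYXIMPORT_ENABLED = enable_pyximport()
```

Any module imported after this point can then import a .pyx file directly, as test.py does above.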

A .pyxbld file is used to pass compiler options: pyximport loads it and hands the result to distutils. It is very handy while developing a project. Unfortunately, the Cython documentation does not seem to describe the .pyxbld mechanism explicitly.

# ctest.pyxbld
from distutils.extension import Extension
import numpy as np

def make_ext(modname, pyxfilename):
    return Extension(name=modname,
                     sources=[pyxfilename],
                     extra_compile_args=['-O3', '-march=native', '-std=c99'],
                     include_dirs=[np.get_include()])

# Use the following to check compiler arguments
def make_setup_args():
    return dict(script_args=["--verbose"])

For more compile arguments, see the distutils extension documentation.

Now running

$ python3 test.py

will automatically compile ctest.pyx with the right compile options.

Options and checks

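The following compiler directives are worth setting for speed:

  • cdivision=True: use C division and modulo semantics instead of Python's (no zero-division check, truncation toward zero)
  • boundscheck=False: skip the bounds check on array indexing
  • wraparound=False: do not support negative indexes that count from the end of an array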

Another useful directive when using Python 3 is language_level.

More details on the Cython compilation page.
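To make the cdivision and wraparound trade-offs concrete, here is a small pure-Python sketch of the semantics involved. The helpers c_div and c_mod are hypothetical names that emulate C integer arithmetic; they are not part of Cython.

```python
def c_div(a, b):
    """Integer division as C does it: the quotient is truncated toward zero."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def c_mod(a, b):
    """Remainder as C does it: the result takes the sign of the dividend."""
    return a - b * c_div(a, b)


data = [10.0, 20.0, 30.0]

print(-7 // 2, c_div(-7, 2))  # -4 -3: Python floors, C truncates
print(-7 % 2, c_mod(-7, 2))   # 1 -1: the sign follows a different operand
print(data[-1])               # 30.0: the wraparound behaviour of negative indexes
```

With cdivision=True, Cython code gets the C results for typed integers. With wraparound=False or boundscheck=False, a negative or out-of-range index is no longer translated or checked, so it must never occur in the code.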

These options can be passed on the command line at compile time:

$ cython -3 -X boundscheck=False,wraparound=False,cdivision=True ctest.pyx

A decorator, or a header comment at the top of the .pyx file, can also be used.

ctest.pyx
# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np  # Cython imports special compile time information about numpy

def square(np.ndarray[np.double_t, ndim=1] x):
    cdef long size = len(x)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(size, dtype=float)

    for i in range(size):
        result[i] = x[i] * x[i]
    return result

Cython annotations

Cython can annotate the .pyx file using:

$ cython -3 -a ctest.pyx

This will generate a ctest.c and a ctest.html. The HTML file highlights in yellow the lines where Cython could not produce pure C and still calls into Python. This may be due to missing type declarations, external function calls, …

More details on the Cython cythonize page. As stated there, the annotated output is a good starting point for deciding what to optimize and where the GIL can be released with nogil.

Nogil

Cython can generate C code that does not hold the GIL, provided no Python object is used in that code. nogil can be applied to a block inside the body of a function or to a whole function.

ctest.pyx
# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np  # Cython imports special compile time information about numpy

def square(np.ndarray[np.double_t, ndim=1] x):
    cdef long size = len(x)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(size, dtype=float)

    with nogil:
        for i in range(size):
            result[i] = x[i] * x[i]
    return result

The nogil can also be applied to single functions, for example on an inline C function:

ctest.pyx
# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np  # Cython imports special compile time information about numpy

cdef inline double my_square(double x) nogil:  # Create an inline C function with nogil
    return x * x

def square(np.ndarray[np.double_t, ndim=1] x):
    cdef long size = len(x)
    cdef np.ndarray[np.double_t, ndim=1] result = np.empty(size, dtype=float)

    with nogil:
        for i in range(size):
            result[i] = my_square(x[i])
    return result

Memoryviews

Cython can make use of the Python buffer interface through memoryviews. As described in the Cython memoryview documentation, the syntax is cleaner. Indexing and slicing also seem more efficient than with numpy, because they are translated directly to C code, while a numpy slice still goes through Python objects (use cython -a to check).

ctest.pyx
# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np  # Cython imports special compile time information about numpy

cdef inline double my_square(double x) nogil:  # Create an inline C function with nogil
    return x * x

def square(double[:] x):
    cdef long size = len(x)
    cdef double[:] result = np.empty(size, dtype=float)

    with nogil:
        for i in range(size):
            result[i] = my_square(x[i])
    return np.asarray(result)

Notice the np.asarray at the end, which gives a numpy array back to Python. There should be no memory copy from the buffer to the numpy array.
Because numpy arrays and memoryviews share the same buffer interface, the calling Python code does not need any change.
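The no-copy behaviour comes from the buffer interface itself, and it can be observed from plain Python. The sketch below uses the standard library array type instead of numpy to stay dependency-free; it illustrates the protocol, not Cython's own memoryview implementation.

```python
from array import array

buf = array("d", [0.0, 1.0, 2.0, 3.0])  # owns a contiguous block of C doubles
view = memoryview(buf)                   # a view on that block, no copy

view[1] = 42.0     # writing through the view changes the original buffer
tail = view[2:]    # slicing a view yields another view, still no copy
tail[0] = -1.0

print(buf.tolist())                              # [0.0, 42.0, -1.0, 3.0]
print(view.format, view.itemsize, view.nbytes)   # d 8 32
```

A Cython double[:] argument accepts any object exposing such a buffer of doubles, which is why the numpy array from test.py can be passed unchanged.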

Linking to C code

Using Cython, it is very easy to link to C code. This is usually done via a .pxd file including a C header.

Suppose we have the following csquare.c and header:

csquare.c
#include "csquare.h"

double my_csquare(const double x) {
    return x * x;
}

csquare.h
#ifndef csquare_h__
#define csquare_h__
 
extern double my_csquare(const double x);
 
#endif  // csquare_h__

Compile as a shared library:

$ gcc -shared -pthread -fPIC -O2 -Wall -std=c99 -o libcsquare.so csquare.c

ctest.pyx
# cython: language_level=3, boundscheck=False, wraparound=False, cdivision=True
import numpy as np
cimport numpy as np  # Cython imports special compile time information about numpy

cdef extern from "csquare.h":  # Usually declared in a .pxd file
    double my_csquare(double x) nogil

def square(double[:] x):
    cdef long size = len(x)
    cdef double[:] result = np.empty(size, dtype=float)

    with nogil:
        for i in range(size):
            result[i] = my_csquare(x[i])
    return np.asarray(result)

Update the ctest.pyxbld to link the new library automatically:

ctest.pyxbld
from distutils.extension import Extension
import numpy as np  # needed for np.get_include() below

def make_ext(modname, pyxfilename):
    return Extension(name=modname,
                     sources=[pyxfilename],
                     extra_compile_args=['-O3', '-march=native', '-std=c99'],
                     # Second entry: the directory that contains csquare.h
                     include_dirs=[np.get_include(), '/path/to/csquare'],
                     # -L takes the directory that contains libcsquare.so
                     extra_link_args=['-L/path/to/csquare', '-lcsquare'])

def make_setup_args():
    return dict(script_args=["--verbose"])

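Running test.py will compile ctest.pyx and automatically link it to libcsquare.so. At run time the dynamic loader must also be able to find libcsquare.so, for example through LD_LIBRARY_PATH: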

$ python3 test.py
# ctest.pyx compilation ...
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[ 0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]

As a general rule, it is better to wrap the C library in a .pxd file (a csquare.pxd here) that can later be reused by other .pyx files, as described on the Cython website.

Cython also ships ready-made declarations for the C standard library (libc), which can be cimported directly:

from libc.math cimport NAN, sqrt, fmin, fmax, INFINITY  # And many more
from libc.stdlib cimport atoi

Other tools for linking to C code