Cython

Here are the slides for this week’s topic.  Contributed by Mike Stock.  This presentation follows in the spirit of the High Performance Python for Climate Science Short Course at the 2013 AMS Annual Meeting.  The official cython website can be found here.

 

What is it?

Simply put, Cython compiles python code into a C module.  Since variables are declared with static types (by you, the user), and the Python code is compiled into C, users can get dramatic speed-ups for operations at which C excels.  Many parts of numpy are written in Cython.  Examples of some types of code that will benefit: array computations, for-loops (especially nested) of calculations, numpy array operations.

The Wikipedia entry on Cython is useful for getting a basic idea of the ‘whats’ and ‘hows’ of this package.

 

The basic recipe is this:

  1. Create a .pyx file of the code you want to convert to cython
  2. Make your changes: define static types for your scalars, vectors/arrays, etc.
  3. Create/modify a .py setup script
  4. Compile the setup file in the terminal, then run your script.

Cython workflow. Credit: Wikipedia

 

Example: convert the simple script ‘reduce’ to cython.  We will make use of the timing utilities mentioned in our first seminar.  The original function we will use looks like this:

reduce.py

import numpy as np
import time

def reduce( arr ):
    output = 0
    N,M = arr.shape
    for i in range(N):
        for j in range(M):
            output += arr[i,j]**2

    return output

sTime = time.time()
s = reduce( arr )
print 'Execution time:',time.time()-sTime

#-->  % Execution time: 1.20357918739

 

Steps we will take to speed up the above code in this lesson:
=========================================

  • Phase 0: Simply convert the python function to cython without changing anything.  (Slight speed increase.)
  • Phase 1: Manually define some scalar types in C.  (Slight decrease in speed for our example.)
  • Phase 2: Manually define the arrays in numpy with C.  (Large increase in speed.)

 

Phase 0

Our first step is to create a new .pyx file with the code snippet we want to convert to cython. In other words, save the following as reduce_p0.pyx :


def reduce( arr ):
    output = 0
    N,M = arr.shape
    for i in range(N):
        for j in range(M):
            output += arr[i,j]**2

    return output

 

We will also need to create a setup file, which we will call reduce_setup0.py
Here is the example:


from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

ext = Extension("reduce", ["reduce_p0.pyx"],
                include_dirs = [numpy.get_include()])

setup(ext_modules=[ext],
      cmdclass = {'build_ext': build_ext})

 

Compile the setup file in the terminal with ” python <setupfile>.py build_ext –inplace ”

~$ python reduce_setup0.py build_ext --inplace
~$

 

Now we need a script that times both versions. Let’s call it reduce_run.py :

import numpy as np
import time
from reduce import reduce as creduce

def reduce( arr ):
    output = 0
    N,M = arr.shape
    for i in range(N):
        for j in range(M):
            output += arr[i,j]**2

    return output

if __name__ == '__main__':

    #a random array of numbers from 0-1
    arr = np.random.random( [1000,1000] )

    sTime = time.time()
    s = reduce( arr )
    print 'Execution time:',time.time()-sTime

    sTime = time.time()
    s = creduce( arr )
    print 'Execution time:',time.time()-sTime

 

If we run it in a terminal, we get the times (which depend on your computer’s hardware).

   ~$ python reduce_run.py

--> % Execution time: 1.2728600502
    % Execution time: 1.17863798141

 

Phase 1

Let’s manually specify variables in our code as floats, doubles, ints, etc. with cdef.  When we do this, we need to change the .pyx file and re-compile the update file to include the changes.
I recommend for this example that you save new iterations as a separate files.  This current one is called reduce_p1.pyx here to keep things straight.

# This is reduce_p1.pyx
###########################

def reduce( arr ):
    cdef double output = 0          #Now manually defined as a double
    cdef int N, M, i, j             #Now manually defined as integers
    N,M = arr.shape
    for i in range(N):
        for j in range(M):
            output += arr[i,j]**2

    return output

###########################

 

# This is reduce_setup1.py
###########################

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

ext = Extension("reduce", ["reduce_p1.pyx"], #Updated to reflect the new .pyx file
 include_dirs = [numpy.get_include()])

setup(ext_modules=[ext],
 cmdclass = {'build_ext': build_ext})

###########################

Again, compile the new setup file in the terminal and run the script that times the old & new functions.

   ~$ python reduce_setup1.py build_ext --inplace

   ~$ python reduce_run.py

--> % Execution time: 1.07475996017
    % Execution time: 1.55283212662

 

Phase 2

Use cimport to load numpy.  We will need to unpack the implied tuple ( N,M = arr.shape ) from the previous versions.
Again, I’m saving the new versions of the .pyx and the setup scripts as separate files: ***2.pyx and ***2.py


# This is reduce_p2.pyx
###########################

#import numpy as np
cimport numpy as np     #Importing numpy in C

#Now we pre-specify the array type in our function definition
def reduce(np.ndarray[np.float64_t, ndim=2]  arr ):
    cdef double output = 0
    cdef int N, M, i, j
    #need to unpack the implied python tuple
    N = arr.shape[0]
    M = arr.shape[1]
    for i in range(N):
        for j in range(M):
            output += arr[i,j]**2

    return output

###########################

 

# This is reduce_setup2.py
###########################
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

ext = Extension("reduce", ["reduce_p2.pyx"], #Updated to reflect the new .pyx file
 include_dirs = [numpy.get_include()])

setup(ext_modules=[ext],
 cmdclass = {'build_ext': build_ext})
###########################

Again, compile the new setup file in the terminal and run the script that times the old & new functions.

   ~$ python reduce_setup2.py build_ext --inplace

   ~$ python reduce_run.py

--> % Execution time: 1.2256538868
    % Execution time: 0.00202012062073

A significant increase in speed for this simple example!

 

Diagnostics

You might wonder how you should know what to change in your scripts.  A general rule of thumb is that you should declare static types for scalars, vectors, and numpy types.  There is a tool which can help, as well – Cython has a built-in method to show you what C code is generated by each portion of python code.  This can help in finding trouble spots.  Running the tool from the command line on your .pyx code will generate an .html file.  The darker yellow the line, the more C code is generated.  Some of this is unavoidable – importing numpy and making function definitions will of course generate a lot of C code.  But some lines can be modified to improve speed quite a bit, for example when we implicitly made a tuple in line 7 of our original reduce_p0.pyx ( N,M = arr.shape ).  Below are examples of the .html outputs – notice the difference between reduce_p1 and reduce_p2 , where we fixed that tuple issue.

To generate the diagnostic .html files, call cython with the -a option on your .pyx code from the command line:


~$ cython -a reduce_p1.pyx
~$ cython -a reduce_p2.pyx
~$

For our example above, you will get the following results. Feel free to click around and familiarize yourself – clicking on each line of python code will open up the drop-down list of the corresponding C code that is generated.
reduce_p1.html

reduce_p2.html

 

Exercise for the reader

============================
This next example makes a Julia set (related to the Mandelbrot set).  The image below is the result of producing the Julia set in the script with c = -0.8 + 0.156i.

Julia.py

from numpy import empty

def julia_escape(x, y, n, cx, cy):
    """ Julia set escape time algorithm in real and complex components
    """
    for i in range(n):
        x, y = x**2 - y**2 + cx, 2*x*y + cy
        if x**2 + y**2 >= 4.0:
            break
    return i

def generate_julia(xs, ys, n, cx, cy):
    """ Generate a julia set
    """
    d = empty(shape=(len(ys), len(xs)))
    for j in range(len(ys)):
        for i in range(len(xs)):
            d[j,i] = julia_escape(xs[i], ys[j], n, cx, cy)
    return d

if __name__ == '__main__':

    import time
    from numpy import r_
    #from pylab import imshow, show, cm
    #all this stuff to import just for making the image
    from matplotlib.figure import Figure,SubplotParams
    from matplotlib import rc, cm
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    rc('savefig',dpi=150)
    rc('axes', facecolor='k',edgecolor='w')

    x = r_[-2:2:600j]
    y = r_[-2:2:600j]
    ###
    #the costant to use for the julia set
    cx = -0.835        #real part
    cy = -0.2321    #imaginary part
    #some other numbers to try out!
    #cx =  0.285
    #cy =  0.010
    #cx = -0.400
    #cy =  0.600

    t = time.time()
    d = generate_julia(x, y, 100, cx, cy)
    print 'Execution time:', time.time() - t

    ###
    #Make an image and save it to disk
    fig = Figure( figsize=(5,5) )
    fig.subplotpars = SubplotParams(left=.15,right=.90,bottom=.075,top=.95,hspace=0.5)
    ax1 = fig.add_subplot(1,1,1)    #a single plot
    ax1.imshow(d, extent=[-2,2,-2,2], cmap=cm.gist_stern)
    #Making a black background with white ticks & labels, cuz it's pretty.
    ax1.yaxis.set_tick_params(labelcolor='w')
    ax1.yaxis.set_tick_params(color='w')
    ax1.xaxis.set_tick_params(labelcolor='w')
    ax1.xaxis.set_tick_params(color='w')

    outS = 'Julia.png'
    print 'writing to %s'%outS
    canvas = FigureCanvas(fig)
    canvas.print_figure(outS,facecolor='k',edgecolor='k')

Try to speed up Julia.py using what you’ve learned in this seminar!

Julia

 

Here is a set of possible solutions.  Try not to cheat…

Julia_p2.pyx

from numpy import empty
cimport numpy as np

#Declare types for x,y,n,cx,cy in the function definition
def julia_escape(np.float64_t x, np.float64_t y, int n, double cx, double cy):
    cdef int i
    for i in range(n):
        x,y = x**2 - y**2 + cx , 2*x*y + cy
        if x**2 + y**2 >= 4.0:
            break
    return i

#Declare types for n,cx,cy.
def generate_julia(np.ndarray[np.float64_t, ndim=1] xs, np.ndarray[np.float64_t, ndim=1] ys, int n, double cx, double cy):
    cdef int i, j
    cdef np.ndarray[np.float64_t,ndim=2] d
    d = empty(shape=(len(ys), len(xs)))
    for j in range(len(ys)):
        for i in range(len(xs)):
            d[j,i] = julia_escape(xs[i], ys[j], n, cx, cy)
    return d

Julia_setup2.py

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

import numpy

ext = Extension("Julia", ["Julia_p2.pyx"],
                 include_dirs = [numpy.get_include()])

setup(ext_modules=[ext],
      cmdclass = {'build_ext': build_ext})

Julia_run.py

import numpy as np
from Julia import julia_escape as cjulia_escape
from Julia import generate_julia as cgenerate_julia

def julia_escape(x, y, n, cx, cy):
    """ Julia set escape time algorithm in real and complex components
    """
    for i in range(n):
        x, y = x**2 - y**2 + cx, 2*x*y + cy
        if x**2 + y**2 >= 4.0:
            break
    return i

def generate_julia(xs, ys, n, cx, cy):
    """ Generate a julia set
    """
    d = np.empty(shape=(len(ys), len(xs)))
    for j in range(len(ys)):
        for i in range(len(xs)):
            d[j,i] = julia_escape(xs[i], ys[j], n, cx, cy)
    return d

if __name__ == '__main__':

    import time
    from numpy import r_

    x = r_[-2:2:600j]
    y = r_[-2:2:600j]
    ###
    #the constants to use for the julia set
    cx = -0.835        #real part
    cy = -0.2321    #imaginary part
    #some other numbers to try out!
    #cx =  0.285
    #cy =  0.010
    #cx = -0.400
    #cy =  0.600

    ### Original Definitions ###
    t = time.time()
    d = generate_julia(x, y, 100, cx, cy)
    print 'Execution time:', time.time() - t

    ###
    #Make an image and save it to disk
    #from pylab import imshow, show, cm
    #all this stuff to import just for making the image
    from matplotlib.figure import Figure,SubplotParams
    from matplotlib import rc, cm
    from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
    rc('savefig',dpi=150)
    rc('axes', facecolor='k',edgecolor='w')

    fig = Figure( figsize=(5,5) )
    fig.subplotpars = SubplotParams(left=.15,right=.90,bottom=.075,top=.95,hspace=0.5)
    ax1 = fig.add_subplot(1,1,1)    #a single plot
    ax1.imshow(d, extent=[-2,2,-2,2], cmap=cm.gist_stern)
    #Making a black background with white ticks & labels, cuz it's pretty.
    ax1.yaxis.set_tick_params(labelcolor='w')
    ax1.yaxis.set_tick_params(color='w')
    ax1.xaxis.set_tick_params(labelcolor='w')
    ax1.xaxis.set_tick_params(color='w')

    outS = 'Julia.png'
    print 'writing to %s'%outS
    canvas = FigureCanvas(fig)
    canvas.print_figure(outS,facecolor='k',edgecolor='k')

    ### Cython Definitions ###
    t2 = time.time()
    d2 = cgenerate_julia(x, y, 100, cx, cy)
    timetot2 = time.time() - t2
    print 'Execution time:', timetot2

    #fig2 = Figure( figsize=(5,5) )
    #fig2.subplotpars = SubplotParams(left=.15,right=.90,bottom=.075,top=.95,hspace=0.5)
    #ax2 = fig2.add_subplot(1,1,1)    #a single plot
    #ax2.imshow(d2, extent=[-2,2,-2,2], cmap=cm.gist_stern)
    #text=ax2.text(.05,.05,'Run with Cython - Executed in %.5f sec'%(timetot2),
#              ha='left',va='bottom',transform=ax2.transAxes,size=10,bbox=None,color='w')
    ##Making a black background with white ticks & labels, cuz it's pretty.
    #ax2.yaxis.set_tick_params(labelcolor='w')
    #ax2.yaxis.set_tick_params(color='w')
    #ax2.xaxis.set_tick_params(labelcolor='w')
    #ax2.xaxis.set_tick_params(color='w')

    #outS2 = 'Julia2.png'
    #print 'writing to %s'%outS2
    #canvas2 = FigureCanvas(fig2)
    #canvas2.print_figure(outS2,facecolor='k',edgecolor='k')

Using this set of solutions, I was able to get a speedup from 9.60871 sec to 0.04286 sec. Not too shabby for such little work!

Leave a Reply

Your email address will not be published. Required fields are marked *