# Profiling

Warning: Don’t optimize before your code works.  All of these techniques will make your code more difficult for yourself and others to read and understand.  First things first.

The presentation slides can be found HERE.  Contributed by Mike Stock.

The code we will use in this lesson can be found here:   smooth.py   cluster.py   cluster_solution.py

This session will cover several methods of timing and profiling your python code to find bottlenecks and time hogs:

• Timing functions with the built-in time module (good)
• Timing functions with timeit in ipython (better)
• Profiling execution time with line_profiler (best)

• python
• numpy
• ipython
• line_profiler (install with easy_install or pip)

### Example 1 :  time (python built-in)

Take the following example code, which generates an array of numbers, then smooths them.


import numpy as np
import time

def smooth1( ary,M ):

for i in range(M):
ary[1:-1] += ary[:-2] + ary[2:]
ary       /= 3
ary[0]    = 2*ary[0] + ary[1]/3
ary[-1]   = 2*ary[-1] + ary[-2]/3

def smooth2( ary, M):

for i in range(M):
ary += np.append( ary[0] ,ary[:-1]) + \
np.append( ary[1:],ary[-1] )
ary /= 3

if __name__ == '__main__':
#build an input array
ary = np.random.random(1000)
M = 16

smoothedAry1 = smooth1( ary.copy(), M)
smoothedAry2 = smooth2( ary.copy(), M)



Now, to determine how much time these definitions (functions) take to run, we will import time, and determine the instantaneous time with time.time() :


def smooth1( ary,M ):
#done in place
startTime = time.time()
for i in range(M):
ary[1:-1] += ary[:-2] + ary[2:]
ary       /= 3
ary[0]    = 2*ary[0] + ary[1]/3
ary[-1]   = 2*ary[-1] + ary[-2]/3
print time.time() - startTime

def smooth2( ary, M):
#done in place
startTime = time.time()
for i in range(M):
ary += np.append( ary[0] ,ary[:-1]) + np.append( ary[1:],ary[-1] )
ary /= 3
print time.time() - startTime

if __name__ == '__main__':
#build an input array
ary = np.random.random(1000)
M = 16

smoothedAry1 = smooth1( ary.copy(), M)
smoothedAry2 = smooth2( ary.copy(), M)



### Example 2 :  timeit (using ipython)

No need to insert any new code in your script; simply invoke timeit from the ipython terminal on a function.  To use, for example, on the smooth1 and smooth2 functions of our previous script smooth.py:


%timeit( smooth.smooth1(x.copy(),M) )

%timeit( smooth.smooth2(x.copy(),M) )



### Example 3 :  cProfile (python built-in)

cProfile is a basic built-in utility that will tell you some things about the various parts of your code, such as number of calls, cumulative execution time, etc…  Once again using our example script smooth.py, and piping the output to the terminal (with less):


$python -m cProfile -s cumulative smooth.py | less  Your output will look something like this: 0.000339031219482 0.000520944595337 11182 function calls (10993 primitive calls) in 0.049 seconds Ordered by: cumulative time ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.050 0.050 smooth.py:2(<module>) 1 0.001 0.001 0.049 0.049 __init__.py:106(<module>) 1 0.000 0.000 0.023 0.023 add_newdocs.py:9(<module>) 1 0.001 0.001 0.019 0.019 __init__.py:15(<module>) 2 0.001 0.001 0.019 0.009 __init__.py:1(<module>) 2 0.003 0.001 0.013 0.006 __init__.py:2(<module>) 1 0.000 0.000 0.013 0.013 type_check.py:3(<module>) 262 0.004 0.000 0.005 0.000 function_base.py:3181(add_newdoc) 1 0.000 0.000 0.004 0.004 __init__.py:6(<module>) 1 0.001 0.001 0.004 0.004 __init__.py:45(<module>) 1 0.003 0.003 0.003 0.003 polynomial.py:48(<module>) 1 0.003 0.003 0.003 0.003 chebyshev.py:78(<module>)  This is nice because it’s built in to python, and will give some basic info, but it’s not as useful as we may want it to be. So let’s move on to a better utility: line_profiler. ### Example 4 : line_profiler (from the terminal) line_profiler – and kernprof.py, the script that runs it – gives us some better insight into the various bottlenecks and places to focus in our code. To run it, you must first add ‘decorators’ to your code to tell line_profiler which functions to analyze. Basically, add ‘@profile‘ before any function definitions of interest, for example:  @profile def smooth1( ary,M ): blah blah blah  Next, to perform the actual profiling, we start by running kernprof.py on our script, which will create a .lprof file (whose name we specify), which we can then profile with line_profiler like so: $ kernprof.py -l smooth.py
\$ python -m line_profiler smooth.py.lprof | less



Here is a copy of smooth.py with time.time calls and @profile decorators added, for your archival purposes.


#!/usr/bin/python
import numpy as np
import time

@profile
def smooth1( ary,M ):
#done in place
startTime = time.time()
for i in range(M):
ary[1:-1] += ary[:-2] + ary[2:]
ary       /= 3
ary[0]    = 2*ary[0] + ary[1]/3
ary[-1]   = 2*ary[-1] + ary[-2]/3
print time.time() - startTime

@profile
def smooth2( ary, M):
#done in place
startTime = time.time()
for i in range(M):
ary += np.append( ary[0] ,ary[:-1]) + \
np.append( ary[1:],ary[-1] )
ary /= 3
print time.time() - startTime

if __name__ == '__main__':
#build an input array
ary = np.random.random(1000)
M = 16

smoothedAry1 = smooth1( ary.copy(), M)
smoothedAry2 = smooth2( ary.copy(), M)



If you import a module, such as numpy, or in this case our smooth functions, that changes the __name__.  If you are just running commands on the main process, your __name__ is __main__.  For more info, see (e.g.) http://stackoverflow.com/questions/419163/what-does-if-name-main-do

Side note 2: it is possible to run line_profiler from within ipython.  Details to come…

### ***  Exercise: cluster.py ***

Now that you are familiar with the concepts of profiling, here is an exercise to work on.  The script cluster.py generates a random 2D array, then determines which regions whose values are related, or ‘clustered’.  cluster.py works as written, but is slow.  Your task is to run line_profiler on cluster.py, look for bottlenecks, and try to fix them.  cluster_solution.py is one solution with comments.  We will use cluster.py in future lessons.

cluster.py


#!/usr/bin/python

import numpy as np
from matplotlib import pyplot as plt
import time

def generate_ary(N,M):
"""This generates an array to cluster
In real use, you would use real data.
So, it is not important that this function is fast
"""
ary = np.random.random((N,N))

#make each element affect the ones around it
#we want to make the array less random so that there
#is something to cluster
for xx in range(N-M):
for yy in range(N-M):
if ary[xx:xx+M,yy:yy+M].mean() > 0.5:
ary[xx:xx+M,yy:yy+M] = ary[xx:xx+M,yy:yy+M]**.5
else:
ary[xx:xx+M,yy:yy+M] = ary[xx:xx+M,yy:yy+M]**2

return ary

def distance(a,b):
"""A simple distance calculation
"""
return np.sqrt( (a[0]-b[0])**2 + (a[1]-b[1])**2 )

def sort_by_cclength(a,b):
"""a function for sorting possible clusters
"""
return cmp( len(a[1]), len(b[1]) )

def find_events( ary, T):
events = []
for xx in range(ary.shape[0]):
for yy in range(ary.shape[1]):
if ary[xx,yy] >= T:
events.append( (xx,yy) )
return events

def find_close_events( ary, T, D, event):
closeEvents = []
for xx in range(ary.shape[0]):
for yy in range(ary.shape[1]):
if ary[xx,yy] < T:
#skip on
continue
#calculate the distance
d = distance( event, (xx,yy) )
#save it if it's close (skip the same event)
if d < D and d > 0:
closeEvents.append( (d,ary[xx,yy],(xx,yy)) )
return closeEvents

def sort_close_events_cmp( closeA, closeB ):
return cmp( (closeA[0], 1./closeA[1]),
(closeB[0], 1./closeB[1]) )

def sort_seeds_cmp( seedA, seedB ):
return cmp( seedA[1], seedB[1] )

def cluster( ary, D, T=0.5):
"""The main cluster function
ary is the array to cluster
D is the distance to search
T is the threshold

output is the return
"""

#first we go through the list and find the locations of all
#the events over the threshold

events = find_events(ary,T)
seeds = []

#now we loop through the events
for event in events:
#find all the events close to this event
closeEvents = find_close_events(ary, T, D, event)

#skip singletons
if len(closeEvents) == 0:
continue

#now that we have the events, we want to sort them
#the closest, largest amplitude event should be at the
#top of the list
#this requires a custom comparison function
closeEvents.sort(sort_close_events_cmp)

#now we make a seed event
#the last element is the parent array
seed = [event, len(closeEvents), closeEvents]
seeds.append( seed )

#now we have to sort the seeds,
#we want the one with the least neighbors on top
seeds.sort( sort_seeds_cmp )

#initialize the output
output = np.zeros( ary.shape )
#now merge all the possible clusters
while len(seeds) > 0:
#print len(clusters), len(possible_clusters)
clusterFound = False
for seed in seeds:
for event in seed[2]:
#make sure this event is one of the closest
if event[0] > seed[2][0][0]:
break

if output[event[2]] != 0:
#we found where this possible cluster belongs
output[ seed[0] ] = output[event[2]]
seeds.remove(seed)
clusterFound = True
break

if not clusterFound:
#make a new cluster from the from the last possible cluster
#in the list (which happens to be the one remaining from the
#for loop)
output[ seed[0] ] = output.max()+1
seeds.remove(seed)

#returning the output here lets you know what's going on pretty well
return output

if __name__ == '__main__':
### Generate the array to cluster ###
N = 50 #this determines the size of the array
ary = generate_ary(N,2)
#ary = np.loadtxt('data.dat')    #for comparison it is useful to have the same data set

### Cluster! ###
startTime = time.time()
ary_cluster = cluster(ary,1.5,T=0.7)
print 'clustered %ix%i array into %i clusters in %f seconds'%(N,N,ary_cluster.max(),time.time()-startTime)

### Plotting the before and after ###
fig=plt.figure(1)
plt.imshow(ary,origin='lower',interpolation='nearest')
plt.title('Input')

plt.imshow(ary_cluster,origin='lower',interpolation='nearest')
plt.title('Clusters')

plt.show()



cluster_solution.py


#!/usr/bin/python

import numpy as np
from matplotlib import pyplot as plt
import time

def generate_ary(N,M):
"""This generates an array to cluster
In real use, you would use real data.
So, it is not important that this function is fast
"""
ary = np.random.random((N,N))

#make each element affect the ones around it
#we want to make the array less random so that there
#is something to cluster
for xx in range(N-M):
for yy in range(N-M):
if ary[xx:xx+M,yy:yy+M].mean() > 0.5:
ary[xx:xx+M,yy:yy+M] = ary[xx:xx+M,yy:yy+M]**.5
else:
ary[xx:xx+M,yy:yy+M] = ary[xx:xx+M,yy:yy+M]**2

return ary

def distance(a,b):
"""A simple distance calculation
"""
return np.sqrt( (a[0]-b[0])**2 + (a[1]-b[1])**2 )

def sort_by_cclength(a,b):
"""a function for sorting possible clusters
"""
return cmp( len(a[1]), len(b[1]) )

def find_events( ary, T):
events = []
for xx in range(ary.shape[0]):
for yy in range(ary.shape[1]):
if ary[xx,yy] >= T:
events.append( (xx,yy) )
return events

#@profile
def find_close_events( ary, T, D, event):
###Change###
#instead of looping over the entire array, loop over a sub section
#find the low and high limits of other elements which can be close
x_low = int( np.floor( event[0]-D ) )
if x_low < 0:
x_low = 0
x_high = int( np.ceil( event[0]+D ) +1 )
if x_high >= ary.shape[0]:
x_high = ary.shape[0]-1
y_low = int( np.floor( event[1]-D ) )
if y_low < 0:
y_low = 0
y_high = int( np.ceil( event[1]+D ) +1 )
if y_high >= ary.shape[1]:
y_high = ary.shape[1]-1
closeEvents = []
for xx in range(x_low,x_high):
for yy in range(y_low,y_high):
if ary[xx,yy] < T:
#skip on
continue
#calculate the distance
d = distance( event, (xx,yy) )
#save it if it's close (skip the same event)
if d < D and d > 0:
closeEvents.append( (d,ary[xx,yy],(xx,yy)) )
return closeEvents

def sort_close_events_cmp( closeA, closeB ):
return cmp( (closeA[0], 1./closeA[1]),
(closeB[0], 1./closeB[1]) )

def sort_seeds_cmp( seedA, seedB ):
return cmp( seedA[1], seedB[1] )

#@profile
def cluster( ary, D, T=0.5):
"""The main cluster function
ary is the array to cluster
D is the distance to search
T is the threshold

output is the return
"""

#first we go through the list and find the locations of all
#the events over the threshold

events = find_events(ary,T)
seeds = []

#now we loop through the events
for event in events:
#find all the events close to this event
closeEvents = find_close_events(ary, T, D, event)

#skip singletons
if len(closeEvents) == 0:
continue

#now that we have the events, we want to sort them
#the closest, largest amplitude event should be at the
#top of the list
#this requires a custom comparison function
closeEvents.sort(sort_close_events_cmp)

###Change###
#trucate closeEvents to have only the closest
iEvent = 0
for closeEvent in closeEvents:
if closeEvent[0] > closeEvents[0][0]:
break
iEvent += 1
closeEvents = closeEvents[:iEvent]

#now we make a seed event
#the last element is the parent array
seed = [event, len(closeEvents), closeEvents]
seeds.append( seed )

#now we have to sort the seeds,
#we want the one with the least neighbors on top
seeds.sort( sort_seeds_cmp )

#initialize the output
output = np.zeros( ary.shape )
#now merge all the possible clusters
while len(seeds) > 0:
#print len(clusters), len(possible_clusters)
clusterFound = False
for seed in seeds:
for event in seed[2]:
###Change###
# this is no longer needed, done with the trucate above
#make sure this event is one of the closest
#if event[0] > seed[2][0][0]:
#    break

if output[event[2]] != 0:
#we found where this possible cluster belongs
output[ seed[0] ] = output[event[2]]
seeds.remove(seed)
clusterFound = True
break

if not clusterFound:
#make a new cluster from the from the last possible cluster
#in the list (which happens to be the one remaining from the
#for loop)
output[ seed[0] ] = output.max()+1
seeds.remove(seed)

#returning the output here lets you know what's going on pretty well
return output

if __name__ == '__main__':
### Generate the array to cluster ###
N = 50 #this determines the size of the array
ary = generate_ary(N,2)
#ary = np.loadtxt('data.dat')    #for comparison it is useful to have the same data set

### Cluster! ###
startTime = time.time()
ary_cluster = cluster(ary,1.5,T=0.7)
print 'clustered %ix%i array into %i clusters in %f seconds'%(N,N,ary_cluster.max(),time.time()-startTime)

### Plotting the before and after ###
fig=plt.figure(1)