Basic Plotting in Python



This is an example of how to make a simple plot in python, using data stored in a .csv file.
We will:

  •   Load the 2 columns of data from the file into a (numpy) array
  •   Plot the data with pyplot.plot
  •   Tweak some plot settings to make it pretty
  •   Save the plot to a file, view the plot in a window, or both

You will need to have installed on your machine:

  • python (I’m running 2.7 in this tutorial)
  • numpy (I’m using 1.5.1 here)
  • matplotlib (I’m using 1.1.0 here)

You can run this entire script (see the condensed version at the end of this post) from the command line by making it executable and typing:
./plottingexample.py
Or you can run the whole script in ipython (type ipython at the command line) by typing:
execfile('plottingexample.py')
Note that execfile exists only in Python 2. Or you can run each command individually in ipython.

This script is not optimized for resources or speed – it is simply meant to be an easy-to-follow introduction to simple plotting.  If you haven’t seen it already, the official Matplotlib example gallery is an invaluable resource for plotting in python.  I would also refer the reader to the overview of plotting with Matplotlib by Nicolas P. Rougier here.


First, we will need to load some python modules…
– numpy is the numerical module that allows you to do Matlab- and IDL-esque numerical operations
– pyplot (part of matplotlib) is the plotting library

import numpy
from matplotlib import pyplot as plt

Here, we will load the data into python.  The example csv file is separated by commas, and strings (the titles) are in double-quotes.  This particular file has headers in the first row.  There are several ways to load a csv file into python; this is simply one easy way.

Normally, we might use numpy.loadtxt to do this, but our file has a couple of quirks: the luminosities use scientific notation with a + sign in the exponent, and some cells are empty.  numpy.genfromtxt is a more robust version of loadtxt that copes with this and fills the missing cells with nan.  This creates a 2D array of the data columns; skip_header=1 tells genfromtxt to skip the title row.

data=numpy.genfromtxt('SampleData.csv',skip_header=1,delimiter=',')
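
To see what genfromtxt does with an empty cell, here is a minimal sketch that reads three rows copied from SampleData.csv out of an in-memory string instead of a file. The io.StringIO stand-in and the variable names are assumptions made for illustration only.

```python
import io
import numpy

# A tiny in-memory stand-in for SampleData.csv (header plus three rows).
csv_text = io.StringIO(
    '"log M(H2) [Msun]","Luminosity [W/Hz]"\n'
    '8.02,3.34E+19\n'
    '7.62,\n'
    '7.41,4.00E+20\n'
)

# skip_header=1 skips the title row; the empty luminosity cell becomes nan.
data = numpy.genfromtxt(csv_text, skip_header=1, delimiter=',')

print(data.shape)                 # rows x columns
print(numpy.isnan(data[1, 1]))    # is the missing cell nan?
```

Rows with nan luminosity simply do not appear when plotted, so no extra cleaning is needed for this tutorial.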

If we really wanted to be pedantic with our separation, we could create separate arrays for each column:

luminosity=data[:,1]
mass=data[:,0]

We can explicitly make a log(luminosity) array with numpy.  Note that numpy.log is the natural log, so we use numpy.log10 for base 10:

logL=numpy.log10(luminosity)
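
As a quick check of the difference between the two functions, the sketch below uses three made-up luminosity values (not taken from the sample file) and compares numpy.log10 with the natural log divided by ln(10).

```python
import numpy

# Hypothetical luminosities chosen as exact powers of ten.
luminosity = numpy.array([1.0e19, 1.0e20, 1.0e21])

log10_values = numpy.log10(luminosity)                   # base-10 log
converted = numpy.log(luminosity) / numpy.log(10.0)      # natural log rescaled to base 10

print(log10_values)
print(numpy.allclose(log10_values, converted))
```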

Plot the data values.  Syntax is pyplot.plot(xarray,yarray, other kwargs).  There is also a plt.scatter command, but we can just set the linewidth to 0.  Note that we can do computations on the fly within pyplot!

plt.plot(numpy.log10(luminosity),mass,'*',linewidth=0)

And now to set the labels. You can use Latex inline equation syntax.

plt.title('M vs. L')
plt.xlabel('log(Luminosity) in some units...')
plt.ylabel(r'log(M$_{H_{2}}$) in some units...')

Plot simply creates a plot object.  To view it, we need to either open it in a window with plt.show() or write it to a file with plt.savefig().  Use whichever you want.  Note that pyplot will recognize the filetype (png, eps, pdf, etc.) from the extension you give it in savefig and save accordingly!

plt.savefig('plottest1.png',dpi=100)
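
The extension-based file type detection can be tried out with a short sketch like the one below. It assumes the non-interactive Agg backend (so no window is needed), and the data points, the temporary directory, and the plottest_demo file name are all made up for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: render without opening a window
from matplotlib import pyplot as plt

# A few made-up points, just so there is something to draw.
plt.plot([18.5, 19.5, 20.5], [7.0, 7.5, 8.0], '*', linewidth=0)

outdir = tempfile.mkdtemp()
saved = []
for ext in ('png', 'pdf'):
    # Same savefig call each time; only the extension changes the format.
    path = os.path.join(outdir, 'plottest_demo.' + ext)
    plt.savefig(path, dpi=100)
    saved.append(path)

plt.clf()
print(saved)
```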

Then clear for the next example:

plt.clf()

…And this is what we get:
[Figure: plottest1.png]



==== SECOND EXAMPLE ====

Now let’s make two subplots:
a.) Large red half-transparent squares with blue edges of all the data points
b.) Thin green diamonds for real points, downward pointing arrows for the upper limits
We first need to create a figure, then add the subplots

For this one, we’ll use the second csv file which has a column describing whether a mass value is an upper limit
–> genfromtxt by default expects floats, but will read strings if we set the expected datatype (dtype) to None

mass=numpy.genfromtxt('SampleData2.csv',skip_header=1,delimiter=',',usecols=0)
logL=numpy.log10(numpy.genfromtxt('SampleData2.csv',skip_header=1,delimiter=',',usecols=2))
massmask=numpy.genfromtxt('SampleData2.csv',skip_header=1,delimiter=',',usecols=1,dtype=None)

An aside: The slickest way to use masks is with a numpy masked array.  See my tutorial on NaNs and masked arrays here.  For example:

realmask=numpy.ma.masked_where(massmask=='"Yes"',massmask) # Remember our mask column is '"Yes"' and '"No"'
mass_limits=numpy.ma.masked_where(realmask.mask==False,mass) #This masks (removes) values where 'Limit'='"No"'
mass_regular=numpy.ma.masked_where(realmask.mask==True,mass)
lum_limits=numpy.ma.masked_where(realmask.mask==False,logL)
lum_regular=numpy.ma.masked_where(realmask.mask==True,logL)
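
Here is a smaller, self-contained sketch of the same masking idea, using four mass values and their limit flags copied from SampleData2.csv. The array names are illustrative, and it masks directly on the string comparison rather than going through an intermediate mask array.

```python
import numpy

# Four rows from SampleData2.csv: mass and the quoted "Limit?" flag.
mass = numpy.array([8.02, 7.27, 7.55, 8.21])
is_limit = numpy.array(['"Yes"', '"No"', '"Yes"', '"No"'])

# Hide everything that is NOT an upper limit, leaving only the limits visible.
mass_limits = numpy.ma.masked_where(is_limit != '"Yes"', mass)
# Hide the upper limits, leaving only the regular measurements visible.
mass_regular = numpy.ma.masked_where(is_limit == '"Yes"', mass)

# compressed() returns a plain array containing only the unmasked values.
print(mass_limits.compressed())
print(mass_regular.compressed())
```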

If you want only the unmasked values, call with mass_limits.compressed(), etc. [end aside]

Here we create separate lists for mass & luminosity of the upper limit values only and regular values only, using a standard for loop.  This method requires extra steps below.

limits=[]; regular=[]
for i in range(0,len(mass)):
    if massmask[i]=='"Yes"':  limits.append([mass[i],logL[i]])
    else: regular.append([mass[i],logL[i]])

Convert the lists to numpy arrays and transpose to put the data in order for plotting

limits=numpy.array(limits).transpose()
regular=numpy.array(regular).transpose()
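
The loop-and-transpose steps above can also be sketched with numpy boolean indexing, which produces arrays in the same layout (row 0 is mass, row 1 is log luminosity). This is an alternative under the assumption that the flag column holds the quoted strings '"Yes"' and '"No"'; the four rows are copied from the sample data and the name is_limit is made up.

```python
import numpy

mass = numpy.array([8.02, 7.27, 7.55, 8.21])
logL = numpy.log10(numpy.array([3.34e19, 3.56e20, 1.76e19, 4.57e20]))
massmask = numpy.array(['"Yes"', '"No"', '"Yes"', '"No"'])

# True where the mass value is an upper limit.
is_limit = (massmask == '"Yes"')

# Row 0 holds the masses, row 1 the log luminosities, as in the loop version.
limits = numpy.array([mass[is_limit], logL[is_limit]])
regular = numpy.array([mass[~is_limit], logL[~is_limit]])

print(limits.shape, regular.shape)
```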


Plotting.
Create the figure to which you will add subplots:

fig1=plt.figure(1)

Add the first subplot

sub1=fig1.add_subplot(2,1,1)
# 2,1,1 means 2 rows, 1 column, 1st plot

Now use the plotting commands just like before.

plt.plot(logL,mass,linewidth=0,marker='s',markersize=9,markerfacecolor='r',alpha=.5,markeredgecolor='b')
plt.title('Subplot 1 - Everything Together',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

Make the second subplot

sub2=fig1.add_subplot(2,1,2)
# 2 rows, 1 column, 2nd plot

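Plot the upper limits with pyplot.errorbar(xlocs,ylocs,xerr_length,yerr_length, other kwargs).
fmt=None means no marker for the point (newer matplotlib releases expect the string fmt='none' instead), lolims turns the bars into limit arrows, capsize sets the arrowhead size, elinewidth is the width of the arrow shaft, and mew (markeredgewidth) is the width of the bar.  Check the arrow direction on your matplotlib version: the current documentation describes uplims=True as the flag for upper limits.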

plt.errorbar(limits[1],limits[0],xerr=None,yerr=.15,fmt=None,ecolor='k', \
    lolims=True,capsize=3,elinewidth=1,mew=0,label='Upper Lims')
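
For readers on a recent matplotlib, the following is a hedged sketch of the same upper-limit plot using fmt='none' and uplims=True, which is how current documentation describes suppressing markers and flagging upper limits. It is not the call used to make the figure in this post; the two data points, the Agg backend, and the output file name are assumptions for illustration.

```python
import os
import tempfile

import matplotlib
matplotlib.use('Agg')  # assumption: render without opening a window
from matplotlib import pyplot as plt

fig, ax = plt.subplots()

# Two made-up upper-limit points: x = log luminosity, y = log mass.
container = ax.errorbar([19.5, 20.6], [8.02, 7.41], yerr=0.15,
                        fmt='none',      # draw no marker at the data point
                        uplims=True,     # mark the y values as upper limits
                        ecolor='k', capsize=3, elinewidth=1,
                        label='Upper Lims')

outfile = os.path.join(tempfile.mkdtemp(), 'uplims_demo.png')
fig.savefig(outfile)
plt.close(fig)
```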

Then plot as usual for the regular points

plt.plot(regular[1],regular[0],linewidth=0,marker='d',markerfacecolor='g',alpha=.5,label='Exact Mass')
plt.title('Subplot 2 - Separated',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

Change the limits, for kicks

plt.xlim(18,23)
plt.ylim(6,10)

Let’s add a legend, using the labels we defined in our plot calls.  The font properties are passed as a dictionary through prop.

plt.legend(loc='upper left',prop={'size':8},numpoints=1) #More args: ncol=1,loc=1,fancybox=True ...

Add a super-title

fig1.suptitle('Mass vs. Luminosity',fontsize=16)

Tidy up the subplot spacing.  left, right, bottom, and top are fractions of the figure width or height; hspace is a fraction of the average subplot height.

plt.subplots_adjust(left=.2,right=.8,bottom=.15,top=.85,hspace=.5)
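
To check what these numbers do, the sketch below builds an empty two-row figure, applies the same spacing, and reads back the position of the top subplot in figure fractions. It uses plt.subplots and the figure's own subplots_adjust method rather than the pyplot call above; the Agg backend and variable names are assumptions for illustration.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: render without opening a window
from matplotlib import pyplot as plt

# Two stacked, empty subplots (2 rows, 1 column).
fig, (top_ax, bottom_ax) = plt.subplots(2, 1)
fig.subplots_adjust(left=.2, right=.8, bottom=.15, top=.85, hspace=.5)

# get_position() reports each subplot's box in figure fractions.
top_box = top_ax.get_position()
bottom_box = bottom_ax.get_position()
print(round(top_box.x0, 3), round(top_box.x1, 3))
print(round(bottom_box.y0, 3), round(top_box.y1, 3))

plt.close(fig)
```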

Make the tick labels smaller. This can also control putting ticks on top, etc…  You can also use plt.ticklabel_format() to change the notation style

plt.tick_params(axis='both',labelsize=10)

Change the font of all the text and labels to standard serif

plt.rc('font',family='serif')

Finally, show or save.

#plt.show()
plt.savefig('plottest2.png')

plt.clf()

Here is the resulting plot:

[Figure: plottest2.png]


To recap, here is a list of some of the commands I used to plot and a quick explanation of some of the most useful keyword arguments (with example values). See the links for the official man page that explains all the keywords! The general page is
http://matplotlib.org/api/pyplot_api.html#module-matplotlib.pyplot

pyplot.plot(x_array,y_array,**kwargs)

  • marker='s' –> The marker to use for data points.
  • color='red' –> The color to use for plotting.  Accepts matplotlib color names, a string from '0.0' to '1.0' for grayscale, or hex.
  • linewidth=0.5 –> The thickness of the line between points.  Set to 0 for no line.
  • markersize=10 –> The point size of the marker.  The shorthand is ms=…
  • markerfacecolor='b' –> The color of the marker’s body.  The shorthand is mfc=…
  • markeredgecolor='0.5' –> The color of the marker’s outline.  The shorthand is mec=…
  • alpha=0.5 –> Alpha sets the transparency level

pyplot.legend(**kwargs)

  • Note that there are some subtleties with plt.legend(), depending on whether you call it as an object (i.e., leg=plt.legend()…) and whether you created your plot instances as objects.
  • loc=3 –> The location where the legend is drawn
  • numpoints=1 –> The number of markers to show in the legend
  • handlelength=0.5 –> The length of line to draw in the legend (in units of the font size)
  • ncol=4 –> The number of columns to use in the legend
  • fancybox=True –> Use a box with rounded edges
  • prop={'size':8} –> This sets the font properties, such as size

pyplot.title('MyTitle',**kwargs)   Also see pyplot.suptitle()

  • loc='left' –> Justify in center, left, or right
  • x=0.2, y=0.95 –> Manually set x,y coordinates (in figure fraction)

pyplot.subplots_adjust(left=0,right=.5,top=0.1,bottom=0.05,wspace=0,hspace=0)

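  • left, right, top, and bottom set the edges of the subplot grid as fractions of the figure size; wspace and hspace set the horizontal and vertical gaps between subplots as fractions of the average subplot size.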

pyplot.savefig('MyPlotTitle.pdf',**kwargs)

  • Note that MyPlotTitle can be a full path, so you can save in any directory you like.
  • Also note that pyplot figures out the file type from the extension you put in the file name (png, pdf, eps…)
  • dpi=300 –> The resolution of the output image in dots per inch
  • facecolor='0.8' –> Sets the background color of the figure (the area around the axes)
  • transparent=True –> Set the axes patch and figure patch to be transparent
  • bbox_inches='tight' –> Sets the bounding box size. 'tight' forces a crop to cut out whitespace.
  • pad_inches=0.1 –> Extra padding to use when bbox_inches='tight'



==== EVERYTHING CONDENSED ====

Here is a condensed version of the commands.  You can save this as plottingexample.py and run from the command line.

#!/usr/bin/python

import numpy
from matplotlib import pyplot as plt

data=numpy.genfromtxt('SampleData.csv',skip_header=1,delimiter=',')
luminosity=data[:,1]; logL=numpy.log10(luminosity)
mass=data[:,0]
plt.plot(numpy.log10(luminosity),mass,'*',linewidth=0)
plt.title('M vs. L')
plt.xlabel('log(Luminosity) in some units...')
plt.ylabel(r'log(M$_{H_{2}}$) in some units...')
plt.show()
#plt.savefig('plottest1.png')
plt.clf()

mass=numpy.genfromtxt('SampleData2.csv',skip_header=1,delimiter=',',usecols=0)
logL=numpy.log10(numpy.genfromtxt('SampleData2.csv',skip_header=1,delimiter=',',usecols=2))
massmask=numpy.genfromtxt('SampleData2.csv',skip_header=1,delimiter=',',usecols=1,dtype=None)
limits=[]; regular=[]
for i in range(0,len(mass)):
    if massmask[i]=='"Yes"': limits.append([mass[i],logL[i]])
    else: regular.append([mass[i],logL[i]])

limits=numpy.array(limits).transpose()
regular=numpy.array(regular).transpose()

fig1=plt.figure(1)
sub1=fig1.add_subplot(2,1,1)
plt.plot(logL,mass,linewidth=0,marker='s',markersize=9,markerfacecolor='r',alpha=.5,markeredgecolor='b')
plt.title('Subplot 1 - Everything Together',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

sub2=fig1.add_subplot(2,1,2)
plt.errorbar(limits[1],limits[0],xerr=None,yerr=.15,fmt=None,ecolor='k',lolims=True,capsize=3,elinewidth=1,mew=0,label='Upper Lims')
plt.plot(regular[1],regular[0],linewidth=0,marker='d',markerfacecolor='g',alpha=.5,label='Exact Mass')
plt.title('Subplot 2 - Separated',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

plt.xlim(18,23); plt.ylim(6,10)
plt.legend(loc='upper left',prop={'size':8},numpoints=1)
fig1.suptitle('Mass vs. Luminosity',fontsize=16)
plt.subplots_adjust(left=.2,right=.8,bottom=.15,top=.85,hspace=.5)
plt.tick_params(axis='both',labelsize=10)
plt.rc('font',family='serif')

plt.show()
#plt.savefig('plottest2.png')
plt.clf()



Sample csv files used in this tutorial

SampleData.csv:
"log M(H2) [Msun]","Luminosity [W/Hz]"
8.02,3.34E+19
7.94,3.81E+19
7.41,4.00E+20
8,3.91E+19
7.27,3.56E+20
7.55,1.76E+19
7.46,2.86E+20
7.63,2.69E+19
6.96,4.94E+18
7.19,3.17E+20
7.35,1.23E+19
7.73,9.42E+19
7.62,
7.28,1.72E+19
7.78,1.35E+19
7.36,1.66E+19
7.4,9.88E+18
7.66,1.83E+19
8.21,4.57E+20
7.77,5.82E+19
7.78,3.70E+19
7.91,5.81E+20
7.48,1.20E+19
7.53,
6.89,1.61E+19
7.5,1.74E+20
7.06,3.96E+21
8.13,8.32E+20
7.98,2.01E+20
7.22,2.98E+20
7.2,1.71E+20
7.74,
7.39,2.17E+20
7.11,1.62E+19
7.45,1.00E+22
7.12,1.11E+19
7.39,9.38E+18
7.25,1.21E+19
8.72,5.92E+20
7.58,9.38E+18
8.28,2.55E+20
7.31,1.23E+20
7.76,7.83E+21
7.79,2.58E+19
7.94,2.63E+19
8.33,9.80E+19
7.78,9.16E+19
7.85,3.30E+19
7.72,7.45E+20
7.6,2.40E+19
7.6,1.33E+19
7.82,2.08E+19
7.92,2.68E+19
7.67,1.92E+19
7.57,1.14E+19
7.68,8.57E+19
7.64,4.02E+19
8.47,4.77E+20
7.9,2.38E+19
7.98,2.66E+20
8,4.94E+20
7.58,1.14E+19
7.62,6.45E+19
7.83,5.69E+19
7.49,1.53E+19
7.41,1.40E+21
7.39,1.55E+19
6.96,4.31E+18
8.79,1.89E+21
8.03,1.10E+21
8.53,1.81E+21
7.9,4.27E+19
7.98,3.38E+19
7.92,3.60E+19
8,5.97E+19
7.65,7.36E+19
7.81,1.62E+19
7.62,1.98E+19
8.32,1.13E+21
8.58,1.34E+21
7.8,2.43E+19
8.77,3.52E+19
7.75,6.98E+19
7.87,2.18E+20
7.89,3.08E+19
7.52,1.64E+19
7.79,6.59E+19
7.54,7.42E+19
7.87,2.67E+19
9.19,5.14E+21
7.48,2.31E+19
8.65,1.83E+21
7.68,9.61E+19
7.61,1.71E+19
7.44,7.32E+18
7.12,1.43E+19
7.52,3.68E+19
7.47,1.24E+19
8.33,3.68E+20
7.91,5.20E+19

SampleData2.csv:
"log M(H2) [Msun]","Limit?","Luminosity [W/Hz]"
8.02,"Yes",33368520915374800000
7.94,"Yes",38084926326744200000
7.41,"Yes",400085178787008000000
8,"Yes",39093499194531000000
7.27,"No",356189034772789000000
7.55,"Yes",17607223238288400000
7.46,"Yes",286360603640652000000
7.63,"Yes",26903793690921700000
6.96,"Yes",4940972388221380000
7.19,"Yes",316985913740269000000
7.35,"Yes",12305895234497800000
7.73,"Yes",94207316869183400000
7.62,"Yes",
7.28,"Yes",17158987504017800000
7.78,"Yes",13459820262483600000
7.36,"No",16581684400708200000
7.4,"Yes",9876452632728350000
7.66,"Yes",18319504517883800000
8.21,"No",457109684170750000000
7.77,"Yes",58175985682106300000
7.78,"Yes",37032397255629900000
7.91,"Yes",580913054903842000000
7.48,"Yes",11957640390437600000
7.53,"Yes",
6.89,"Yes",16119768544855700000
7.5,"Yes",174471237022577000000
7.06,"Yes",3962390085961550000000
8.13,"No",831966530019003000000
7.98,"Yes",200540918322003000000
7.22,"No",298216099008093000000
7.2,"Yes",171430526010093000000
7.74,"Yes",
7.39,"No",217288431492744000000
7.11,"Yes",16244996383071900000
7.45,"Yes",10007558570290300000000
7.12,"Yes",11062127409777000000
7.39,"Yes",9376584789089050000
7.25,"Yes",12138109293741500000
8.72,"No",592494207532420000000
7.58,"Yes",9381812413953460000
8.28,"No",255214352066382000000
7.31,"No",123266901644388000000
7.76,"Yes",7834806106838980000000
7.79,"Yes",25784508237840400000
7.94,"Yes",26342837758954400000
8.33,"No",97954524158250200000
7.78,"Yes",91574782751217700000
7.85,"Yes",32966512351829900000
7.72,"Yes",744943872339070000000
7.6,"Yes",23984795989100900000
7.6,"Yes",13340264557425200000
7.82,"Yes",20838405136609500000
7.92,"Yes",26812741409761600000
7.67,"Yes",19167571244596500000
7.57,"Yes",11410282942283200000
7.68,"Yes",85709161169739700000
7.64,"Yes",40159401069740800000
8.47,"No",477030827216338000000
7.9,"Yes",23798511990555400000
7.98,"Yes",265559967454768000000
8,"Yes",493856772637804000000
7.58,"Yes",11432674116179900000
7.62,"Yes",64528933270989600000
7.83,"No",56854874842154900000
7.49,"Yes",15295148643743600000
7.41,"Yes",1396303374284150000000
7.39,"Yes",15549742541566500000
6.96,"Yes",4311608709239170000
8.79,"No",1891600184373810000000
8.03,"Yes",1096533532393990000000
8.53,"No",1808214825980810000000
7.9,"Yes",42676953645405700000
7.98,"Yes",33773625486922300000
7.92,"Yes",36021130653426900000
8,"No",59720274241014000000
7.65,"Yes",73563528146885700000
7.81,"Yes",16179737317458500000
7.62,"Yes",19779722768637700000
8.32,"No",1129890630260920000000
8.58,"No",1339581028953160000000
7.8,"Yes",24344112210928900000
8.77,"No",35223633965183700000
7.75,"Yes",69758692223945700000
7.87,"Yes",217710841075595000000
7.89,"Yes",30761806253667100000
7.52,"Yes",16390578158495200000
7.79,"Yes",65929145637791600000
7.54,"Yes",74233946864774700000
7.87,"Yes",26738071215568200000
9.19,"No",5137401048219190000000
7.48,"Yes",23142567658709700000
8.65,"No",1828953187691050000000
7.68,"Yes",96082657044091600000
7.61,"Yes",17061710594939600000
7.44,"Yes",7322727758242710000
7.12,"Yes",14288481341147900000
7.52,"No",36800735135954300000
7.47,"Yes",12386569692225800000
8.33,"No",368338671117939000000
7.91,"Yes",52012168051107600000

4 thoughts on “Basic Plotting in Python”

  1. I think this is among the most significant info for me. And i’m glad reading your article. But wanna remark on few general things, The website style is perfect, the articles is really nice : D. Good job, cheers

    • Pyplot (instead of the bundled pylab) is probably the most common, but you can also plot directly with matplotlib – i.e., creating the frame explicitly… I don’t really know if there are any other true alternatives to pyplot/pylab though. The statefulness is rarely an issue for me, but you can get around this if after plotting and showing/saving you run pyplot.close()
