Multi-Color Images

This post will describe how to combine multiple images with different color schemes to get a final product that you could call by one of these names:
Multi-color images, 2-color images, 3-color images, 4-color images, N-color images

For scikit-image, see e.g.  is a nice page for picking colors & converting between hsv, rgb, etc.

For this tutorial, I’m using free .fits images hosted by the Chandra site:

Note that all the images for a given object on that site are already reprojected to the same WCS and pixel grid.  For making your own N-color images with your own data though, in general you will have to first reproject all your images to one common WCS/pixel grid.  See my tutorial on reprojection for some discussion of how to do this.

from matplotlib import pyplot as plt
import numpy as np
import as pyfits
from skimage import color

The standard multi-color image is called an RGB or 3-color image.  To make one in python, you simply stack three (grayscale) images into a single array – one each for the Red, Green, and Blue channels.  Let’s try making one for Kepler’s Supernova Remnant, using infrared (Spitzer MIPS 24$\mu$m) as the R channel, low-energy Chandra X-rays as the G channel, and high-energy Chandra X-rays as the B channel.  [Credit: NASA/ESA/JHU/R.Sankrit & W.Blair  ]

### Load in the data
kepler_hdr = pyfits.getheader('kepler_ir.fits') #All these images have the same header
kepler_ir=pyfits.getdata('kepler_ir.fits') #MIPS 24um
#kepler_opt=pyfits.getdata('kepler_opt.fits') #WFPC2 F658N 
kepler_XhiE=pyfits.getdata('kepler_xray_he.fits') #High-energy X-rays (4-6keV)
kepler_XloE=pyfits.getdata('kepler_xray_le.fits') #Low-energy X-rays (0.3-1.4keV)

###Directly making an RGB array to plot (no rescaling)
#     --OR--     #

###Same, but explicitly setting the slices piece by piece 
kepler_simpleRGB[:,:,0]=kepler_ir; kepler_simpleRGB[:,:,1]=kepler_XloE; kepler_simpleRGB[:,:,2]=kepler_XhiE

Standard RGB stacks generally look very nice when each of the R, G, and B frames represent more or less the true red, green, and blue bands in reality – such as the lovely SDSS 3-color images.  However, there is no need to force other bands in the spectrum (such as the UV or radio) to adhere to the same pure red, green, and blue false color.  Take for instance the example above (also seen in the left panel below) – I personally think this does not look aesthetically pleasing at all.  But beyond aesthetics, you may have a ‘scientifically’ valid reason to despair over this color scheme – some time ago, a team I was working with made 3-color images of many galaxies in our survey and they just looked a bit off.  In the words of our PI: “Stars aren’t supposed to be green”.   To remedy this, we can imbue each channel with a color of our own choosing instead of the standard pure R, G, or B.  (That being said, the standard RGB does work stunningly well in many cases.)  Here is an example of a simple tweak from the standard RGB to a slightly more pleasing Red-Yellow-Blue color scheme.


Steps to create your own multi-custom-color images:

  1. Convert each separate image from greyscale (original fits) to its own RGB
  2. Colorize the RGB grey image by converting to HSV colorspace, changing the hue (from grey to color) & saturation
  3. Convert each back to RGB color space – at this point the data values should be between 0 and 1
  4. Add the separate RGB sky images together to create one master RGB
  5. Re-normalize each of R,G,B to [0,1] scale, but NOT each individual band from 0 to 1 independently…
    –> Scale each band from 0 to the FRACTION (max value in that band / max overall value from all 3 bands)
    –> This is what what lets the colors remain that color after scaling is a good resource to consult for finding colors you might want to use.



Rescaling image levels

You will probably have to rescale the data in some fashion to draw out the features you want.  For example, maybe you need to show changes in relatively faint features AND changes in relatively bright features in a sensible way.  Typically scalings are log, sqrt, power-law (gamma), etc.  I like Min-Su Shin’s for the job:

skimage, which I use for other tasks, also has rescaling capabilities – a useful one is rescale_intensity()

It may take you some time to play around with levels and scaling functions to determine the best settings to use for each image.  I usually just open individual images in DS9 and fiddle with the settings there when I want to do a ‘by eye’ check.

One very simple and important way to tweak your scaling is to set your desired minimum and maximum image values.  For plotting a single 2D frame with imshow(), you could achieve this by setting vmin=… and vmax=…  Since we will be working with 3D [RGB] frames, we can instead re-scale each image’s data values beforehand using img_scale.<function>() or skimage.rescale_intensity().  To make a simple linear re-scaling between min_value and max_value, you could do either of these:

rescaled_image=img_scale.linear(original_image, scale_min=min_value, scale_max=max_value)
# OR
rescaled_image=rescale_intensity(original_image, in_range=(min_value,max_value))

Note that img_scale.linear() will rescale the image values linearly between -1 and 1, which is useful or even required for some other operations.  One example is if your data values are floats rather than integers – the colorizing process we are about to discuss needs values to be between -1 and 1 if they are floats.

You can also rescale by logarithm, square root, power law, sinh, and asinh easily with img_scale.log(), img_scale.sqrt(), img_scale.power(), img_scale.sinh(), and img_scale.asinh().  Here is a simple visual of how square root scaling can be used to draw out the fainter features a little more, without even resorting to tweaking the min/max values.

SQRT rescaling
from matplotlib import pyplot as plt
import numpy as np
import as pyfits
import img_scale
import pywcsgrid2
from mpl_toolkits.axes_grid1 import ImageGrid, axes_grid


data_sqrt_scale=img_scale.sqrt(kepler_ir, scale_min=0., scale_max=kepler_ir.max())


g = ImageGrid(fig1, 111,nrows_ncols=(1,2),direction="row",axes_pad=0.02,

for i,dat,label in zip(range(2),[kepler_ir,data_sqrt_scale],['Original','SQRT Scale']):
    ax.set_ticklabel_type("hms", "dms"); ax.set_xlabel(''); ax.set_ylabel(''); 
    ax.xaxis.set_tick_params(color='0.8'); ax.yaxis.set_tick_params(color='0.8') 
    label=ax.text(0.05,0.95,label,ha='left',va='top',color='0.8', transform=ax.transAxes,size=10,)
    cbar.set_label_text('Some Units',color='k',size=9);

plt.clf(); plt.close()


One very common rescaling is by percentiles: for example, 0.1%-99.9%, 5%-95%, etc.  This means that if you were to make an ordered list of pixel values from lowest to highest, the min value on new brightness scale would be the one that is 0.1% up the list while the max value would be the one 99.9% up the list (or 5% and 95% or whatever pair you use).  An easy way to accomplish this is to compute it using numpy.percentile().  Note that if your image has a lot of pixels set to 0.0, such as in blanked outer edges, you will want to get percentiles after calling numpy.unique() on the image because np.percentile([0,0,0,0,0,1,2,3,4,5],20) is still 0 instead of 1.  You could do this:


Then you could run the rescale command of your choice like above.

Once you have rescaled your images to your satisfaction, you can check how well they mix by combining them into one RGB array. Here is a snippet of code that rescales the kepler images in various (arbitrary) ways, then combines them and plots the resulting RGB image.
* Give the Kepler IR image square scaling, using data values from 0 to its global maximum
* Give the low-energy X-ray image linear scaling from the 0.5% level to its global max
* Give the high-energy X-ray image square root scaling from the 4.0-99.5 percentile range

import img_scale
im1=img_scale.squared(kepler_ir, scale_min=0., scale_max=kepler_ir.max())
im2=img_scale.linear(kepler_XloE, scale_min=np.percentile(np.unique(kepler_XloE),0.5), 
im3=img_scale.sqrt(kepler_XhiE, scale_min=np.percentile(np.unique(kepler_XhiE),4.), 

ax1=plt.subplot(121); ax1.set_title('Unscaled')
ax2=plt.subplot(122); ax2.set_title('Example (Arbitrary) Re-Scaling')
for ax in [ax1,ax2]: ax.set_xticklabels(''); ax.set_yticklabels('');




Colorizing grayscale images

I use the skimage package to do the heavy-lifting here.  Specifically:

from skimage import color
from skimage.exposure import rescale_intensity, equalize_adapthist, adjust_gamma,adjust_log,adjust_sigmoid

Once again, see the pages here and here for some examples of using skimage with images.

We can now add a ‘tint’ or imbue a grayscale image with a certain color.  We’ll do this by converting a single grayscale image into its RGB components, then rescaling levels if necessary, and ‘colorizing’ those RGB channels.  I will use HSV colorspace to do the actual colorizing.  I’ve chosen ‘brick’, ‘dandelion’, and ‘steel’ as my three colors to use.  Once again, Kepler’s SNR is the example.

#A little function to imbue or tint grayscale images 
def colorize(image, hue, saturation=1,v=1):
    ### Add color of the given hue to an RGB greyscale image.
    hsv = color.rgb2hsv(image)
    hsv[:, :, 2] *= v
    hsv[:, :, 1] = saturation
    hsv[:, :, 0] = hue
    return color.hsv2rgb(hsv)

## brick=rgb(204,20,20)=hsv(0,.9,.8)
## dandelion=rgb(230,220,100)=hsv(60/360,.4,1.)
## steel=rgb(86,122,255)=hsv(227./360,.66,1.)


kepler_opt_sqrt=img_scale.sqrt(kepler_opt, scale_min=0., scale_max=100*np.sqrt(kepler_opt.max()))


### Comparing the infrared image in original grayscale vs colorized 'brick' red
plt.subplot(121); plt.imshow(kepler_ir_greyRGB,origin='lower',interpolation='nearest'); 
plt.title('IR - Grayscale')
plt.subplot(122); plt.imshow(kepler_ir_brick,origin='lower',interpolation='nearest'); 
plt.title('IR - Colorized to "Brick"'); plt.clf()


### Display all three colorized images separately
plt.subplot(131); plt.imshow(kepler_ir_brick,origin='lower',interpolation='nearest'); 
plt.title('IR - "Brick"')
plt.subplot(132); plt.imshow(kepler_XloE_dandelion,origin='lower',interpolation='nearest'); 
plt.title('Low X-ray - "Dandelion"')
plt.subplot(133); plt.imshow(kepler_XhiE_steel,origin='lower',interpolation='nearest'); 
plt.title('High X-ray - "Steel"')




Now, the final step is to combine them into one single array.  Since each colorized frame is an RGB set, for a given pixel we can take the mean of the ‘brick’ R value, the ‘dandelion’ R value, and the ‘steel’ R value to get a final ‘overall’ R value.  Ditto for the overall B and G values.   Note that you will need to re-scale the brightness levels to the [0,1] scale overall, but NOT each band independently.  To get your desired hue to remain in each band – the RELATIVE RGB levels – the peak values should be scaled as [the max. value in that band] / [the max. value from all 3 bands].

### Combine the colorized frames by taking the mean of the R images, mean of G, and mean of B

### Make a list of the maximum data values in each of the R,G,B frames
RYB_maxints=tuple(np.nanmax(kepler_RYB[:,:,i]) for i in [0,1,2])

### Now re-scale each frame to the overall maximum so that the final RGB set has range [0,1]
for i in [0,1,2]: 
                        out_range=(0, kepler_RYB[:,:,i].max()/np.nanmax(RYB_maxints) ));



Comparing once again with the simple RGB:
Kepler RYB

from astropy.wcs import WCS

fig1 = plt.figure(1)
wcs = WCS(kepler_hdr)
ax1=fig1.add_subplot(121, projection=wcs)
ax1.set_title('Simple RGB')
ax2=fig1.add_subplot(122, projection=wcs)
for ax in [ax1,ax2]:
    rapars = ax.coords[0]; decpars = ax.coords[1]
    rapars.set_ticks(color='white'); decpars.set_ticks(color='white')
    rapars.set_ticks(number=6); decpars.set_ticks(number=6); 
    rapars.set_ticklabel(size=8); decpars.set_ticklabel(size=8); 
    rapars.display_minor_ticks(True); decpars.display_minor_ticks(True)
    rapars.set_separator(('$^\mathrm{H}$', "'", '"'))
plt.suptitle("Kepler's SNR",y=0.8,size=14)


It’s worth mentioning that one big plus about combining images in this way (instead of in GIMP/Photoshop/etc.) is that you can keep the WCS info for plotting.  That is, the coordinate positions of the pixels is retained, so you can have a coordinate grid.  Using this method outlined here you should be able to integrate with your favorite reprojection and plotting packages, such as pywcsgrid2, kapteyn, aplpy, standard imshow, etc.

It doesn’t really make sense to have a single ‘multi-color’ colorbar unfortunately, but if you need to include some quantitative information about the flux scales (such as demonstrating log scaling, etc.),  you could always just plot three separate colorbars next to each other – one for each hue.

You don’t need to stop at 3 RGB frames – you can combine 2 or 4 (or more, I guess) frames to make N-color images.  Just be careful you select hue combinations that allow you to still see the different bands’ emission.  I’ve been fond of a teal-purple-orange-green scheme lately.




I’ll be using a little convenience function now to rescale by percentiles:

def rescale_image_percentiles(imagein,perclo=2.,perchi=98.):
    #Rescale an image by percentiles: perclo and perchi
    plo, phi = np.percentile(imagein, (perclo, perchi))
    return rescale_intensity(imagein, in_range=(plo, phi)) #uses skimage.exposure.rescale_intensity



* Here is M51 in 3 optical bands.  (Again, freely available from the Chandra site.) I think the standard RGB looks great here, which is not surprising as the optical bands are actually R, G, and B.  Here I subdued the colors a little bit, and added in a fourth image of X-ray emission.  I colored the X-rays in lime –  the splash of color is a good way to draw the reader’s eyes and emphasize certain features.
M51 4-Color Image

import pywcsgrid2 

### M51, red-yellow-blue-lime ###
### ==> For this one, the muted colors don't pop very well, and looks quite drab.  Maybe it's because
#       the different colors largely overlap (i.e., my method is best suited for roughly independent emission)
#   HOWEVER, this can be made up for by stretching the contrast at the end - 0-99.5 percentile works fine here.
#   Histogram (adaptive) equalization also works well, may provide some finer/subtler enhancement 


#Scale image between -1 and 1 if data type is float...
m51_optR=img_scale.linear(m51_optR); m51_optG=img_scale.linear(m51_optG); 
m51_optB=img_scale.linear(m51_optB); m51_X=img_scale.linear(m51_X)*.8; 

#Convert to RGB grayscale

#Colorize the grayscale images

#Combine the colorized frames and do final rescaling
RYBL_maxints=tuple(np.nanmax(m51_RYBL[:,:,i]) for i in [0,1,2])
for i in [0,1,2]: 
                       out_range=(0, m51_RYBL[:,:,i].max()/np.max(RYBL_maxints) ));
for i in [0,1,2]: 
                       scale_max=np.percentile(np.unique(m51_RYBL[:,:,i]),95.) );

for ax in [ax1,ax2]:
    ax.set_xlabel(''); ax.set_ylabel(''); 

plt.suptitle('M51 - Adding a fourth image to RGB',y=0.96)
plt.clf(); plt.close()



* Here is an example of a standard RGB looking like it needs a little bit of a makeover.  This is M101 in the infrared, optical, and UV (and X-rays for a fourth image in right panel).  The different bands don’t correspond to true red, green, and blue light in reality, so it’s not surprising that they don’t blend together particularly well.  Here I’ve just made a simple change to more reserved red, yellow, and blue hues, with the additional X-ray image in pink – more for blending in rather than standing out like the lime above.   This isn’t perfect, but it’s a good improvement.

M101 Red-Yellow-Blue-Pink

### M101, red-yellow-blue-pink ###
#   --> slightly better than M51, but still a bit drab.  A teal-orange-purple-yellow 
#       scheme actually looks good though
#   gamma adjustment with gamma<1.0 (say, 0.5) really brings up the low-vals nicely
#   log runs the risk of vals going over 1.0 ...
#   adjust_sigmoid draws out faint features really well, but turns blacks to grey... 
#   Fixed by percentile scaling...

from skimage.exposure import rescale_intensity, adjust_gamma, adjust_log, adjust_sigmoid

## Note that you may need to adjust the header - the original has CRVAL etc. as strings 
#  instead of floats which doesn't play nice with the some plotting functions...


#Scale image between -1 and 1 if data type is float...

#Convert to RGB grayscale

#Colorize the frames

#Combine and do final scaling
RYBP_maxints=tuple(np.nanmax(m101_RYBP[:,:,i]) for i in [0,1,2])
for i in [0,1,2]: m101_RYBP[:,:,i]=rescale_intensity(m101_RYBP[:,:,i], 
                                    out_range=(0, m101_RYBP[:,:,i].max()/np.max(RYBP_maxints) ));
for i in [0,1,2]: m101_RYBP[:,:,i] = adjust_gamma(m101_RYBP[:,:,i], gamma=.5,gain=1)
for i in [0,1,2]: 
    m101_RYBP_inverse[:,:,i] = adjust_sigmoid(m101_RYBP_inverse[:,:,i], cutoff=0.001, gain=10, inv=True)

for ax in [ax1,ax2]:
    ax.set_xlabel(''); ax.set_ylabel(''); 

plt.clf(); plt.close()



* This is an interesting one.  It is often more appropriate to plot ‘reverse’ color scales such that the background is light, and areas with higher data values are represented by darker points.  For example, this saves ink when printing physical copies, and it may be easier on the eyes in some digital formats.  I don’t think I’ve seen anyone explore this before, but you could also do a multi-reverse-color image.  Building on the code from the previous example, here it is plotted with a white background:
M101 Inverse 4-Color Image

### --> Continuing from the previous M101 code...

for i in [0,1,2]: 
    m101_RYBP_inverse[:,:,i] = adjust_sigmoid(m101_RYBP_inverse[:,:,i], cutoff=0.001, gain=10, inv=True)

ax1=pywcsgrid2.subplot(121, wcs=m101_hdr2)
ax1.xaxis.set_tick_params(color='w'); ax1.yaxis.set_tick_params(color='w'); 
for spine in ax.spines.values(): 
    spine.set_edgecolor('w'); spine.set_linewidth(0.4); spine.set_visible(True);

ax2=pywcsgrid2.subplot(122, wcs=m101_hdr2)
for ax in [ax1,ax2]:
    ax.set_xlabel(''); ax.set_ylabel(''); 
ax2.set_title('Inverse Red-Yellow-Blue-Pink',size=10)

plt.suptitle('M101 - Inverse RGB',y=0.84)
plt.clf(); plt.close()


Convolution and Regridding

Changing Image Resolution (Convolving) and Regridding


First, a few resources.
* Here is the Kapteyn package documentation on reprojecting with the maputils module
* And cygrid’s jupyter notebook

NASA SkyView query:

Normally for example data sets I advertise the wonderful multiwavelength LITTLE THINGS images freely available on the NRAO website.  The LT maps are quite beautiful, but this time I will use a couple random maps pulled from NED instead that are smaller in size – a couple hundred kB instead of a few MB.
A couple of images I think are nice for this purpose:
* M101, Herschel SPIRE 250um (KINGFISH) from NED query [1.4MB], in MJy/sr.  Also, the M101 500um is only 426kB
* M101, VLA 21cm [391kB] * M100, Herschel PACS 100um from NED query, in Jy/pix [4MB…] * M83 MIPS 160um in MJy/sr from NED query [169kB]




When calculating fluxes or luminosities from astronomical images – to create an SED, for example – you will almost certainly need to perform some formatting on your images before you extract fluxes, perform image arithmetic, etc.  Here I will cover two common operations: image convolution and regridding.  These are sometimes wrapped together in developed software packages, but are actually separate concepts.  One very important consideration here involves a quick mention of some terminology.  In normal life, the word “resolution” in regards to images might be described as “the fineness of detail” it contains.  That is true, but it’s actually a slightly subtle point.  And to understand resolution, we will need to understand the concept of point spread functions/beams (see the post on observing) and we need to talk about what precisely we mean by the term ‘pixel’.



Fun fact: Wikipedia tells me that the first use of the word pixel, which simply comes from a shortening of ‘picture element’, was by JPL scientists in the 1960s.

One very important point to understand is that higher resolution does not necessarily equate to more pixels.  This is because ‘pixel’ can refer to two different concepts in astronomy.  One usage deals with image capture [hardware] – that is, the physical cells that capture light in a detector.  (They can also be lovingly referred to as photon buckets.)  Squeezing more (smaller) physical pixels into a given area on your detector will typically give you sharper resolution. Think megapixels in cameras.

The other meaning of pixel deals with image processing and image reconstruction [software].  Pixels in the sense of a digital image refer to the data grid that represents your image information.  You can take your favorite cat photo from the internet and try to scale it in Photoshop to twice the size – giving it more pixels – but you’re not adding any new information to the new pixels that wasn’t already there, so when you zoom in the overall picture will still look just as blurry as the original.  (You can of course apply some clever interpolation algorithms to give the pixels the appearance of being smoother, but that’s a different issue.)  You can combine separate images that are slightly offset to make a new image which has slightly finer resolution than any of the originals – the most common way to do this is called “drizzling“, and is used often in Hubble images, for example.  In radio astronomy (and sub-mm and some infrared instruments), the detector is not directly capturing photons as particles like a CCD camera does.  Usually it’s detecting light as a wave, and the image is reconstructed later from the raw voltages.  In practice, this means that a scientist could reconstruct an image with arbitrarily small digital pixels – just as you could scale your cat photo to an arbitrarily large size – but the extra pixels wouldn’t necessarily contain any new information.


To expound a little more, taking the example of consumer digital cameras, a sensor that boasts 20 Megapixels is regarded as higher resolution than one with, say 15 Megapixels.  This is only part of the story though, as it depends on things like the physical size of the pixels – it’s conceivable to have these two camera sensors made from the same chip of pixels, just cut to separate 15 and 20 Mp areas.  Just considering the sensors themselves here and ignoring the other camera optics, that pair of chips would have the same inherent ability to resolve detail but different total areas. But let’s assume now that they have the same overall area.  It makes sense that, say, a one inch square chip which contains 1000 pixels would produce a much lower resolution photo than a one inch square chip with 1 million pixels.
[Guess the image!] Guess the image
However, you can’t continue the trend forever – engineering issues aside, at some point the pixels would become smaller than the wavelength of the light it was trying to capture.  Beyond that limit, squishing in more pixels won’t help you.

“A gajillion pixels is overkill.” – PJC

Besides, you could open up your photo in some image processing software and divide each pixel into, say, four (or 9, 25, etc…) equal parts if you wanted to.  That would make your image have MANY more pixels.  But that wouldn’t be increasing your resolution at all, because those new smaller pixels are not adding any information.

Let’s reverse that notion – if you are now faced with the task of observing light of longer wavelengths, a single photon can more easily bleed over multiple pixels if they remain small.  Now an infinitesimally small point of light in the real world will register in several pixels, according to some pattern which depends on the optics.  This is the PSF or beam pattern mentioned in other posts.  (This description is vastly simplifying the whole process, of course – for starters, the detector technology of ‘pixels’ is quite different between optical, FIR, and radio telescopes.  Please consult a proper optics or observational astronomical text for a more complete picture.)

So, let’s back up a step.  In a basic sense, the sensor in your typical camera is recording the incoming light of whatever it’s pointing at.  However, if the camera is out of focus or there are defects in the optics, no matter how many pixels you have, the image will appear blurry.  And if your pixels are larger than the innate scale for detail afforded by your optics, your image will be ‘blurrier’ than if you were to employ smaller pixels.  To summarize, the final resolution of your image is determined by a complex interplay between your *entire* optics system, including lenses (possibly interferometer spacing) and recording hardware.

Let’s look at an example of increasing the number of pixels in an image.  Below on the left is an imaginary faraway source (top panel) observed with some telescope.  The resolution (beam FWHM) is shown by the dotted circle – in this case the true source would be a point source with regards to the telescope’s resolution.  The final image is in the bottom panel.  Note that obviously no fine detail about the source can be determined from this image such as if there are 2 or 3 separate objects contained, etc.  Now look at the image on the right side – this one has been produced by processing the raw signal onto a finer pixel grid.  (You can often do this with radio data, for example.)
Pixel Example 1Pixel Example 2

Do you gain anything by doing this?  Well, note that you’re still not able to determine any fine detail about the true source – on that count there is no improvement over coarser image on the left.  If you are trying to get a precise sum of the flux in a circular aperture around the source, though, this may help you out a bit as the smaller pixel sizes would fill the round aperture more fully than larger pixels.  The downside of having smaller pixels in this case would be that processing time could increase dramatically.


Now let’s move on to the discussion of how to compare images from different instruments/settings.  The tools we will use are convolution and regridding.  These can sometimes be related (such as when, in radio interferometer data reduction, you choose the pixel size so that many pixels cover the beam area), but remember they are separate concepts.



For images that are taken with different resolutions, you will need to convolve or “smooth” the sharper images before comparing fluxes.  Comparative photometry requires this because the poorer-resolution image flux may be underestimated compared to the sharper image.  That is, when observing some object with a given inherent luminosity and angular size, if you observe with low spatial resolution (big beam/PSF), then some of the observed flux from the object will be scattered out to larger radii.  If you observe the same object with much higher resolution, this effect is much less pronounced.  So, for photometry, if you want an ‘apples-to-apples’ comparison between different spectral bands, you need to convolve/smooth your images down to the lowest resolution (largest beam size) before extracting your fluxes.

To visualize what changes in a PSF size mean for an image, let’s consider the following image, also mentioned in the post about observational data.  I have taken a sample source image and applied Gaussian blurs of increasing PSF sizes.

Convolution Example 1
The leftmost image is an imaginary ‘astrophysical’ source, as it would appear at infinite resolution. Now, note how the image gets blurred out as the beam size increases in the images to the right. The 2D Gaussian PSF FWHM is denoted by the dotted circles in the lower right corners.  Already in the second panel, in which the PSF is relatively small, the details are difficult to make out. Where is the edge of the bulge in the galaxy? Are the spiral arms narrow or fat?  It’s hard to tell in this panel.  But very notably, the thin comet shape has almost completely disappeared.  This demonstrates the importance of what is called the beam filling factor.  If a source populates the beam too sparsely (small filling factor), even if the overall structure is around the spatial scale of the beam, the instrument will be hard pressed to recover a detection because the flux gets spread over the whole beam (and some outside, too).  In the third panel, even though the beam FWHM is still smaller than the ‘alien’ face or galaxy, it’s already impossible to determine the detail of either blob. (The comet is nowhere to be seen now.)  The two larger features now closely resemble 2D Gaussians, though the left one is obviously not a point source because the emission goes notably beyond the beam size. The fourth panel shows the Gaussian convolution with a kernel roughly the size of the whole collection of sources.  Now the whole thing looks roughly like one single 2D Gaussian beam.

Now note the colorbars in each of those panels – in these plots I chose to let the min/max values be independent between panels in order to focus just on the blurring or flux spreading aspect of the convolution.  But if we look at the exact same figure made with all the plots scaled to the same absolute min/max, something else becomes apparent.  See the following figure:
Convolution Example 2
Here you can clearly see the emission in each panel appear to get dimmer – this is precisely because the larger beam size is spreading the original flux over a larger area (both within the beam, and also to the pixels without). Keep this in mind when you are proposing for observing time on a telescope!

Convolution isn’t only used to match two image resolutions, though – it can also be used to reduce the noise in a single image, if you don’t mind sacrificing some resolution – this is often done in 3D data cubes along the spectral axis.  (c.f. Hanning smoothing, for example.)  Since the ‘blurring out’ of your image by convolving to a larger PSF will ‘smooth out’ your signal, the smaller-scale fluctuations you might see in your original image will average out to a greater degree as the PSF increases.  So that disadvantage of small-scale features ‘disappearing’ from earlier can be repurposed here as a beneficial effect.

In practice, image convolution can be done with a variety of tools.  In popular image processing software packages, you may encounter it as a “Gaussian blur”.  In python, common implementations are scipy.ndimage.convolve or astropy.convolve.  I recommend the latter.  Below we’ll go through examples of how to do this in python.




Why would you want to reorganize an image to a different pixel grid?  There are a number of reasons.  For example, maybe you want to perform some photometry using the same regions with identical pixel grids, to ensure consistency in your measurements.  Or, maybe you want to divide one image by another to make a pixel-by-pixel plot (making a color-color plot, for fitting functions, etc.).  Or maybe one of your images is rotated and you just want to compare them side-by-side.  Whatever the reason, reprojection to a different coordinate grid is relatively easy with the available packages in python.  However, it’s critical to conserve flux by keeping in mind the flux or brightness units of your image.

Let’s take a simple theoretical example pixel with a flux density of 10 mJy.  And let’s assume this is a single pixel from a map with units of [Jy/pixel].  Assuming uniform brightness across the entire pixel as is typically done, if you divide that pixel into four equal sub-pixels, each new one will NOT have a flux density of 10 mJy/pix, but will instead be decreased by its area fraction to 10/4 = 2.5 mJy/pix.  If your map is in units of brightness, however – say, 10 MJy/sr – and you regrid to the smaller pixel sizes, each new pixel still remains 10 MJy/sr.  That’s because brightness is a unit of flux per area – even though the flux is decreasing by say 1/4, the area is also decreasing by 1/4.   See below for another example:

Rendered by


Another technical point to note is that you should try to retain proper Nyquist sampling of your beam/PSF.  That is, you want to have at least two pixels across each beam, otherwise you are losing spatial information. That’s really on the low side, though.  To be safe, three to five pixels per beamwidth is common.  The penalty for increasing the number of pixels in this manner is increased computation time in your subsequent processing.  That being said, you could regrid such that each pixel is effectively one beamwidth across, making each pixel *independent* for the purpose of fitting functions in a spatially resolved way, etc.





Mona Lisa (Original)

Regridded Only

Guess the image







Convolved Only

Mona Lisa (Convolved)

Convolved AND Regridded

Mona Lisa (Convolved and Regridded)

Mona Lisa image credit: Wikimedia Commons



Coming soon!



—– Work in progress —–

Let’s use the Kepler’s SNR images as an example – comparing the IR and radio.  The VLA image has larger pixels and is slightly rotated compared to the IR.   First, to work with any data in maputils, you need to load it in using their tool called FITSimage(), so that it’s formatted correctly.  You can either give FITSimage() a path directly to the fits file, or if you’ve already loaded the data & header elsewhere in your code, you can use the externaldata and externalheader paramaters:

from kapteyn import maputils


from kapteyn import maputils
import as pyfits

Then of course you need to load the IR image:

# OR

Once the data are loaded, it’s really simple to reproject/regrid the IR image to the radio header – in kapteyn it’s just a one-line command called maputils.reproject_to()


The data array is actually stored in map_IR_reprojected.dat, so if you don’t need the extra info that reproject_to() includes, you can just do this right away when you call the function (this is also what I do):


Alternatively, use the reproject package like so:


OR, using the exact reprojection method (not necessary here, but you will probably want this for precise flux conservation when doing comparative photometry):


Now the IR map’s spatial data array follows exactly the radio map’s header (though of course the flux units haven’t been changed). It’s ready to plot or save as you normally do.

*** A couple things to note ***

* For kapteyn‘s reproject_to, the header ‘NAXIS’ card needs to be exactly 2 (that is, a 2D image, not a 3D or 4D data cube).  It’s best to convert it to 2D properly, but if your image is truly just 2D yet retains empty spectral and Stokes axes in the header, you can ‘cheat’ by just manually setting NAXIS to 2:

if map_radio.hdr['NAXIS']>2: map_radio.hdr['NAXIS']=2
if map_IR['NAXIS']>2: map_IR['NAXIS']=2

* Also, if you are planning to do any flux extraction on your reprojected/regridded image, you will need be very careful of the units.  If the units of the original data are Flux [W/m^2, Jy, or similar], then you will need to scale by a factor of  (the new pixel area)/(the old pixel area).  You don’t do this for images with units of brightness [W/m^2/sr, MJy/sr or similar].

# --> Jy_2 = Jy_1_reproj*([sr/pix_2]/[sr/pix_1])
#FIRST, find the degrees per pixel from the header
map_original_flux = pyfits.getdata('....')
map_original_brightness = map_original_flux / degperpix_old**2 
map_reprojected_brightness = map_original_brightness.reproject_to(newheader).dat
map_reprojected_flux = map_projected_brightness * degperpix_new**2


Introduction to Photometry (Emphasis on IR)


(Note, I’m primarily concerned with relatively ‘intermediate distance’ astronomical sources, such as low-redshift galaxies, supernova remnants, etc. instead of high-redshift galaxies and other point sources.  This discussion is therefore biased in that direction.  )

First, some resources.
PhotUtils is a python module for photometry that you may find useful:
I recently came across the APT GUI tool, and while I haven’t used it, it seems to be quite useful:
Here is some Herschel documentation for ap.photom. in HIPE:
Of course there are many papers that detail how the authors did their photometry and uncertainty estimation.  A few recommendations are the H-ATLAS data release paper (Valiante+2016) which I was involved with, and I liked Remy-Ruyer+13 for their careful treatment of the PACS PSF contribution on the background regions, etc.


Basics points to consider

  • Create an appropriate region – may need to check image with extreme scaling in DS9 to make sure you’re getting all the target features you want / avoiding those you don’t want
  • Convert to desired physical units
  • Estimate background – from nearby aperture (preferably not far away due to beam power pattern and inherent sky variability)
  • Apply correction factors: Aperture correction, Color correction, K4 correction (IR)
  • Uncertainty estimation (you can do a ‘formal’ or theoretical estimation, but an empirical approach will be more realistic)

A general ‘formula’ for photometry will look something like this:

Raw = sum(Fluxpix,i) over all pixels
BGsub = Raw – MedianBGreg
Corrected = BGsub·correction1·correction2…



The goal of photometry is to determine a certain target’s integrated emission in some wavelength band. Usually for the purpose of calculating flux ratios or constructing an SED.  To accurately and precisely determine the true source flux, you need a properly calibrated image and usually some knowledge of the instrument that obtained it, for various correction factors.


Region creation and flux extraction

The most obvious step is to sum up the individual pixel intensities in your target region of choice.  However, there are many things to consider in this step.

  1. Be sure to decide on an appropriate aperture (or ‘region’ in DS9 parlance) for your science goal.  Of course, if you are trying to target a small feature in a crowded map, you might make a little circle or ellipse close around that target to make sure you aren’t including any flux from nearby features.  But take care that your region isn’t too small – in particular it should always be larger than the limiting beam size.  Furthermore, if you are interested in a target with a complicated brightness profile – perhaps an entire galaxy with an exponential brightness decrease from the center – be sure to make your aperture large enough to include all the necessary emission.  One good way to ensure you are doing this (for your image, at least) is to play with the data scaling in DS9 to see what the very faint extended emission is doing, etc.
    –> I’ll try to bring up an image to demonstrate some good and problematic choices of apertures for various features of interest
  2. You should make sure you understand the units of the image.  If they are something like Jy/beam, you will need to know what the beam area is to convert to flux density, and if you want to convert to something like luminosity, you will also need to know how to properly convert from the “per Hz” – if your image is ν·Fν you can just use the band’s characteristic frequency, but if it’s from a collapsed data cube you will need to know the channel width, and for some instruments you may need to know the overall imager bandwidth, etc…
  3. If you intend to compare the integrated fluxes between images at different spatial resolutions, you will need to convolve the sharper images down to limiting beam size (sometimes called “smoothing” the images).  This ensures that your flux integrator is viewing the images in the same way – that no flux is being spread out by a PSF in a different way.  Basically all astronomy data reduction/analysis packages have tools for this: CASA has imsmooth(), HIPE has imageConvolution(), python has several excellent tools such as  the astropy.convolution package.
  4. If your region is relatively small, or alternatively if your pixels are relatively large, you may want to employ sub-pixel precision in your sum.  It just depends on the level of precision you need, but if changing your desired region by a small increment changes the resulting flux by a few percent or more, you are probably in the regime of caring about this.  One simple way to get good payoff for relatively little work is to divide each pixel into something like 4, 9, 16, or 25 etc. subpixels for the flux integration. (Make sure to properly conserve flux if you do this!)


Background estimation

  • The ‘standard practice’ is to determine the background level from an annular region just outside of target region, but this isn’t always possible.  If, for example, you have bright foreground/background stars in the annulus just around your target, you may wish to blank them out or use a slightly further-out annulus.  Or if there are other galaxies nearby in the field, or if the PSF throws a noticeable portion of the source flux into the nearby background region (more on that below), you may have to use non-annulus regions even further out.  Be sure not to use regions of sky that are obviously biased, such as a large negative or positive bowl – remember, you want to estimate the typical background.  Each situation is a little bit different, and you will need to tailor your method to the data available to obtain the most unbiased estimate of the background.
  • Once you have determined the best region to use for your background pixels, you will need to decide on an appropriate way to characterize the ‘typical’ background level.  A simple average may be acceptable, or the intensity distribution may make the median more appropriate.  Perhaps you may find that sigma-clipping will give a more robust value.  Looking at the histogram of background pixel intensities can be useful for determining the best course of action.  Ultimately, as long as you have good justification for your choice, you should be fine.
  • Does the PSF make your BG region contain a contribution from the source? For example, in PACS images, the beam has a measurable contribution out to ~1000 arcsec (though of course it’s quite small that far out).  Typical small PACS maps are a few arcminutes across, so in principle every pixel in those maps will have a small contribution from the target source (and indeed every other pixel for that matter).  The effect may be small for your science goals, but it may be something that you decide you need to correct for.  See, e.g., Remy-Ruyer+2013 for one example of how to deal with this.


Correction factors – discussed in the previous post

  • Aperture correction – This is a correction for the amount of flux that the beam has scattered outside of your chosen aperture.  This will be a significant amount for small regions around the size of the beam (say, tens of percent), but will be small or negligible for very large apertures.  One of the best way to calculate this is to directly sum the fraction of the total power enclosed in your chosen aperture – by placing your region on a fits file of the beam profile, for example.  But if you know the instrument’s beam profile and you are using a standard aperture like a circle, you can construct an ‘Enclosed Energy Function’ (EEF) in advance by calculating the enclosed energy percentage for a wide range of radii.  Then your estimated aperture correction would be 1/EEF at the region’s radius.  HIPE has a function for calculating this, or you can also download the EEF tables yourself to create your own interpolation routines.
  • Color correction (IR).  This is a correction to account for the fact that the data reduction assumed a flat spectrum response, while in reality your source spectrum is something like a blackbody, with a different power-law spectral index.  Again, HIPE has a tool to calculate this for you, or you can retrieve the tables manually and calculate it yourself.  See the Spire Handbook:
  • K4 correction (IR, for extended objects) – this is a small correction to apply for extended sources to convert the RSRF-weighted flux density to monochromatic surface brightness.  Use the HIPE tool or download the tables manually.
  • Beam area – if you need to convert from Jy/beam to Jy.  Again, because of the assumed flat spectrum in the data reduction, your beam area will be slightly different (~ a few %) for the source’s true spectral index.  Use the HIPE function or download the tables yourself.


Uncertainty estimation

Some sources of uncertainty in your measurement:

  • Uncertainty in background level – if the BG pixel intensities are roughly Gaussian-distributed, you can estimate this from the standard deviation of the BG pixels.  Since the uncertainty will decrease as the square root of the number of measurements, $ \sigma_\mathrm{BG} \sim stddev(BG)/\sqrt{N_\mathrm{BG}} \times N_\mathrm{ap}$
  • Uncertainty on individual pixel intensities (usually for stacked or drizzled images).  The total pixel error is generally calculated as the quadratic sum of all σpix,i
  • Confusion noise – this is particularly important for catalog measurements of small faint sources. This is basically the random fluctuations in the background sky brightness.  Below this level you can’t detect a source, even if you integrate for more time.  Causes are foreground Galactic cirrus, cosmic background, background galaxies.  Confusion noise depends on wavelength, position on sky, beam profile.  See the Herschel docs for a more complete discussion:
  • One of the most fundamental is the uncertainty on the source flux sum.  Some authors choose to use a ‘theoretical’ estimate of this error: if you assume that the variation in the aperture pixels follows the variation in the background pixels, then σsource = sqrt(Nap)·σBG.  But this on its own doesn’t address the inherent variation due to the shape/location of your chosen aperture.  If you were to take your source region and move it around the map, the enclosed flux would vary by some amount – even if you are only moving it around ’empty’ sky.  This is a complex combination of many effects, including the relation between the region shape and the beam size, the confusion noise, inherent variation across the map, etc.  This can often be the dominant source of uncertainty in your measurement, probably because it includes so many effects.  You can estimate this empirically with monte carlo methods.  The idea is quite simple – place the source aperture at many places around your map that are empty of ‘real’ emission from the galaxies or whatever you’re interested in.  Then integrate the flux in each of these test regions.  The standard deviation of this array of test aperture values should serve as a good estimate of the inherent variability your chosen region shape should produce in your map.
  • Systematic/Calibration uncertainties – These include the beam area uncertainty (4% for SPIRE), and absolute flux cal uncertainty (5.5% for SPIRE). Spectroscopy and polarization measurements will of course also have their own systematic uncertainties.

Once you have determined all the individual uncertainties you need to include, if they are all independent of each other, you can calculate the total uncertainty as the quadratic sum of the individual components.  For example:
$ \sigma_\mathrm{tot} = \sqrt{ \sigma^2_\mathrm{BG} + \sigma^2_\mathrm{source} + \sigma^2_\mathrm{pix} + \sigma^2_\mathrm{cal} + … }$


Introduction to Infrared Telescopes and Observing

How an infrared (single-aperture) telescope works


Types of detectors

There are three common varieties, roughly based on the wavelength of observations.  This is because on the scale of infrared light wavelengths (a few to a few hundred microns), the photons are in the transition phase of being smaller/bigger than our detecting electronics.  When the photon wavelength is much smaller, it interacts with the detector like a particle; when it is much larger, it interacts with the detector like a wave. The three common types are:

  • Photodetector – ~a few microns.  This is somewhat like a camera CCD.
  • bolometer arrays – ~tens to ~1000 microns, for imaging.  Basically, a single bolometer is a little block of material (silicon/germanium is common) that will fully absorb or ‘thermalize’ IR photons. This slightly changes the temperature of the material, which changes its electrical properties (resistance,conductance), which can be measured.
  • heterodyne – ~70 to ~1000 microns, for spectroscopy.  The photon waves are mixed with a local oscillator (like an old-style radio).  Because the actual waves are detected, you can measure the polarization. Also, the phase is preserved, so you can do interferometry with multiple antennae/receivers.



General observing recipe for imaging (in optical, IR, etc.)

– ‘dark/bias’ scans to calibrate how much signal the detector ‘sees’ as a baseline, with the cover still on.  Subtract this.
– ‘flat field’ scans to calibrate the relative response of each pixel, since each has different sensitivity.  Divide this.

– Observe target –> This is your raw data.  Telescope software should put pointing, etc. info into a data header.
– Spectrometers will also observe a blank patch of sky separately for background subtraction



FIR observing

The atmosphere is mostly opaque to incoming IR light in many bands, due to absorption by water vapor.  Need very high-elevation dry sites for the open windows in the NIR and submm, or ideally airborne/space observatories like SOFIA & Herschel.

– The dark/bias current is usually much more of a problem, because the temperature/thermal emission of the science target can be very similar to the actual physical temperature of the detector/telescope/surroundings.  So these are typically cryogenically cooled, but we still need to be very careful with our calibration.
– The sky/background is quite high & variable in the IR, and atmosphere conditions can change quickly.
– Account for these facts by observing in a “chop-nod” pattern: chopping to remove sky background, nodding to remove telescope background.
– The secondary mirror moves from on-source to off-source (a few arcmins away) usually at speeds of a few Hz.  This is called a ‘chop’.  One chop contains the source plus the background, the other chop contains only background emission.  So when you subtract, you get the source-only signal.  (You will still need to do a separate background subtraction in your photometry, though!)
– The telescope itself is still contributing a lot of signal to your readouts, and it can actually be many hundreds of times the astronomical signal.  To remove this effect in single pointed observations, the telescope is physically slewed to an offset but nearby position a few times per minute – this is called the “nod”.  The chopped observations are repeated at the other nod to get a second background-subtracted image.  Then these are combined to give a final clean image that should only contain astronomical signal.
– The SOFIA FORCAST website has a really nice description and animations of chop/nod observing:
– For mapping large areas of the sky, cross-scan mapping can be employed (e.g. SPIRE ‘large map’ mode)

For IR telescopes, the telescope staff will have determined the flux calibration scale by repeated observations of ‘standard’ objects of infrared emission, such as planets and asteroids in our solar system.  Then in the data reduction process you can convert from instrument readout units like Volts to physical units like Watts and Jansky.

Note: 1 Jansky = 10-26 W/m2/Hz.


(Spatial) Resolution

Imagine looking up into the sky on a dark night and spotting a particularly bright point somewhere out there.
Lookie here!
Typical human eyesight has about 1 arcminute resolution [source], so if the bright spot on the sky is smaller than 1′, you obviously can’t make out any detail below that scale, and it’s effectively a point source for your detector (eye).  What does the true source look like?  It could be a single bright star far away, or a broader faint star, or even a small cluster of stars, a galaxy, or something else entirely. Which one of the following sources is the true source you’re looking at? You will need a telescope to find out.
Place your bets on which is the real one

To determine what the true source in your bright point on the sky contains – that is, to increase the spatial resolution – you will of course observe with a fancy telescope of some sort.  There are many subtle effects of the optics and detector you must be aware of if you want to understand the precise quantitative and qualitative properties of your signal.  One of these, which is very important when you get down to the lowest spatial scales in your image, is the effect of the Point Spread Function (often called the ‘resolving beam’ in longer-wavelength astronomy.)



When we observe light, what we see in our detector (whether that’s our eye, a telescope, or a common camera, etc.), we are observing through a certain aperture.  Optics theory (and empirical demonstration, for that matter) tells us that the pattern of light we detect through an aperture is a convolution of the original emission incident on the aperture and the fourier-transform of the aperture shape/profile.  Remember the simple single-slit laser experiment from your undergraduate physics labs (Fraunhofer diffraction) as a 1-dimensional example.  There, the incident light is basically uniform over the slit.  But you don’t see a uniform laser dot on the wall, you see a sinc2 function.  The aperture profile is basically a Heaviside (“top-hat”) function, whose fourier transform is a sinc function.  The convolution of those profiles (Intensity goes as amplitude times its complex conjugate, hence sinc2 on the wall.)

The point-spread function (also called a “beam” in radio/submm) is the pattern a single small source of light (less than the limiting 1.22 λ/D) will make on the detector due to the optics of the aperture.  An idealized circular (2D) aperture produces an Airy-disk PSF pattern, and the central part of that can be approximated as a 2D gaussian.  For a more complicated real-world example of a telescope, the PSF can be quite complex due to things like imperfections in the reflector surface, occultation from support struts, etc.  What this means for us in practice, when we are trying to measure the actual flux of an object, is that even if you observe a tiny point source in the sky, a (hopefully) small amount of that power coming through the aperture will be spread out over the detector in the pattern of that instrument’s PSF.  You need to correct for the fraction of emission that PSF spreads outside of your integration region during aperture photometry.  For very large regions, the effect gets much smaller, but even for Herschel images with ~20″ resolution, the PSF can make a difference of a few percent in regions out to a couple arcminutes.

For a given optics system, all else being equal, increasing the size of your dish will mean more collecting area to help with sensitivity – you can detect fainter things.  But a larger diameter also directly reduces the theoretical smallest scale (1.22 λ/D) you can resolve.  Another way to say this is that a larger dish means a smaller beam or PSF in our idealized telescope.  At some point, building a larger dish will become prohibitive, because of cost, land, or engineering constraints.  Interferometers such as the VLA and ALMA dramatically increase the maximum resolving baseline by combining signals from multiple separate dishes that can be many kilometers apart (or even the whole earth, for VLBI).

In the following example images of our funny source, the beam size is gradually increasing to the right – as would happen if you observed with progressively smaller telescopes.  Or, conversely, if the same telescope would observe the imaginary source at smaller and smaller angular sizes.  Not only does the image become more ‘blurry’ as the beam gets larger, but the flux from the source gets more spread out to the other pixels in the image.  Be aware that more pixels does not necessarily mean better spatial resolution.  See the tutorial on convolution and regridding for some more discussion on these points.

The PSF/ beam size increases from left to right. (Circular PSF FWHM denoted by the dotted grey circle) It's hard to see on a linear color scale, but notice how the flux gets spread out (making the inner beam appear fainter) as the PSF increases.



Color Corrections

Herschel/SPIRE imaging bands are not limited to exactly 250,350,500μm – they actually extend about 50μm to either side.  In addition to that, the filter transmission is not uniform across all wavelengths.  The exact profile of the wavelength-dependent transmission in a given imaging band is called the Relative Source Response Function (RSRF).  The incoming emission spectrum can have a variety of shapes, such as a power law.  The flux density recorded in the bolometer is the integral of the source’s spectrum weighted by the RSRF.   The data pipeline doesn’t know what shape the source spectrum has, however, so just assumes it’s flat – that is, νSν = constant.  It also assumes that the source is a point source.  (This calibration is done at the level-1/detector-timeline data stage.)  The monochromatic flux densities you get out of SPIRE – Sν0 for ν0 = 250,350,500μm – are produced with these assumptions, and weighted by the RSRF.  In reality though, your source is possibly not a point source, and almost certainly doesn’t have a flat or uniform spectrum.  For example, blackbody emission (planets, asteroids) in the SPIRE wavelengths (roughly the Rayleigh-Jeans limit) will follow a power law with α~2, while cold (~20K) dust will more typically have α~3 or 4, depending on the dust properties.  To get a usable ‘monochromatic’ flux (for example, a ‘250 micron flux’), you need to correct for these two assumptions.

The correction for the shape of the spectrum is called a ‘color correction’.  The assumed functional form for the source spectrum based on the monochromatic wavelength ν0 is:  S(ν) = S(ν0)·(ν/ν0)α.  Again, α = -1 is assumed, but your true source spectrum can be something like α=2 (blackbody), in which case you apply the appropriate correction factor.  See the Spire Handbook Section 5.2.6 “Colour correction for power-law spectra” (and 5.2.7 for greybodies) for a better explanation and the pertinent equations.  HIPE contains a table of color correction factors for various values of α, as well as for modified blackbodies as functions of temperature and β.

Herschel/HIPE assumes your data is a point source for reduction purposes.  If your source is actually extended, you need to make a small gain correction (called a “K4” correction) to the bolometer signals.  This is because point source observations care primarily about the peak signal in bolometers while extended source observations want to take caution about the consistency of the integral between individual bolometers, reducing striping in maps.  (See the SPIRE Handbook, Section 5.2.5 for a better explanation.)  This K4 correction factor also depends on the assumed shape of your source spectrum, and you can find the table of corrections for various indices of α in the SPIRE Handbook (possibly out of date) or within HIPE.

When converting from Jy/beam units to MJy/sr – that is, using the effectve beam/PSF area – note that the beam sizes also depends on the assumed spectral index.  Choose the appropriate beam area from the tables in HIPE.



Units of Astronomical Images and Conversions

If you’re like me, you frequently have to re-check the definitions for units and conversions with astronomical data.  We all learned this in our intro astronomy courses, but details sometimes get fuzzy after time.   It doesn’t help that there is occasional laxity in the use of the term ‘flux’ in the Literature, when they really mean flux density or even brightness (and vice versa)!  I have not yet found definitions and conversions for the most common astronomical image units in one resource, so I am venturing to create something like that here.

Some excellent resources I’ve found on the web on this topic, with figures and examples:
Beware, though, as even these sources use terms like brightness to mean different things.

To go straight to the quick reference guide, click here.

The big picture

As observational astronomers, we observe the sky, record the incoming photons, and quantify the emission in order to say something about the physics in the target system.  We generally want to determine an object’s luminosity, but what we measure with our instrument is the flux from that object incident on our detector area, within a certain wavelength range, coming from the source which covers a certain solid angle on the sky.  It takes a little bit of work to back out the luminosity.

Common units

There are many different units in use for the same quantities in astronomy, and some of the more historical units are still commonly seen (cgs, magnitudes).  Here I am concerned with the units you are likely to encounter in your images, with a slight preference for the radio part of the spectrum.  Note that I am sticking to physical units, not instrument units such as ADU.  Most books and guides I’ve seen start with the quantities you get in your images (such as brightness) and work their way down to the more fundamental quantities that we are interested in studying (such as luminosity).  I will do that later in the quick guide, but for a first look, I find it more intuitive to start with the simple quantities (what we’re trying to determine) and work our way up to what we get in our data products.  I will be working in SI units throughout, which is becoming more common in the Literature, but I will also use common units like Janskys where convenient.

Luminosity (L).  This is a fundamental characteristic of the source you observe.  Dimensionally equivalent to the physics quantity Power, it’s simply the amount of energy radiated per unit time. Though it is a measure of how bright something is in the colloquial sense, it is NOT equivalent to the term ‘brightness’ in astronomy.  (See below.)  The most common units are Watts [W], though you may still occasionally see L in terms of ergs/s.  (1Jy = 107 erg)  Since we typically assume isotropic radiation for simple cases (definitely not true for many sources you may encounter!), the luminosity is generally taken to be the flux times the area of the sphere at your distance D.  (c.f. the Stefan-Boltzmann Law!)  That is, L = Asphere F = 4πD2F.

Flux (F).  The farther away a source is, the dimmer it will appear to an observer.  Think of how the high beams of an oncoming car seem brighter as the car gets closer, and less blinding when they are farther away.  Of course, this is because the radiated energy is spread out over a larger area as distance increases. The amount of radiated energy that passes through an area is flux.  SI units are W/m2, though you will still see erg/s/cm2.  Since we are interested in quantifying the luminosity of an object – that is, the inherent power radiated – for analysis and science goals we are not as much interested in the amount of light that passes through the detector area, but rather the total amount of energy radiated at the source. (However, see caveat below.)  To get this, as mentioned above, we assume isotropic radiation (for simple cases) and multiply the flux by the area of the great sphere at the observer’s distance D from the source: Asphere = 4πD2.  Notice the D2: flux follows the inverse square law.

HUGE CAVEAT: We don’t live in a conceptually simple textbook universe. (Surprise!)  In our analysis of physical systems, we often make assumptions about source geometry, isotrpic emission, and a host of other factors.  It’s not that we’re lazy – we often don’t have the information, resolution, or telescope time to do a detailed analysis of every system.  Instead, we make reasonable educated guesses.  This leads to a subtle but crucial point: our instruments do not give us the properties of a source, but rather they give us a snapshot of the light that our sensors detect when pointed in that direction.  We don’t know that we recover all the flux, for starters – this is especially pertinent for interferometers, which act as spatial filters.  Flux on large or small scales can be missed, depending on dish configurations, and flux also depends on frequency.  Though we are trying to measure the properties of the sources we observe (e.g., luminosity), in reality we are limited to measuring the signal registered in our detectors (e.g., specific intensity or flux density).  In other words, the final product you get out depends on the model you use, so it’s always a good idea to report results in terms of observables!

Rendered by

Brightness (B) or  Intensity (I). In astronomy, “brightness” is a technical term that describes the power radiated through a surface area, divided by solid angle.  SI units are W/m^2/steradian.  In physics terminology, this quantity is sometimes referred to as radiance.  When flux is divided by the solid angle that the source covers on the sky, the result is a quantity that is independent of distance.      CAUTION:  There seem to be two camps when it comes to brightness: It is most common to use the terms brightness and intensity to describe the same thing (Power per area per solid angle) – this is related to the B in Planck’s blackbody law or the Rayleigh-Jeans and Wien approximations. [Bν]= W/m2/Hz/sr.  Thus, brightness is independent of distance to the source.  Some, however, use brightness and flux interchangeably (making it dependent on distance).  This makes things confusing, to say the least.  Even my favorite intro astronomy textbook (the venerable Astronomy & Astrophysics by Zeilik & Gregory) falls into the latter category, though they do mention that they mean flux when they say brightness, and use the symbol f for it.  I assume for the rest of this discussion that brightness is dimensionally equivalent to intensity (NOT flux), to remain consistent with the convention of the Planck law definition and its approximations. The NRAO page linked above has a very good discussion of brightness.

Surface Brightness (S).  The term “surface brightness” is used specifically for objects whose projection on the sky is extended (covering many pixels or beam sizes).  The units are equivalent to brightness from above: W/m2/steradian.  If you’re looking at a point source, you know that all the radiation is coming from that discrete location.  But if you’re looking at something like a gas cloud, you really need to specify how spread out the emission is.  For example, let’s say you are looking at a movie screen from the back of a theater. You have good eyesight, so you can resolve things down to 1cm from your distance.  If someone shines a laser pointer on it before the movie starts, the resulting dot (<~1cm) is essentially a point source, so you can determine the flux easily – it’s all coming from the single tiny spot.  However, if you now observe the screen during the movie, you may see quite a bit of illumination from the region – maybe maybe it’s an underwater shot with a gradient of bright blue at the surface to dim blue below.  You can only make things out down to about 1cm, so you can’t determine the flux due to every infinitesimally small point, but you can determine the brightness [flux per square degree, say] of any piece of the screen just fine.  You can always divide the brightness in a region by its solid angle to get its average flux. This scenario is obviously contrived, but it demonstrates the necessity of care when dealing with extended sources.

Optical images are often given in terms of magnitudes, for example mag/square”.  It is also common to see magnitudes per physical measure of the solid angle based on the source, for example mag/pc2.  Here, the pc2 means the source area on the sky, which means that an assumption about the distance to the object is built in.

Rendered by

A quick note on summing: If you are summing the pixel brightnesses (to determine luminosity, for example), the appropriate solid angle to multiply by is the solid angle of one pixel (such as the red square in the image above), not the total solid angle of the source (the blue galaxy above).  This is because the sum already takes into account the total area.  See the tutorial on reprojection for further discussion.

Magnitudes (mag) & surface brightness (mag/arcsec^2).  Apparent magnitudes (m) are a measure of how bright celestial objects appear to us.  It’s a measure of flux peculiar to astronomy.  It’s based on the system developed by Hipparchus (or so the story goes) a few thousand years ago, and we still use the magnitude scale because astronomers love tradition, though the modern one has been modified.  Absolute magnitudes (M) are magnitudes that objects would have if they were observed from a standard distance of 10pc.  I’ll be sticking to apparent magnitudes, since that’s what you will encounter if your images are in [mag].  Magnitudes are logarithmic, and we determine flux in SI units by comparing to a reference source. The sun is the reference most commonly used, but of course, you can always use another one.  Here is the relation between magnitude and flux:
m – m = -2.5 log(F/F).
I should note here that this is true in general, but you will want to take care to use reference values for a specific wavelength/frequency band.  The Sun has slightly different flux and magnitude in blue, red, etc. light.  So a better way to write this relation might be m⊙,λ – m = -2.5 log(F/F⊙,λ).  I am using λ here instead of ν to avoid confusion with the typical notation for flux density, Fν.  (See next section)
Turning the equation around, we get the formula for flux:

F = F⊙,λ  × 100(m,λ – m)/5.

Spectral variations:
Each instrument has a finite bandwith over which it can detect light (think: red or blue filters in a camera or telescope).  When an instrument is capable of detecting light over a large-enough range of different channels, you will often get images in units that reflect the fact that the detected light is a function of wavelength of frequency.  Flux detected over many spectral channels is usually called Flux Density (Fν), and comes in units of W/m2/Hz or, as is common in the radio and FIR: Janskys (1 Jy = 10-26 W/m2/Hz).  Luminosity L [W] becomes spectral luminosity Lν [W/Hz], S becomes Sν (still called surface brightness), and Intensity/Brightness become Specific Intensity/Brightness (Iν, Bν) in W/m2/Hz/sr or Jy/sr.  The most familiar use of spectral brightness / specific intensity / spectral radiance in textbook studies is the Planck law, which quantifies the emission from a blackbody at a given temperature.

It is also common to encounter specific intensity images in units of Jy/beam.  “Beam” is the term used ubiquitously in radio (and often in the FIR) to refer to the angular size of the resolution element or point-spread-function, which is usually assumed to be a 2D gaussian.  The area under a single gaussian is    \int  Amp \cdot e$^{-\frac{1}{2}\left( \frac{x-x_c}{\sigma} \right)^2} dx = \sqrt{2 \pi} Amp \ \sigma$ , and thus the area under a 2D gaussian is computed as   \iint Amp \cdot e$^{-\frac{1}{2}[\left( \frac{x-x_c}{\sigma_x} \right)^2+\left( \frac{y-y_c}{\sigma_y} \right)^2 ]} dx dy = 2 \pi \sigma_x \sigma_y $ .  For the beam response, we set the amplitude = 1.  σ = FWHM/(2 sqrt(2 ln2)), so the double integral simplifies to  ~ 1.133 FWHM1* FWHM2.  The FWHM can be given in terms of arcseconds or pixels, depending on what you want. Thus, Jy/beam is a measure of surface brightness.

Conceptually, to get the total Flux, luminosity or brightness over a continuous distribution of emission, you would integrate the function over all observed frequencies.  Since our instruments discretize the signal into channels, we can take the approximation of sums.

   \noindent $\mathrm{ L = \int_0^\infty L_\nu \ d\nu \approx \sum L_\nu \ \Delta\nu}$\\ $\mathrm{ F = \int_0^\infty F_\nu \ d\nu \approx \sum F_\nu \ \Delta\nu}$\\ $\mathrm{ I = \int_0^\infty I_\nu \ d\nu \; \approx \; \sum I_\nu \ \Delta\nu}$

If the channels are uniform in size (usually the case), then things simplify because the sum changes to ∑ Fν,i × dνi → nchan × dν × ∑ Fν,i.  Thus you can sum the flux and multiply by the total bandpass, Δν = n×dν.

Since the total flux comes from the integration of flux over all channels, the total flux, (and intensity, etc.) is sometimes called the “integrated flux” (“integrated intensity”, etc.).  Be careful, though, as this term is also used to describe the total flux in an image resulting from integration of the flux in several pixels.


Don’t just blindly apply dimensional analysis!  To start, let’s define some commonly used quantities.

  • D = distance to source.
  • Abeam =   \iint Amp \cdot e$^{-\frac{1}{2}\left[\left( \frac{x-x_c}{\sigma_x} \right)^2+\left( \frac{y-y_c}{\sigma_y} \right)^2\right]} dx dy = 2 \pi \sigma_x \sigma_y \simeq 1.133 \ \mathrm{FWHM_x FWHM_y}$ . The FWHM can be in pixels or arcsec.
  • Δν = ν2ν1.  This is the effective passband of the instrument.
  • Ωsr = solid angle of a pixel in the image

Once all the definitions and units from the previous section are sorted out, converting between them is straightforward.  To go from intensity or (surface) brightness [W/m2/sr] to flux [W/m2/pixel], simply multiply by the pixel size (in steradians, beams, square arcsec, or whatever units the brightness was in).  There is a subtle point to be aware of here: notice that I wrote flux in units of  [something per pixel].  Total flux from a source in an image depends on how many pixels you include.  A single pixel will have units of [Jy/pix] or [W/m2/pix] – this makes the sum of several pixels [Jy] or [W/m2]. e.g., f[Jy/pixel]×n[pixels]=F[Jy]. The tricky part here is that sums of brightness maps still need to include the factor of steradians/pixel, etc.  See the examples section below for more discussion on this point.

To go from flux to luminosity, multiply your flux by the area of the sphere at distance D (from source): 4 pi D^2.  Easy peasy!  Again, luminosity in units of [W] or [erg/s] implies that you have summed over several pixels.  If you are making a luminosity map, just be aware that your units will technically be [W/pixel].

For magnitudes, we need to first convert to flux or luminosity.  Again, we determine flux from apparent magnitude by comparing to a reference – you can choose any reference of known flux/magnitude that you like, but the easiest to use is the Sun.  Since F=L/4πD2, you can work in terms of luminosity if you prefer:

m – m = -2.5 log(L/L × D2/D2).  So, if you’re going TO flux or luminosity from magnitudes, you can use

F [W/m2] = F⊙,λ × 100(m⊙,λ – m)/5      and

L [W] = L⊙,λ × (D/D)2 × 100(m⊙,λ – m)/5

For single maps in spectral units (Jy, erg/s/angstrom, etc.): divide by the total bandwidth used in that frame.  If it’s an image with several frames covering different frequencies/wavelengths/velocities, etc., then use the bandwidth of the individual channels.

Integrated moment maps (Moment-0, flux over all channels) can be a little tricky.  The image started out as a multi-channel data cube [Jy/beam or similar], and the channel information (frequency or wavelength) may have been converted to something more physically useful for analysis, such as recessional velocity v, redshift z, or distance from the observer.  When the image was then summed, it may have kept those converted units, giving the resulting data mixed units such as Jy/beam × m/s. Note that this is an intensity (or brightness) – Janskys denote flux density, where 1 Jy = 10-26 W/m2/Hz.  The “per hertz” bit makes it a measure of flux per channel.  Likewise, the m/s bit is a measure of the total bandwidth – it’s the recessional velocity range (determined from the redshift equation) that comes from the bandwidth.  In effect, the m/s and 1/Hz cancel out (with a leftover factor).

You may be thinking, “Why don’t we just divide by the total velocity range covered and multiply by the frequency passband – that would cancel the units!”  For one, if you download an image from NED or elsewhere, sometimes you don’t have all the info on hand, such as the channel widths, number of channels, or total bandwidth summed over.  But the real rub is that intensity is technically defined by an integral.  On its own, when it depends on a single variable, it’s OK to approximate ∫ I(ν)dν ≈ ∑ I(ν) Δν.  But now we are dealing with an additional variable: I = I(ν,v)  So when we want to simplify, we have to put in the differentials: I [W/m2/beam] = I [W/m2/beam/Hz · m/s] × dν/dv.  This is NOT the same as I [W/m2/beam/Hz ×m/s] × Δν/Δv because the differentials are, well, differentials – variables, not static.

To avoid confusion between the notation for velocity v and frequency ν, I will switch the notation for frequency from ν to f for this part.  Again, I [W/m2/beam] = I [W/m2/Hz · m/s] × df/dv.  The recessional velocity is given by the standard redshift equation, using the rest frequency f0:
v(f) = c×(ff0)/f0

Note, however, that this definition is used mostly by radio astronomers.  Optical astronomers use v/c = (ff0)/f instead.  See here ( ) for more discussion on that point.

∂v/∂f = ∂/∂f [( 1-f/f0 ) c ] = –c /f0
| ∂f/∂v | = | ∂v/∂f |-1 = f0/c

So, to convert from Jy/beam · m/s, which is an intensity, we multiply by ∂f/∂v = f0/c  , NOT by Δf/Δv.

Here are some other factors that you may (or may not?) come across:
∂v/∂λ = ∂/∂λ [ (λ-λ0)/λ0×c ] = c0, so | ∂λ/∂v | = λ0/c
c = λf →  ∂f/∂λ = ∂/∂λ( c/λ ) = –c → | ∂f/∂λ | = c/λ2, | ∂λ/∂f | = c/f2


Coming soon!

Rendered by

Quick Reference:

   Common Units in Astronomical Images \\\begin{tabular}{ llll }  \hline \hline  Name(s) & Symbol(s) & Units & Formula   \\  \hline \\ Intensity, Brightness              & I$_\nu$   & W/m$^2$/Hz/sr   & I$_\nu$ = dP/(d$\nu$ dA d$\Omega$)  \\ Surface Brightness                 & S$_\nu$   & W/m$^2$/Hz/sr   &                        \\ (Integrated) Intensity, Brightness & I, B      & W/m$^2$/sr      & I=$\int_0^\infty$ I$_\nu d\nu \approx \sum $ I$_\nu \ \Delta\nu$  \\ Flux Density                       & F$_\nu$   & W/m$^2$/Hz      &                      \\ Flux                               & F         & W/m$^2 $        & F=$\int_0^\infty$ F$_\nu$ $d\nu$      \\ Luminosity                         & L         & W             & L = 4$\pi$D$^2$F            \\ & & & \\ (Apparent) Magnitude  & m  & mag   &   $m_{\odot,\lambda}-m = -2.5 \ \mathrm{log_{10} ( F/F_{\odot,\lambda}})$ \\  \end{tabular}

Click here for a printout cheat sheet of the units and conversions discussed on this page.

   Conversions  \begin{tabular}{ llllll }  \hline  \hline  \hspace{1.75cm} To $\rightarrow$  & I$_\nu$, S$_\nu$ [W/m$^2$/Hz/sr] & I [W/m$^2$/sr] & F$_\nu$ [W/m$^2$/Hz] & F [W/m$^2$] & L [W]  \\ From  & & & & &\\  \hline \\  I$_\nu$, S$_\nu$ [W/m$^2$/Hz/sr] & \cdots  & $\Delta \nu$ & $\Omega_{sr}$  & $\Delta \nu \cdot \Omega_{sr}$ & A$_{sphere} \cdot \Delta \nu \cdot \Omega_{sr}$ \\  I [W/m$^2$/sr] & 1/$\Delta \nu$   &   \cdots  & $\Omega_{sr}$/$\Delta \nu$  & $\Omega_{sr}$ & A$_{sphere} \cdot \Omega_{sr}$ \\  F$_\nu$ [W/m$^2$/Hz] & 1/$\Omega_{sr}$  & $\Delta \nu$/$\Omega_{sr}$ &  \cdots & $\Delta \nu$ & A$_{sphere} \cdot \Delta \nu$ \\  F [W/m$^2$] & 1/($\Delta \nu \cdot \Omega_{sr}$) & 1/$\Omega_{sr}$  & 1/$\Delta \nu$ & \cdots & A$_{sphere}$ \\  L [W] &  1/($\Delta \nu \cdot \Omega_{sr} \cdot $A$_{sphere}$)  & 1/(A$_{sphere} \cdot \Omega_{sr}$)  & 1/(A$_{sphere} \cdot \Delta \nu$)   & 1/A$_{sphere}$   &    \cdots \\ \\ \\  \multicolumn{6}{l}{Mag: F = F$_{\odot,\lambda} \cdot 100^{(m_{\odot,\lambda} - m)/5}$  to get Flux, then use above conversions } \\ \multicolumn{6}{l}{ \hspace{1.0cm} (Can use Luminosity L = L$_{\odot,\lambda} \cdot$ (D/D$_\odot)^2\cdot 100^{(m_{\odot,\lambda} - m)/5}$ as well.  L$_{\odot,\lambda}$ from, e.g., Binney \& Merrifield) } \\ \\ \multicolumn{6}{l}{ Mag / square arcsec: I = F / square arcsec = F$_{\odot,\lambda}$/arcsec$^2 \cdot 100^{(m_{\odot,\lambda} - m)/5}$ to get brightness, then use above conversions} \\ \\  \multicolumn{6}{l}{Jy/beam * m/s: multiply by ${\partial \nu}/ {\partial \mathrm{v}} = \nu_0/c$ to get intensity (then use above conversions): }\\ \multicolumn{6}{l}{ \hspace{1.0cm} I $ \left[ \mathrm{ \frac{W}{m^2 bm} } \right]$ = I $ \mathrm{ \left[ \frac{Jy}{bm} \cdot m/s  \right] \times 10^{-26} \cdot \frac{\partial \nu}{\partial v}  = I \left[ \frac{Jy}{bm} \cdot m/s  \right] \times10^{-26} \cdot \frac{\nu_0}{c}}$  } \\ \multicolumn{6}{l}{ \hspace{1.0cm} ${\partial \lambda}/ {\partial \mathrm{v}} = \lambda_0/c, \; \; {\partial \lambda}/ {\partial \nu} = c/\nu^2, \; \; {\partial \nu}/ {\partial \lambda} = c/\lambda^2$  } \\  \\  \end{tabular}



Before we start, here are some useful links on some of the topics covered in this post.

Generate your own custom colormap

Using Linear Segmented Colormaps

Scaling the indices in a cmap by a function

The basics

The color dictionary in a colormap consists of a set of tuples for ‘red’, ‘green’, ‘blue’, each with three elements.  Each of the tuples has the form (x,y0,y1), which sets the RGB values that will be used for interpolation in the colorbar.  I find it more instructive to think about them as ( cm_index, below_interp_to, above_interp_from ):

  •   ‘cm_index’ is the index in the colormap, from 0 to 1, that corresponds to the normalized range of the data values. That is, cm_index=0 in the colormap corresponds to the minimum data value plotted, and cm_index=1 in the colormap corresponds to the maximum data value that is plotted.  If you think about it like a colorbar, cm_index=0 is the bottom color, and cm_index=1 is the top color.
  •   ‘below_interp_to’ is the R, G, or B value (from 0 to 1) that is used for interpolation below the ‘cm_index’ value in the tuple
  •   ‘above_interp_from’ is the R, G, or B value (from 0 to 1) that is used for interpolation above the ‘cm_index’ value in the tuple

How to create a colormap:

from matplotlib import colors, cm, pyplot as plt

cdict = {
    'red'  :  ((0., 0., 0.), (0.5, 0.25, 0.25), (1., 1., 1.)),
    'green':  ((0., 1., 1.), (0.7, 0.0, 0.5), (1., 1., 1.)),
    'blue' :  ((0., 1., 1.), (0.5, 0.0, 0.0), (1., 1., 1.))

#generate the colormap with 1024 interpolated values
my_cmap = colors.LinearSegmentedColormap('my_colormap', cdict,1024)

Accessing the values in an existing colormap.



print cmapvals
#  -->
#    {'blue': ((0.0, 0.5, 0.5),
#              (0.11, 1, 1),
#              (0.34, 1, 1),
#              (0.65, 0, 0),
#              (1, 0, 0)),
#     'green': ((0.0, 0, 0),
#              (0.125, 0, 0),
#              (0.375, 1, 1),
#              (0.64, 1, 1),
#              (0.91, 0, 0),
#              (1, 0, 0)),
#     'red': ((0.0, 0, 0),
#             (0.35, 0, 0),
#             (0.66, 1, 1),
#             (0.89, 1, 1),
#             (1, 0.5, 0.5))}

For examples of how this works, let’s look at some possible tuples in ‘blue’: (if this gets confusing, just jump ahead to the examples that follow)

  • (0,0,0)  –> This tuple defines the beginning point, cm_index=0 (the bottom of the colorbar, for the lowest data value). Blue starts at B=0 here.
  • (.25,0.1,0.1) –> This says that for the colormap value 0.25 (one-quarter of the way up the colorbar), B should be interpolated from the value in the previous tuple (B=0, from the one we talked about above) to B=0.1.  Furthermore, the interpolation value to use between cm_index=0.25 and that of the next tuple (0.5, coming up next) is also B=0.1.
  • (0.5,0.8,1) –> This says that for cm_index=0.5 (the half-way point on the colorbar), the blue value should interpolate B values between the previous definition point (B=0.1 at cm_index=0.25 from before) and 0.8.  It also sets the interpolation point at 1.0 between cm_index=.5 and the next point.  The fact that we set different values 0.8 and 1.0 for bwlow_interp_to and above_interp_from means that you will see a break in the colormap at cm_index=0.5.  In other words, the colormap will not have a completely smooth color transition at the half-way point.  We’ll see an example of this later.
  • (1,1,1) –> This tuple defines the end point: cm_index=1, the top of the colorbar. Interpolate between B=0.8 and B=1 for points between the last tuple and this one (cm_index=0.5 to 1).

Example: Make a simple colorbar of entirely blue values, starting from black. (Black is R=0,G=0,B=0.  White is R=1,G=1,B=1.  Pure Blue is R=0,G=0,B=1.)
To do this, we can set our ‘blue’ values to start at B=0 at ‘r’ = colorbar range = 0, then interpolate up to B=1 at ‘r’ = 1.  We don’t want red or green in this example, so we will set them both to be R=0 and G=0 at ‘r’=0 as well as at ‘r’=1.

cdict_blue = {
    'blue'  : ((0,0,0),  (1,1,1)),
    'green' : ((0,0,0),  (1,0,0)),
    'red'   : ((0,0,0),  (1,0,0))


Whenever the R,G,B values all equal the same value, you have a shade of gray.  Thus, we can make a simple grayscale with:

cdict_gray = {
    'blue'  : ((0,0,0),  (1,1,1)),
    'green' : ((0,0,0),  (1,1,1)),
    'red'   : ((0,0,0),  (1,1,1))


Now, let’s make a special cmap that uses entirely red values (starting with black at the bottom).  But now, let’s give it a sharp break at 80%, at which point it will transition from blue to pure white.  This is useful if you want to demonstrate that some of your data goes above a threshold (80% here), and still show the variation above that threshold in another color (blue) without simply blanking it at a uniform value.

cdict_redtoblue = {
    'blue'  : ((0,0,0), (0.8,0,1), (1,1,1)),
    'green' : ((0,0,0), (0.8,0,0), (1,1,1)),
    'red'   : ((0,0,0), (0.8,1,0), (1,1,1))


More info at

Show Basic Colormaps Plot
#Plotting the example colormaps
import numpy as np
from matplotlib import colors, cm, pyplot as plt


ax1.axes.get_xaxis().set_visible(False); ax1.axes.get_yaxis().set_visible(False)

ax2.axes.get_xaxis().set_visible(False); ax2.axes.get_yaxis().set_visible(False)




If you want to register the new cmap for permanent use on your machine:
cm.register_cmap(name=’my_cmap_name’, cmap=cmap_you_created)

Scaling your data (rather than scaling the colormap).

It is often easier and more desirable to scale your data before plotting.  The colorbar scale will remain linear.  Simply apply some type of transformation to your data array (such as logdata=numpy.log10(data) ), then plot as you normally would.  Here is an example.

Scaling Data Code
import numpy as np

#For z=x^beta: beta=logz/logx.  Take the last data point for a quick & dirty estimate of beta.


plt.title('Original Data: z = x$^{\\beta}$')
ax1.axes.get_xaxis().set_visible(False); ax1.axes.get_yaxis().set_visible(False)

plt.title('Scaled Data: $\\textrm{z}^{1/ \\beta} = {\\left( \\textrm{x}^{\\beta} \\right) }^{1/\\beta} $')
ax2.axes.get_xaxis().set_visible(False); ax2.axes.get_yaxis().set_visible(False)



Scaling the colormap

If you are dealing with log scale, you can simply set the normalization to ‘LogNorm’ within the plot command. For instance:
pyplot.imshow(data,cmap=’jet’,norm=matplotlib.colors.LogNorm(vmin=data.min(), vmax=data.max()) )
See for an example.

If you want to scale the colormap indices to a function manually, there are a few resources that are already available online.  The scipy documentation has a helpful piece of code to accomplish this.  Below is a slightly modified version of the cmap_xmap function available at

import matplotlib
def cmap_xmap(function,cmap):
    """ Applies function, on the indices of colormap cmap. Beware, function
    should map the [0, 1] segment to itself, or you are in for surprises.

    See also cmap_xmap.
    cdict = cmap._segmentdata.copy()
    function_to_map = (lambda x : (function(x[0]), x[1], x[2]))
    for key in ('red','green','blue'):
        cdict[key] = map(function_to_map, cdict[key])
        assert (cdict[key][0]<;0 or cdict[key][-1]>;1), "Resulting indices extend out of the [0, 1] segment."
    return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)

#Example usage:
cmap_new=cmap_xmap(lambda funcvar: funcvar**pscale,cmap_old)

If your function results in cmap indices over 1, there is an easy way to re-normalize to the 0:1 scale.  Use matplotlib.colors.BoundaryNorm() with appropriate boundaries based on the values you got from cmap_xmap

cmap_new=cmap_xmap(lambda funcvar: np.log10(funcvar),cmap_old)
scalednorm = colors.BoundaryNorm(bounds, cmap_new.N)


Here is an example of colormaps with different power scales, making use of cmap_xmap.


cmap_4=cmap_xmap(lambda funcvar: funcvar**4.,cm.spectral)
cmap_1_4=cmap_xmap(lambda funcvar: funcvar**(1/4.),cm.spectral)  #Note the dot after the 4!

plt.title("Original cmap 'spectral'")
ax1.axes.get_xaxis().set_visible(False); ax1.axes.get_yaxis().set_visible(False)

plt.title("cmap 'spectral' scaled to the power p=4")
ax2.axes.get_xaxis().set_visible(False); ax2.axes.get_yaxis().set_visible(False)

plt.title("cmap 'spectral' scaled to the power p=1/4")
ax3.axes.get_xaxis().set_visible(False); ax3.axes.get_yaxis().set_visible(False)



Multiple Scaling

Sometimes normal colormap scaling tricks just don’t work.  It is difficult to highlight variations in faint features and bright features at the same time within a single data set.  In these cases, you would want the colormap to have high range at the low end and again at the high end, with little range in the middle.  I’ve created one simple solution below based on the code for cmap_xmap in the SciPy documentation.  Of course there are many ways to accomplish this (in better ways, I’m quite sure), but this is simply a tutorial.

We are scaling the colormap by two different powers (pscale1 and pscale2), and manually defining the point where we want it to make the transition (breakpoint).  We need to define a function that will scale the cmap indices by one power before the break point and then by the other power after. We must also take care this time to make sure that there is no overlap between the two powers – let’s say we decide to break at colormap index 0.5.  If we choose our scaling powers as 1/2. and 3 (–> x^(1/2.) and x^3), then colormap index of 0.4 would be scaled to 0.6324… (which is greater than the breakpoint of 0.5) while cmap index 0.7 would remap to 0.343 (below the breakpoint).  This would create a mix-up in the colorscale, and the resulting cbar would not follow the same color progression as the original.  One simple fix would be to set those values that would pass the breakpoint to be just the breakpoint.  In other words, 0.6324 –> 0.5 and 0.343 –> 0.5.  Of course this isn’t ideal since it creates a hard break, but it’s a start.

In this simplisitic solution, we make a new function called pscalefn() that takes care of all that.

#Multi-power scaler for colormaps
def cmap_powerscaler(cmap_in,breakpoint=.5,pscale1=2,pscale2=(1/2.)):
    #*args are breakpoint, pscale1, pscale2
    #defaults are 0.5, 2, 1/2.
    try: matplotlib
    except: import matplotlib

    cdict = cmap_in._segmentdata.copy()
    function_to_map = (lambda x : (pscalefn(x[0],breakpoint,pscale1,pscale2), x[1], x[2]))
    for key in ('red','green','blue'):
        cdict[key] = map(function_to_map, cdict[key])
        assert (cdict[key][0]<0 or cdict[key][-1]>1), "Resulting indices extend out of the [0, 1] segment."
    return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)

def pscalefn(cval,breakpoint,pscale1,pscale2):
    if cval<=breakpoint:
        if cval**pscale1<=breakpoint: return cval**pscale1
        else: return breakpoint
        if cval**pscale2<=breakpoint: return breakpoint
        else: return cval**pscale2

Sample usage


Multiple Scales Code

cmap_3to1_3=cmap_powerscaler(cm.spectral,breakpoint=.5,pscale1=3.,pscale2=(1/3.)) #Note the dot!

plt.title("Original cmap 'spectral'",size=12)
ax1.axes.get_xaxis().set_visible(False); ax1.axes.get_yaxis().set_visible(False)

plt.title("Scaled to the power p=3 up to cm_index=0.5, and to p=1/3. after",size=12)
ax2.axes.get_xaxis().set_visible(False); ax2.axes.get_yaxis().set_visible(False)

plt.title("Scaled to the power p=1/3. up to cm_index=0.7, and to p=3 after",size=12)
ax3.axes.get_xaxis().set_visible(False); ax3.axes.get_yaxis().set_visible(False)



The most useful way to see the different scales is to apply them to some real data.  Here we will again use the image of M101 as used extensively in the Kapteyn package tutorials.

The examples below show the following settings:

  •      Original colormap with no scaling done (linear)
  •      Scaled as x6, but with a break at cmap index 0.2
  •      Transition from x3 to x2 with the break at cmap index 0.2
  •      Transition from x(1.2) to x(1/1.5) with the break at cmap index 0.5
Galaxy Plots Code
import pyfits
import pywcsgrid2
from mpl_toolkits.axes_grid1.inset_locator import inset_axes


cmap4=cmap_powerscaler(cm.spectral,breakpoint=.5,pscale1=1.2,pscale2=(1/1.5)) #Note the dot!

ax1.set_xlabel("$\Delta$RA (J2000)",size=8)
ax1.set_ylabel('$\Delta$DEC (J2000)',size=8)
plt.title("Original cmap 'spectral'\n(linear)",size=8)
cax1 = inset_axes(ax1,
                  width="5%", # width = 5% of parent_bbox width
                  height="100%", # height : 100%
                  bbox_to_anchor=(1.05, 0, 1, 1),
cbar1.set_label('Intensity (Some Units)',fontsize=8)

ax2.set_xlabel(""); ax2.set_ylabel("")
plt.title("Map scaled to the power p=6,\nwith a break at 0.2",size=8)
cax2 = inset_axes(ax2,width="5%",height="100%",loc=2,bbox_to_anchor=(1.05, 0, 1, 1),bbox_transform=ax2.transAxes,borderpad=0)
cbar2.set_label('Intensity (Some Units)',fontsize=8)

ax3.set_xlabel(""); ax3.set_ylabel("")
plt.title("pscale1=3, pscale2=2,\nbreakpoint=.2",size=8)
cax3 = inset_axes(ax3,width="5%",height="100%",loc=2,bbox_to_anchor=(1.05, 0, 1, 1),bbox_transform=ax3.transAxes,borderpad=0)
cbar3.set_label('Intensity (Some Units)',fontsize=8)

ax4.set_xlabel(""); ax4.set_ylabel("")
plt.title("pscale1=1.2, pscale2=(1/1.5),\nbreakpoint=.5",size=8)
cax4 = inset_axes(ax4,width="5%",height="100%",loc=2,bbox_to_anchor=(1.05, 0, 1, 1),bbox_transform=ax4.transAxes,borderpad=0)
cbar4.set_label('Intensity (Some Units)',fontsize=8)

plt.suptitle('M101 with Power-Scaled Colormaps',size=16,x=.5,y=.999)


Update: I decided to add another of my favorite colormaps.  This is similar to the STD GAMMA-II color table in IDL.  I find it really nice for astronomical images. It goes from black to blue to red to yellow to white.

    'red'  :  ((0., 0., 0.), (0.25,0.,0.), (0.5,1.,1.), (0.75,1.0,1.0),  (1., 1., 1.)),
    'green':  ((0., 0., 0.), (0.25,0.,0.), (0.5,0.,0.), (0.75,1.0,1.0),  (1., 1., 1.)),
    'blue' :  ((0., 0., 0.), (0.25,1.,1.), (0.5,0.,0.), (0.75,0.0,0.0),  (1., 1., 1.))

coolheat = colors.LinearSegmentedColormap('coolheat', cdict_coolheat,1024)

Here it is in action with the M101 image.  On the left is the simple cmap, on the right I’ve stretched the low end to separate the background from the low-level emission.

Code for plotting M101 with coolheat colormap
import numpy as np
from matplotlib import colors, pyplot as plt
import matplotlib
import pyfits
import pywcsgrid2
from mpl_toolkits.axes_grid1.inset_locator import inset_axes

#Defining the coolheat colormap manually
    'red'  :  ((0., 0., 0.), (0.25,0.,0.), (0.5,1.,1.), (0.75,1.0,1.0),  (1., 1., 1.)),
    'green':  ((0., 0., 0.), (0.25,0.,0.), (0.5,0.,0.), (0.75,1.0,1.0),  (1., 1., 1.)),
    'blue' :  ((0., 0., 0.), (0.25,1.,1.), (0.5,0.,0.), (0.75,0.0,0.0),  (1., 1., 1.))

coolheat = colors.LinearSegmentedColormap('coolheat', cdict_coolheat,1024)

### Stretching the colormap manually
    'red'  :  ((0., 0., 0.), (0.12,0.0,0.0), (0.35,0.,0.), (0.55,1.,1.), (0.8,1.0,1.0),  (1., 1., 1.)),
    'green':  ((0., 0., 0.), (0.12,0.0,0.0), (0.35,0.,0.), (0.55,0.,0.), (0.8,1.0,1.0),  (1., 1., 1.)),
    'blue' :  ((0., 0., 0.), (0.12,0.1,0.1), (0.35,1.,1.), (0.55,0.,0.), (0.8,0.0,0.0),  (1., 1., 1.))
coolheat_stretch = colors.LinearSegmentedColormap('coolheatstretch', cdict_coolheatstretch,1024)

### Alternatively, we can stretch the cmap using the cmap_xmap (from
### Here, use power=0.8
# def cmap_xmap(function,cmap):
#     cdict = cmap._segmentdata.copy()
#     function_to_map = (lambda x : (function(x[0]), x[1], x[2]))
#     for key in ('red','green','blue'):
#         cdict[key] = map(function_to_map, cdict[key])
#         cdict[key].sort()
#         assert (cdict[key][0]<0 or cdict[key][-1]>1), "Resulting indices extend out of the [0, 1] segment.";
#     return matplotlib.colors.LinearSegmentedColormap('colormap',cdict,1024)
# coolheat_stretch=cmap_xmap(lambda funcvar: funcvar**(0.5),coolheat)

m101dat=pyfits.getdata('m101.fits')   #Loading the galaxy image

ax1.set_xlabel("$\Delta$RA (J2000)",size=8)
ax1.set_ylabel('$\Delta$DEC (J2000)',size=8)
cax1 = inset_axes(ax1,
        width="5%", # width = 5% of parent_bbox width
        height="100%", # height : 100%
        bbox_to_anchor=(1.02, 0, 1, 1),
cbar1.set_label('Intensity (Some Units)',fontsize=8)

ax2.set_xlabel(""); ax2.set_ylabel("")
plt.title('Cmap scaled manually',size=10)
cax2 = inset_axes(ax2,width="5%",height="100%",loc=2,bbox_to_anchor=(1.02, 0, 1, 1),bbox_transform=ax2.transAxes,borderpad=0)
cbar2.set_label('Intensity (Some Units)',fontsize=8)

plt.suptitle('M101 With "coolheat" Colormap',size=14,x=.5,y=.8)



NaN (Not a Number) is an interesting and useful concept.  For many people, NaNs represent errors or bugs in code, but there are many applications for which they are beneficial and convenient.  In this discussion we will make use of numpy masked arrays as we cover a few different cases where NaNs come into play.  You should check out the official webpage at
They have several excellent examples and a much more thorough coverage of the topic.  I do not intend this guide as a replacement, but rather as a brief supplement with additional specific applications.

As always, begin by importing numpy:

import numpy as np

First, I want to demonstrate the proper way to check if a number is an NaN.  You do NOT use the usual double equal sign.  Instead, use np.isnan()  For example:

a = 5
a == 5
# --> True

a = np.nan
a == np.nan # --> False
np.isnan(a) # --> True

For the first example, let’s generate some sample data (a quadratic with random noise) and explore what happens when we insert some NaNs.

#Generate some x-vals, as well as some y-vals with scatter.
xvals = np.arange(-3,5,.2)
yvals = xvals**2 - 7
for i in range( len(yvals) ): yvals[i] += 2*np.random.randn()

If we wanted to perform a simple fit, we could use scipy.polyfit(), since we have defined this as a quadratic.  If we wanted to find the minimum, we could use either of these:

And life goes on.  But now, what if you have an NaN for one or more of your y-array values?  Let’s make an example array:

yvals_with_nans = yvals.copy()
yvals_with_nans[3:10] = np.nan

You can still add, subtract, multiply, etc. just fine.  Even plotting with plt.plot(xvals,yvals_with_nans) works as expected, but note what happens when you try to determine special properties of the array:

# --> nan

np.min(yvals_with_nans) # --> nan

np.max(yvals_with_nans) # --> nan

np.sum(yvals_with_nans) # --> nan

Luckily, this is an easy fix for this particular situation.  Simply use numpy’s built-in tools for dealing with NaNs:
numpy.nanmin(), numpy.nanmax()

# --> -10.287...

But what if you want to use some regular functions on your array?  You can copy over only the parts that don’t have NaNs, but there is a more powerful way that lets you keep the original array while at the same time only using the non-NaN elements in calculations.  For this we use numpy masked arrays.

Masked arrays consist of two layers: the array of data and the array of mask values.  Experiment with the module:

a =[1,2,3,4,5],mask=0)
print a
# --> [1 2 3 4 5], mask = [False False False False False]

np.sum(a) # --> 15

a.mask[2] = True; a.mask[4] = True
print a
# --> [1 2 -- 4 --], mask = [False False True False True]

print a.mask # --> [False False True False True]
print # --> [1 2 3 4 5]

#Notice how returns the full list of un-masked elements.

np.sum(a) # --> 7
#It only summed the masked elements.

You may also find this trick useful: you can sum up the number of masked/un-masked elements quite easily.

np.sum(a.mask) # --> 2
np.sum(-a.mask) # --> 3

Note the use of the negative sign.  When you sum the mask, you are summing the True/False values.  Since Booleans are 1/0, the sum of [True True False True] = [1 1 0 1] will be 3.  If you sum the negative of the mask, you are summing [-True -True -False -True] = [False False True False] = [0 0 1 0], which is 1.  We will make use of this in a later example.

Now let’s invoke a special version of the masked array to automatically mask out invalid numbers like NaN and inf:

y_masked = yvals_with_nans )

print y_masked #Note how the nans have been replaced by -- marks

np.sum(y_masked) # --> 3.34306...

Et voila!  Arithmetic operations and np.min(), etc., will work correctly now on the masked array, though polyfit will not.  Be sure to check out the myriad other incdredibly useful masking arrays, such as

  •      and many more…

If you simply want to remove the NaNs entirely from your array, the easiest method is to compress the masked array.  This returns a 1-D array with all masked values stricken.  Note that even for 2-D or 3-D arrays (images or cubes), the masked_array.compressed() command will still return the 1-D array of all your un-masked values.

y_compressed = y_masked.compressed() #If you print, you will notice that the NaNs are now gone.

We probably want to keep our x-array the same length (for fitting, plotting, etc.).  We can remove the x-values corresponding to the y-array NaNs easily with the same mask:

x_masked =, mask=y_masked.mask)
x_compressed = x_masked.compressed()
print len(xvals), len(yvals_with_nans), len(x_compressed), len(y_compressed)
# --> 40 40 33 33

#Or you can even just do it on-the-fly with
x_compressed =, mask=y_masked.mask).compressed()

Now is a good time to insert a warning: Interpolation of missing data points should only be done with caution.  In general, it’s much better practice to perform a fit on your data (see the tutorial on fitting in python here) and get your data points that way.  The following is an example of why.  It can be useful to interpolate data for missing holes in 2-D images, though, if you need to make it continuous.

There are two easy methods for interpolation if you have 1-D data:

This is quick and easy if all your data values are increasing, but it will give you nonsense results if they are not.
Basic usage is numpy.interp(,x_array,y_array)

This is a much more full-featured module, which also has support for 2-D arrays/images.
Basic usage is scipy.interp1d(,x_array,y_array)

Here are examples for interpolating the y-value at index 2.5 (numpy) and at x=-2.5 (scipy) of the compressed arrays:

interpvals1 = np.interp(2.5, x_compressed, y_compressed) # --> -0.6589... (for my particular randomly-generated array)

from scipy import interpolate
interpvals2 = interpolate.interp1d(x_compressed, y_compressed, kind='cubic')
interpvals2( -2.5 )

Here is a plot demonstrating why blind interpolation can be dangerous.  The interpolated value can differ by quite a bit from the expected best fit.  Of course, in order to fit a curve, you must be sure of the function’s form (polynomial, exponential, etc.).

Interpolation Plot Code
import numpy as np
from scipy import polyfit
from matplotlib import pyplot as plt

plt.rc('font',family='serif') #Just because I prefer serif fonts on my plots

#Making the arrays as we did above
for i in range( len(yvals) ): yvals_with_nans[i]+=2*np.random.randn()

#Masking the NaNs

#Performing additional steps
interpval=np.interp(2.5,x_compressed,y_compressed) #x-val is -1.8
yfit=fitvals[0]*xvals**2+fitvals[1]*xvals+fitvals[2] #Note that I am using xvals here to smoothly fill in the empty region

#Actually making the figure
plt.xlabel('x'); plt.ylabel('y')

plt.plot(x_compressed,y_compressed,'.',color='#347C17',label='Compressed Array')
plt.plot(-1.8,interpval,'s',color='#990000',label='Interpolated Point')
plt.plot(xvals,yfit,'--',color='k',label='Quadratic Fit')
plt.xlabel('x'); plt.ylabel('y')
plt.title('Fit on y_compressed',size=10)

#Adding text boxes below the plots with some diagnostic info
ax1.text(0,-.2,'np.min(yvals_with_nans)       = %.2f\nnp.nanmin(yvals_with_nans) = %.2f\nnp.min(y_masked)                 = %.2f'%(np.min(yvals_with_nans),np.nanmin(yvals_with_nans),np.min(y_masked)),ha='left',va='top',transform=ax1.transAxes,size=8,bbox=dict(boxstyle='round',fc='w'))

ax2.text(0,-.2,'Best fit:  y = %.2f x$^2$ + %.2f x + %.2f\nTrue function: y = x$^2$ - 7'%(fitvals[0],fitvals[1],fitvals[2]),ha='left',va='top',transform=ax2.transAxes,size=8,bbox=dict(boxstyle='round',fc='w'))

plt.suptitle('Sample Arrays with NaNs',size=16)


Images and NaNs


Many astronomical images have NaNs for valid reasons – most often it’s because we blank everything outside of some desired region, whether it be a photometry aperture, the edge of the a detector’s field-of-view, etc…  But often we want to perform arithmetic on an image with NaNs, which would normally make the routines give NaN answers. The methods presented above for 1-D cases extend to 2-D images as well.

Let’s use a real galaxy image as an example.  You can find the LITTLE THINGS VLA neutral hydrogen velocity (Moment-1) map for CVnIdwA here (click Save Link As; it’s ~1MB).  You can learn more about the LITTLE THINGS project at the NRAO site here:

This image contains the velocity values in m/s of the galaxy, determined from the beam-smoothed first moment.  All the values outside of the galaxy’s emission have been blanked with NaNs.  See Hunter, et al., 2012 in AJ for more details on the data processing.

Let’s calculate the rms value of the velocity values in all of the pixels.  Recall that RMS=sqrt(values^2/number_summed)   To make the process easier to follow, we’ll break it up into parts:

import pyfits

#Import the data with pyfits.  It still has additional dimensions, so we will only load the ones we care about - the last two
#veldata = pyfits.getdata('<path to where you saved the .FITS file>/CVnIdwA_NA_XMOM1.FITS')[0,0,:,:]
veldata = pyfits.getdata('CVnIdwA_NA_XMOM1.FITS')[0,0,:,:]
veldata_masked =

#We will sum all the non-NaN pixels.  We can calculate the number of pixels to be used in a number of ways.
#...Pick your favorite.

n_used = np.count_nonzero( veldata_masked.compressed() )
# -- OR -- #
n_used = np.sum(-veldata_masked.mask) #Note the negative sign.

rms = np.sqrt( np.sum(veldata_masked**2) / n_used)
# --> pixrms = 311076.84 m/s

# -- OR an example of on-the-fly calculation -- #
rms_v2 = np.sqrt( np.nansum(veldata**2) / np.count_nonzero( ))

#Now if we want the RMS of the difference from the mean:
rms_dif = np.sqrt( np.sum( (veldata_masked - np.mean(veldata_masked))**2 ) / n_used)
# --> The RMS difference of a pixel's velocity from the mean value is 9550.46 m/s

Note that we could also have done simple RMS calculation without masked arrays as shown above.  But there is no numpy.nanmean() to use for the RMS difference calculation. This is one example of the versatility of masked arrays.

If we simply want to set the NaNs to a number (say, 0) for processing in some later routine, we can do it easily with

veldata_fix_temp =,fill_value=0)
#Then call the fixed array with:
veldata_fixed =

#Or, just do it on-the-fly:
veldata_fixed =,fill_value=0).data
np.max(veldata_fixed) # --> Max = 338740.47 m/s

You may want to leave your image with NaNs instead of filling with 0s, depending on what you’re interested in showing.  If you are plotting with pyplot.imshow() and you want to emphasize the regions that are masked, you can do this quite effectively with a clever choice of color map.  It is also a convenient trick if you want the background transparent instead of black for publications, etc.  It’s a matter of preference, really.  See the plots below. (The code to produce them is there as well.)  For a primer on plotting with kapteyn and pywcsgrid2, see my tutorial here.[link] Replace Plot NaNs with 0s

from matplotlib import pyplot as plt

plt.rc('font',family='serif') #Just because I prefer serif fonts on my plots

veldata_fixed =,fill_value=0).data

rms=np.sqrt( np.nansum(veldata**2)/np.count_nonzero(

#To plot wcs coordinates nicely, use the kapteyn or pywcsgrid2 modules.  I demonstrate use of pywcsgrid2 here.
#If you simply wanted to plot the data as you would any other image (without WCS coords), you could use the following code instead:
# fig2=plt.figure(2)
# ax1=fig2.add_subplot(1,2,1)
# plt.imshow(veldata,origin='lower',interpolation='nearest')
# ax2=fig2.add_subplot(1,2,2)
# plt.imshow(veldata_fixed,origin='lower',interpolation='nearest')

import pywcsgrid2

plt.title('With NaNs')
ax1.text(.05,.05,'RMS vel. = %.2e m/s'%rms,ha='left',va='bottom',transform=ax1.transAxes,size=8,bbox=None)

plt.title('NaNs $\\rightarrow$ 0')
ax2.text(.05,.05,'RMS vel. from mean = %.2e m/s'%rms_dif,ha='left',va='bottom',transform=ax2.transAxes,size=8,bbox=None,color='w')

plt.suptitle('CVnIdwA Integrated Velocity Maps',x=.5,y=.8,size=16)


Basic Plotting in Python

plottest1 plottest2

This is an example of how to make a simple plot in python, using data stored in a .csv file.
We will:

  •   Load the 2 columns of data from the file into a (numpy) array
  •   Plot the data with pyplot.plot
  •   Tweak some plot settings to make it pretty
  •   Save the plot to a file, view the plot in a window, or both

You will need to have installed on your machine:

  • python (I’m running 2.7 in this tutorial)
  • numpy (I’m using 1.5.1 here)
  • matplotlib (I’m using 1.1.0 here)

You can run this entire script (see the condensed version at the end of this post) from the command line by setting it to executable and typing:
or, you can run the whole script in ipython (type ipython at the command line) by typing
or, you can run each command individually in ipython

This script is not optimized for resources or speed – it is simply meant to be an easy-to-follow introduction to simple plotting.  If you haven’t seen it already, the official Matplotlib example gallery is an invaluable resource for plotting in python.  I would also refer the reader to the overview of plotting with Matplotlib by Nicolas P. Rougier here.

First, we will need to load some python modules…
– numpy is the numerical module that allows you to do Matlab- and IDL-esque numerical operations
– pyplot (part of matplotlib) is the plotting library

import numpy
from matplotlib import pyplot as plt

Here, we will load the data into python.  The example csv file is separated by commas, and strings (the titles) are in double-quotes.  This particular file has headers in the first row.  There are several ways to load a csv file into python; this is simply one easy way.

Normally, we might use numpy.loadtxt to do this, but our file has a funny format: the + sign in the scientific notation.  numpy.genfromtxt is a more robust version of loadtxt which can interpret this correctly and deal with missing cells.  This creates a 2D array of the data columns.


If we really wanted to be pedantic with our separation, we could create separate arrays for each column:


We can explicitly make a log(luminosity) array with numpy: (note that numpy.log is the natural log)


Plot the data values.  Syntax is pyplot.plot(xarray,yarray, other kwargs).  There is also a plt.scatter command, but we can just set the linewidth to 0.  Note that we can do computations on the fly within pyplot!


And now to set the labels. You can use Latex inline equation syntax.

plt.title('M vs. L')
plt.xlabel('log(Luminosity) in some units...')
plt.ylabel(r'log(M$_{H_{2}}$) in some units...')

Plot simply creates a plot object.  To view it, we need to either show or savefig.  Uncomment the one you want.  Note that pyplot will recognize the filetype (png,eps,pdf, etc…) you give it in savefig and save accordingly!


Then clear for the next example:


…And this is what we get:


Now let’s make two subplots:
a.) Large red half-transparent squares with blue edges of all the data points
b.) Thin green diamonds for real points, downward pointing arrows for the upper limits
We first need to create a figure, then add the subplots

For this one, we’ll use the second csv file which has a column describing whether a mass value is an upper limit
–> genfromtxt by default expects floats, but will read strings if we set the expected datatype (dtype) to None


An aside: The slickest way to use masks is with a numpy masked array.  See my tutorial on NaNs and masked arrays here.  For example:'"Yes"',massmask) # Remember our mask column is '"Yes"' and '"No"',mass) #This masks (removes) values where 'Limit'='"No"',mass),logL),logL)

If you want only the unmasked values, call with mass_limits.compressed(), etc… [end aside] Here we create separate lists for mass & luminosity of the upper limit values only and regular values only, using a standard for loop.  This method requires extra steps below.

limits=[]; regular=[]
for i in range(0,len(mass)):
    if massmask[i]=='"Yes"':  limits.append([mass[i],logL[i]])
    else: regular.append([mass[i],logL[i]])

Convert the lists to numpy arrays and transpose to put the data in order for plotting


Create the figure to which you will add subplots:


Add the first subplot

# 2,1,1 means 2 rows, 1 column, 1st plot

Now use the plotting commands just like before.

plt.title('Subplot 1 - Everything Together',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

Make the second subplot

# 2 rows, 1 column, 2nd plot

Plot the upper limits with pyplot.errorbar(xlocs,ylocs,xerr_length,yerr_length, other kwargs)
fmt=None means no marker for the point, lolims turns the bars into upper limit arrows, capsize=arrowhead, elinewidth=the width of the arrow shaft, mew=markeredgewidth= width of the bar

plt.errorbar(limits[1],limits[0],xerr=None,yerr=.15,fmt=None,ecolor='k', \
    lolims=True,capsize=3,elinewidth=1,mew=0,label='Upper Lims')

Then plot as usual for the regular points

plt.plot(regular[1],regular[0],linewidth=0,marker='d',markerfacecolor='g',alpha=.5,label='Exact Mass')
plt.title('Subplot 2 - Separated',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

Change the limits, for kicks


Let’s add a legend, using the labels we defined in our plot calls. Properties defined in a dictionary.

plt.legend(loc='upper left',prop={'size':8},numpoints=1) #More args: ncol=1,loc=1,fancybox=True ...

Add a super-title

fig1.suptitle('Mass vs. Luminosity',fontsize=16)

Tidy up the subplot spacing – numbers are fractions of subplot sizes.


Make the tick labels smaller. This can also control putting ticks on top, etc…  You can also use plt.ticklabel_format() to change the notation style


Change the font of all the text and labels to standard serif


Finally, show or save.


Here is the resulting plot:


To recap, here is a list of some of the commands I used to plot and a quick explanation of some of the most useful keyword arguments (with example values). See the links for the official man page that explains all the keywords! The general page is


  • marker=’s’ –> The marker to use for data points.
  • color=’red’ –> The color to use for plotting.  Accepts standard python color names, 0.0-1.0 for grayscale, or hex.
  • linewidth=0.5 –> The thickness of the line between points.  Set to 0 for no line.
  • markersize=10 –> The point size of the marker.  The shorthand is ms=…
  • markerfacecolor=’b’ –> The color of the marker’s body.  The shorthand is mfc=…
  • markeredgecolor=’0.5′ –> The color of the marker’s outline.  The shorthand is mec=…
  • alpha=0.5 –> Alpha sets the transparency level


  • Note that there are some subtleties with plt.legend(), depending on if you call it as an object or not.  (i.e., leg=plt.legend()…) and if you created your plot instances as objects.
  • loc=3 –> The location where the legend is drawn
  • numpoints=1 –> The number of merkers to show in the legend
  • handlelength=0.5 –> The length of line to draw (in points) in the legend
  • ncol=4 –> The number of columns to use in the legend
  • fancybox=True –> Use a box with rounded edges
  • prop={‘size’:8} –> This sets the font properties, such as size

pyplot.title(‘MyTitle’,*kwargs)   Also see pyplot.suptitle()

  • loc=’left’ –> Justify in center, left, or right
  • x=0.2, y=0.95 –> Manually set x,y coordinates (in figure fraction)


  • Left, right, top, bottom, wspace, hspace


  • Note that MyPlotTitle can be a full path, so you can save in any directory you like.
  • Also note that the pyplot figures out file savetype from the extension you put in the title (png, pdf, eps…)
  • dpi=300 –> The resolution of the output image in dots per inch
  • facecolor=’0.8′ –> Sets the color of the plot border
  • transparent=True –> Set the axes patch and figure patch to be transparent
  • bbox_inches=’tight –> Sets the the bounding box size. ‘Tight’ forces a crop to cut out whitespace.
  • pad_inches=0.1 –> Extra padding to use when bbox_inches=’tight’


Here is a condensed version of the commands.  You can save this as and run from the command line.

Click here

import numpy
import csv
from matplotlib import pyplot as plt

luminosity=data[:,1]; logL=numpy.log10(luminosity)
plt.title('M vs. L')
plt.xlabel('log(Luminosity) in some units...')
plt.ylabel(r'log(M$_{H_{2}}$) in some units...')

limits=[]; regular=[]
for i in range(0,len(mass)):
    if massmask[i]=='"Yes"': limits.append([mass[i],logL[i]])
    else: regular.append([mass[i],logL[i]])


plt.title('Subplot 1 - Everything Together',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

plt.errorbar(limits[1],limits[0],xerr=None,yerr=.15,fmt=None,ecolor='k',lolims=True,capsize=3,elinewidth=1,mew=0,label='Upper Lims')
plt.plot(regular[1],regular[0],linewidth=0,marker='d',markerfacecolor='g',alpha=.5,label='Exact Mass')
plt.title('Subplot 2 - Separated',fontsize=12)
plt.xlabel('log(Luminosity) ',fontsize=12)
plt.ylabel(r'log(M$_{H_{2}}$) ',fontsize=12)

plt.xlim(18,23); plt.ylim(6,10)
plt.legend(loc='upper left',prop={'size':8},numpoints=1)
fig1.suptitle('Mass vs. Luminosity',fontsize=16)

Sample csv files used in this tutorial

Click here for SampleData.csv
"log M(H2) [Msun]","Luminosity [W/Hz]"
Click here for SampleData2.csv
"log M(H2) [Msun]","Limit?","Luminosity [W/Hz]"

Curve Fitting

quadratic_fit_example power_fit_example gaussian_fit_example

This post goes through the basics of fitting functions to data sets.  There are many tricks to optimizing fitting foutines for speed, but I will only present a general method here in the interest of keeping things simple and easy to follow.  We start out with some generated data, and finish by fitting curves to a real data set.

We will also explore the possibilities of giving certain parameters limits in our fits.  Python has some great tools to help you with all your fitting needs!

For this tutorial, I am using:

  •  python v2.7
  •  numpy v1.5.1
  •  matplotlib v1.1.0
  •  scipy v0.8.0
  •  lmfit v0.4 – for constraining parameters
  •  pyfits 2.3.1 – for I/O of the .fits file

The basic method I will outline below:

  • Load or define the data array that you want to fit
  • Define a function for your desired equation
  • Fit with scipy.optimize.leastsq  (or with lmfit.minimize, if you want to constrain certain parameters)

Import the modules we will use

import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import leastsq
from lmfit import minimize, Parameters
import pyfits

Let’s start with a simple example: we’ll generate some data with random noise, and fit a quadratic curve. Then we’ll plot the data, the fit, and residuals.

Creating the example data.  X-range from -3 to 10 in increments of 0.2.  Our y-data will start out as x2


Now add gaussian random noise to each element.


There is a cheap and easy way to perform fits with little setup if you are only interested in polynomials.  scipy.polyfit() allows you to give an x-array and a y-array as input, with the order of polynomial to fit, then spits out an array of the fit coefficients with little fuss.  To fit a quadratic to our data generated above, for example:

from scipy import polyfit


print fitcoeffs
# --> Returns array (<coeff. order 2>, <coeff. order 1>, <coeff. order 0>)

If we want to fit an arbitrary expression, though, we must define a python function which will compute our desired equation.  Our function will take two arrays as inputs: the list of coefficiencts (the parameters that we will fit – I will label them “p”), and the array of x-axis data points. Later we will define the initial guesses for the parameters, called “p0”.Our function will take the form     $\em y = a x ^2 + b x + c \em$

def quadratic(p,x):
    return y_out

There is a simpler way to do this for short/simple expressions: python’s lambda functions are essentially one-line function definitions.

quadratic = lambda p,x: p[0]*(x**2)+p[1]*x+p[2]

We will also need to define a function of the difference between the fitted function with the current iteration’s parameters and the original data.

quadraticerr = lambda p,x,y: quadratic(p,x)-y

Define our initial guesses of the quadratic’s coefficients (I’m making bad guesses here on purpose, just to show that it still works)


Now for the actual fitting.  We will tell scipy.optimize.leastsq to minimize the DIFFERENCE between the original data and iterated set of data given by the defined equation with the current coefficients.  So we tell it to perform a least-squares fit on “quadraticerr”, and give it the x- and y-data arrays as arguments.

#covar=out[1] #This is the covariance matrix output
print('Fitted Parameters:\na = %.2f , b = %.2f , c = %.2f' % (paramsout[0],paramsout[1],paramsout[2]))

Now let’s plot our results. I will create a smoother curve for the fit on the fly within pyplot.  I will also add a text box with the fit parameters and plot the residuals just below the data.

    #xstart, ystart, xwidth, yheight --> units are fraction of the image from bottom left

Plot the data in frame1

frame1.set_xticklabels([]) #We will plot the residuals below, so no x-ticks on this plot
plt.title('Quadratic Fit Example')
frame1.annotate('$y$ = %.2f$\cdot x^2$+%.2f$\cdot x$+%.2f'%(paramsout[0],paramsout[1],paramsout[2]), \
                xy=(.05,.95),xycoords='axes fraction',ha="left",va="top",bbox=dict(boxstyle="round", fc='1'))

If we want to keep the plot tidy:

from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(prune='lower')) #Removes lowest ytick label

Now plot the residuals in frame2





Here is the result:



Now we will set limits for what values the coefficients can be fit to. This is particularly useful if you know (or want to impose) on the physics of the phenomenon represented by the data. scipy.optimize.minimize() has this functionality, and I will probably add an example using this in the future.  Another useful module is lmfit, which I will use from here on out.  Here we will fit a power law to data for the M-Sigma relation (data from Gultekin et al. 2009 – )

M=dat[:,0]      #The y-data
Sigma=dat[:,1]  #The x-data

The general (simple) form for a power law is    $\em y = a \cdot  x^b \em$ .  We can do this simply enough directly, or we can be clever and note that we could fit a line for    $\em log( y ) = log( a ) + log( x^b ) = \alpha + \beta \cdot log( x ) \em$ .

lmfit requires us to define the coefficients as Parameters() objects. We then define their initial guesses, whether or not they can vary, and what their min/max values can be.  Finally, when we define our equation functions, we must call the parameters with p0[‘par_x’].value :

params.add('beta',value=4,vary=True,max=5.0) #,min=...

power=lambda p,x: params['alpha'].value+params['beta'].value*x
powererr=lambda p,x,y: power(p,x)-y

Perform the fit with lmfit.minimize.  This output gives us a whole suite of info, such as reduced chi-square values.  With lmfit, the parameters will be updated within the Parameters() object


print('Fitted Parameters:\nalpha = %.2f , beta = %.2f' % (params['alpha'].value,params['beta'].value))


Plot the data and the fit


plt.xlabel('$\sigma$ (km s$^{-1}$)')
plt.ylabel('M (M$_{\odot}$)')
plt.title('Power Law Fit Example')
sp1.annotate('log($y$) = %.2f+%.2f$\cdot $log($x$)\n$y$ = %.2f$\cdot x^{%.2f}$' \

%(params['alpha'].value,params['beta'].value,10**params['alpha'].value,params['beta'].value),xy=(.05,.95), \
xycoords='axes fraction',ha="left", va="top",bbox=dict(boxstyle="round", fc='1'))



Here is the result:


Now is a good time to mention that lmfit.minimize() gives you some freebie goodness-of-fit diagnostics:

nfev number of function evaluations
success boolean (True/False) for whether fit succeeded.
errorbars boolean (True/False) for whether uncertainties were estimated.
message message about fit success.
ier integer error value from scipy.optimize.leastsq
lmdif_message message from scipy.optimize.leastsq
nvarys number of variables in fit    $\em N_\mathrm{varys}\em$
ndata number of data points:    $\em N \em$
nfree degrees of freedom in fit:    $\em N - N_\mathrm{varys}\em$
residual residual array (return of func()   $\em \ \mathrm{Resid} \em$
chisqr chi-square:    $\em \chi^2 = \Sigma_i^N \left[ \mathrm{Resid}_i \right] \em$
redchi reduced chi-square:    $\em \chi^2_\nu = \chi^2 / \left(  N - N_\mathrm{varys} \right)\em$

Call with, e.g., output.chisqr or output.residual

Again, see for details.


Next, let’s read in data from a .fits file and perform a fit.  This particular  data cube is a neutral hydrogen VLA observation of dwarf galaxy WLM.  The cube  can be downloaded (~430MB) from  –> I have reproduced the ‘stackspec’ and ‘vels’ arrays at the end of this tutorial. Specifically, I will sum up all of the x-y data to make one big stacked z-array, or ‘Super Profile’.  Then I will fit various gaussian functions (Gaus + Hermite  polynomials and double gaussian) and plot them with their residuals.  Of course,  in real life you would want to ‘Shuffle’ the spectra and account for other factors, but this is just an example.

I am using the default Marquardt-Levenberg algorithm.  Note that fitting results will depend quite a bit on what you give as initial guesses – ML finds LOCAL  extrema quite well, but it doesn’t necessarily find the global extrema.  In short, do your best to provide a good first guess to the fit parameters.

Load in the data. We don’t care about the 4th axis, so cut it out. Also get the header and specify the velocity parameters.


cdelt3=cubehdr['CDELT3']/1000.; crval3=cubehdr['CRVAL3']/1000.; crpix3=cubehdr['CRPIX3'];
minvel=crval3+(-crpix3+1)*cdelt3; maxvel=crval3+(cube.shape[0]-crpix3)*cdelt3

#minvel=20.452084230999986; maxvel=-252.452084231
#cdelt3=-2.574567627; chanwidth=2.574567627

Stack all the spectra



The velocity array in the cube goes from positive to negative, so let’s reverse it to make the fitting go smoother.

vels=vels[::-1]; stackspec=stackspec[::-1]

Alternatively, to skip using pyfits and load the velocity and spectrum data from the .csv file provided below:


Next, we define our fit functions:

Gaussian plus Hermite Polynomials H3/H4   (labeled below as ~_gh)
   $\em f(x) = A  \; e^{-\frac{g^2}{2}} \left[ 1 + h_3 \left( -\sqrt{3} \; g + \frac{2}{\sqrt{3}} \; g^3 \right) + h_4 \left( \frac{\sqrt{6}}{4} - \sqrt{6} \; g^2 + \frac{\sqrt{6}}{3} \; g^4 \right) \right] \em$ , where    $\em g=(x-x_c)/\sigma} \em$


def gaussfunc_gh(paramsin,x):
    c1=-np.sqrt(3); c2=-np.sqrt(6); c3=2/np.sqrt(3); c4=np.sqrt(6)/3; c5=np.sqrt(6)/4

    return gaustot_gh

Double Gaussian   (labeled below as ~_2g)
   $\em F(x) =  \sum\limits_{i}  f_i (x) = A_1 \;  e^{-\frac{1}{2} (\frac{x-x_{c,1}}{\sigma_1})^2} +  A_2 \;  e^{-\frac{1}{2} (\frac{x-x_{c,2}}{\sigma_2})^2} + \ldots \em$
Bounds–> amp: 10% of max to max   center: velocity range   disp: channel width to velocity range


def gaussfunc_2g(paramsin,x):
    amp1=paramsin['amp1'].value; amp2=paramsin['amp2'].value;
    center1=paramsin['center1'].value; center2=paramsin['center2'].value;
    sig1=paramsin['sig1'].value; sig2=paramsin['sig2'].value;

    return gaustot_2g

And now the functions that compute the difference between the fit iteration and data. In addition, define a function for a simple single gaussian.

gausserr_gh = lambda p,x,y: gaussfunc_gh(p,x)-y
gausserr_2g = lambda p,x,y: gaussfunc_2g(p,x)-y
gausssingle = lambda a,c,sig,x: a*np.exp(-.5*((x-c)/sig)**2)

We will minimize with lmfit again, in order to keep limits on parameters


Let’s make some shorthand definitions to keep things tidy


If you want to use the simple fitter, without defining bounds, revert  to scipy.optimize.leastsq().  Note, however, that with complex functions such as two gaussians, you run the risk of getting nonphysical results, for example one gaussian being negative.  That said, here is the code:


def gaussfunc_gh(p,x):
    c1=-np.sqrt(3); c2=-np.sqrt(6); c3=2/np.sqrt(3); c4=np.sqrt(6)/3; c5=np.sqrt(6)/4
    return p[0]*np.exp(-.5*((x-p[1])/p[2])**2)*(1+p[3]*(c1*((x-p[1])/p[2])+c3*((x-p[1])/p[2])**3)+p[4]*(c5+c2*((x-p[1])/p[2])**2+c4*((x-p[1])/p[2])**4))

def gaussfunc_2g(p,x):
    return gaus1+gaus2


print('Fitted Parameters (Gaus+Hermite):\nAmp = %.2f , Center = %.2f , Disp = %.2f\nSkew = %.2f , Kurt = %.2f' % (pars_gh[0],pars_gh[1],pars_gh[2],pars_gh[3],pars_gh[4]))
print('Fitted Parameters (Double Gaussian):\nAmp1 = %.2f , Center1 = %.2f , Sig1 = %.2f\nAmp2 = %.2f , Center2 = %.2f , Sig2 = %.2f' % (pars_2g[0],pars_2g[1],pars_2g[2],pars_2g[3],pars_2g[4],pars_2g[5]))

[End aside]


Print the fitted parameters

print('Fitted Parameters (Gaus+Hermite):\nAmp = %.2f , Center = %.2f , Disp = %.2f\nSkew = %.2f , Kurt = %.2f' \

print('Fitted Parameters (Double Gaussian):\nAmp1 = %.2f , Center1 = %.2f , Sig1 = %.2f\nAmp2 = %.2f , Center2 = %.2f , Sig2 = %.2f' \

Now let’s plot the results

    #xstart, ystart, xwidth, yheight --> units are fraction of the image from bottom left

Plot the data in frame1

f1.set_xticklabels([]) #We will plot the residuals below, so no x-ticks on this plot
plt.title('Multiple Gaussian Fit Example')
plt.ylabel('Amplitude (Some Units)')
f1.legend([pgh,p2g,p2ga,p2gb],['Gaus-Hermite','2-Gaus','Comp. 1','Comp2'],prop={'size':10},loc='center left')

from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(prune='lower')) #Removes lowest ytick label

Let’s also put in some text boxes with the fit results

f1.annotate('Gauss-Hermite:\nAmp = %.2f\nCenter = %.2f\n$\sigma$ = %.2f\nH3 = %.2f\nH4 = %.2f' \
    %(pars_gh[0],pars_gh[1],pars_gh[2],pars_gh[3],pars_gh[4]),xy=(.05,.95), \
    xycoords='axes fraction',ha="left", va="top", \
bbox=dict(boxstyle="round", fc='1'),fontsize=10)
f1.annotate('Double Gaussian:\nAmp$_1$ = %.2f\nAmp$_2$ = %.2f\nCenter$_1$ = %.2f\nCenter$_2$ = %.2f\n$\sigma_1$ = %.2f\n$\sigma_2$ = %.2f' \
    %(pars_2g[0],pars_2g[3],pars_2g[1],pars_2g[4],pars_2g[2],pars_2g[5]),xy=(.95,.95), \
    xycoords='axes fraction',ha="right", va="top", \
    bbox=dict(boxstyle="round", fc='1'),fontsize=10)

Now plot the residuals in frame2



plt.xlabel('Velocity (km s$^{-1}$)')
f2.legend([resgh,res2g],['Gaus-Hermite','2-Gaus'],numpoints=4,prop={'size':9},loc='upper left')



Finally, here is the result:



Here is a condensed version of the commands.

Condensed Code

import numpy as np
from matplotlib import pyplot as plt
from scipy.optimize import leastsq
from lmfit import minimize, Parameters
import pyfits

### Example 1: Quadratic fitting with scipy.optimize.leastsq() ###

quadratic = lambda p,x: p[0]*(x**2)+p[1]*x+p[2]
quadraticerr = lambda p,x,y: quadratic(p,x)-y

p0=[2,2,2] #Initial guesses

print('Fitted Parameters:\na = %.2f , b = %.2f , c = %.2f' % (paramsout[0],paramsout[1],paramsout[2]))

frame1=fig1.add_axes((.1,.3,.8,.6)) #xstart, ystart, xwidth, yheight --> units are fraction of the image from bottom left

frame1.set_xticklabels([]) #We will plot the residuals below, so no x-ticks on this plot
plt.title('Quadratic Fit Example')
frame1.annotate('$y$ = %.2f$\cdot x^2$+%.2f$\cdot x$+%.2f' \
%(paramsout[0],paramsout[1],paramsout[2]),xy=(.05,.95), \
xycoords='axes fraction',ha="left", va="top", \
bbox=dict(boxstyle="round", fc='1'))

from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(prune='lower')) #Removes lowest ytick label



### Example 2: power law with lmfit.minimize() ###


params.add('beta',value=4,vary=True,max=5.0) #,min=...

power=lambda p,x: params['alpha'].value+params['beta'].value*x
powererr=lambda p,x,y: power(p,x)-y

print('Fitted Parameters:\nalpha = %.2f , beta = %.2f' % (params['alpha'].value,params['beta'].value))


# Plot the data and the fit
plt.xlabel('$\sigma$ (km s$^{-1}$)')
plt.ylabel('M (M$_{\odot}$)')
plt.title('Power Law Fit Example')
sp1.annotate('log($y$) = %.2f+%.2f$\cdot $log($x$)\n$y$ = %.2f$\cdot x^{%.2f}$' \
%(params['alpha'].value,params['beta'].value,10**params['alpha'].value,params['beta'].value),xy=(.05,.95), \
xycoords='axes fraction',ha="left", va="top", \
bbox=dict(boxstyle="round", fc='1'))


### Example 3: multiple Gaussians from data cube with parameter limits ###


#cdelt3=cubehdr['CDELT3']/1000.; crval3=cubehdr['CRVAL3']/1000.; crpix3=cubehdr['CRPIX3'];
#minvel=crval3+(-crpix3+1)*cdelt3; maxvel=crval3+(cube.shape[0]-crpix3)*cdelt3

#stackspec=np.sum(np.sum(cube,axis=2),axis=1) # Stack all the spectra
#vels=vels[::-1]; stackspec=stackspec[::-1] #Reversing the arrays to go from - to +


# Gaussian plus Hermite Polynomials H3/H4   (labeled below as ..._gh)

def gaussfunc_gh(paramsin,x):
 c1=-np.sqrt(3); c2=-np.sqrt(6); c3=2/np.sqrt(3); c4=np.sqrt(6)/3; c5=np.sqrt(6)/4
 return amp*np.exp(-.5*((x-center)/sig)**2)*(1+skew*(c1*((x-center)/sig)+c3*((x-center)/sig)**3)+kurt*(c5+c2*((x-center)/sig)**2+c4*((x-center)/sig)**4))

# Double Gaussian   (labeled below as ..._2g)

def gaussfunc_2g(paramsin,x):
 amp1=paramsin['amp1'].value; amp2=paramsin['amp2'].value;
 center1=paramsin['center1'].value; center2=paramsin['center2'].value;
 sig1=paramsin['sig1'].value; sig2=paramsin['sig2'].value;
 return (gaus1+gaus2)

gausserr_gh = lambda p,x,y: gaussfunc_gh(p,x)-y
gausserr_2g = lambda p,x,y: gaussfunc_2g(p,x)-y
gausssingle = lambda a,c,sig,x: a*np.exp(-.5*((x-c)/sig)**2)



print('Fitted Parameters (Gaus+Hermite):\nAmp = %.2f , Center = %.2f , Disp = %.2f\nSkew = %.2f , Kurt = %.2f' \
print('Fitted Parameters (Double Gaussian):\nAmp1 = %.2f , Center1 = %.2f , Sig1 = %.2f\nAmp2 = %.2f , Center2 = %.2f , Sig2 = %.2f' \

f1.set_xticklabels([]) #We will plot the residuals below, so no x-ticks on this plot
plt.title('Multiple Gaussian Fit Example')
plt.ylabel('Amplitude (Some Units)')
f1.legend([pgh,p2g,p2ga,p2gb],['Gaus-Hermite','2-Gaus','Comp. 1','Comp2'],prop={'size':10},loc='center left')

from matplotlib.ticker import MaxNLocator
plt.gca().yaxis.set_major_locator(MaxNLocator(prune='lower')) #Removes lowest ytick label

f1.annotate('Gauss-Hermite:\nAmp = %.2f\nCenter = %.2f\n$\sigma$ = %.2f\nH3 = %.2f\nH4 = %.2f'%(pars_gh[0],pars_gh[1],pars_gh[2],pars_gh[3],pars_gh[4]),xy=(.05,.95),xycoords='axes fraction',ha="left", va="top",bbox=dict(boxstyle="round", fc='1'),fontsize=10)
f1.annotate('Double Gaussian:\nAmp$_1$ = %.2f\nAmp$_2$ = %.2f\nCenter$_1$ = %.2f\nCenter$_2$ = %.2f\n$\sigma_1$ = %.2f\n$\sigma_2$ = %.2f'%(pars_2g[0],pars_2g[3],pars_2g[1],pars_2g[4],pars_2g[2],pars_2g[5]),xy=(.95,.95),xycoords='axes fraction',ha="right", va="top",bbox=dict(boxstyle="round", fc='1'),fontsize=10)

plt.xlabel('Velocity (km s$^{-1}$)')
f2.legend([resgh,res2g],['Gaus-Hermite','2-Gaus'],numpoints=4,prop={'size':9},loc='upper left')

Here is the sample data for the Gaussian fitting, for those who opt not to download the data cube.  Formatted velocities on the left, stacked spectra values on the right.

Click here for GausFitData.csv
-252.452084231    7.0471
-249.877516604    5.41091
-247.302948977    -0.0805578
-244.72838135    9.80676
-242.153813723    -2.64569
-239.579246096    3.44144
-237.004678469    -0.797753
-234.430110842    -2.29659
-231.855543215    -0.593432
-229.280975588    5.25571
-226.706407961    8.75715
-224.131840334    9.0627
-221.557272707    -6.08199
-218.98270508    -3.14464
-216.408137453    -2.99775
-213.833569826    -4.07002
-211.259002199    -2.21975
-208.684434572    -2.19285
-206.109866945    -3.57909
-203.535299318    -1.38921
-200.960731691    -2.03753
-198.386164064    -9.93365
-195.811596437    -1.14918
-193.23702881    3.11781
-190.662461183    -1.26111
-188.087893556    -7.38781
-185.513325929    2.30879
-182.938758302    0.976938
-180.364190675    -4.54513
-177.789623048    -2.34182
-175.215055421    0.835287
-172.640487794    0.578699
-170.065920167    7.0345
-167.49135254    9.30069
-164.916784913    13.3074
-162.342217286    21.4498
-159.767649659    39.6186
-157.193082032    61.6891
-154.618514405    71.2766
-152.043946778    84.5251
-149.469379151    92.3035
-146.894811524    103.288
-144.320243897    103.722
-141.74567627    105.536
-139.171108643    110.039
-136.596541016    119.542
-134.021973389    112.635
-131.447405762    111.95
-128.872838135    133.203
-126.298270508    135.486
-123.723702881    145.488
-121.149135254    156.818
-118.574567627    176.232
-116    190.496
-113.425432373    198.189
-110.850864746    195.464
-108.276297119    184.057
-105.701729492    170.15
-103.127161865    149.234
-100.552594238    118.705
-97.978026611    101.411
-95.403458984    92.2328
-92.828891357    73.3222
-90.25432373    67.2127
-87.679756103    41.8572
-85.105188476    37.7619
-82.530620849    20.5096
-79.956053222    13.5301
-77.381485595    15.9504
-74.806917968    13.5154
-72.232350341    5.63028
-69.657782714    3.38025
-67.083215087    -2.62706
-64.50864746    2.74693
-61.934079833    1.86001
-59.359512206    1.59868
-56.784944579    2.25844
-54.210376952    -0.369037
-51.635809325    0.334506
-49.061241698    -0.196296
-46.486674071    -1.56728
-43.912106444    -1.1896
-41.337538817    1.94682
-38.76297119    0.268496
-36.188403563    7.77246
-33.613835936    -5.04572
-31.039268309    -4.57877
-28.464700682    5.39095
-25.890133055    -0.00106338
-23.315565428    4.24355
-20.740997801    -2.42964
-18.166430174    -3.30451
-15.591862547    -0.803972
-13.01729492    -8.57761
-10.442727293    -14.549
-7.868159666    -16.3109
-5.293592039    10.3016
-2.719024412    16.9872
-0.144456785    14.6043
2.430110842    2.62755
5.004678469    -0.388005
7.579246096    -10.0996
10.153813723    -3.00706
12.72838135    2.11268
15.302948977    5.23242
17.877516604    4.40532
20.452084231    -1.07435