If you want to profile a Python application, you can get some basic functionality with the cProfile and pstats modules, as described in the Python Profilers documentation. After some curious experimenting, I discovered that the pstats module provides a handy interactive mode. To my surprising, I didn't found any substantial documentation about this time saver, so let me entertain you with a dump of my findings.

In the documentation you'll find a Instant User's Manual to get the impatient up an running in no time. E.g. to profile the entry point foo(), execute the following in your script/module:

import cProfile
cProfile.run('foo()', 'foo.profile')

This will profile the foo() function, and save the profiling data to a file foo.profile.

If you want to profile the whole application, you even don't have to edit your code, as you can run cProfile as a script with your application, say barbazzr.py, as argument:

python -m cProfile -o barbazzr.profile barbazzr.py

This will save the profiling data to barbazzr.profile.

Now, to examine the profile data (foo.profile or barbazzr.profile in the examples above), you can use the pstats module as described in the Instant User's Manual. But I've never liked typing these long chained function calls.

Much easier is using the interactive mode of the pstats module by running it as a script with the profile data file as argument:

python -m pstats barbazzr.profile

And voila, you are in an interactive profile statistics browsing session:

Welcome to the profile statistics browser.
%

Use the help command to get the supported commands:

% help

Documented commands (type help <topic>):
========================================
EOF  add  callees  callers  quit  read  reverse  sort  stats  strip

Undocumented commands:
======================
help

Get more help on a command, e.g. the stats command:

% help stats
Print statistics from the current stat object.
Arguments may be:
* An integer maximum number of entries to print.
* A decimal fractional number between 0 and 1, controlling
  what fraction of selected entries to print.
* A regular expression; only entries with function names
  that match it are printed.

Strip the long file paths to make the reports easier on the eyes:

% strip

Let's sort. Get the sorting options with the sort command (without arguments):

% sort
Valid sort keys (unique prefixes are accepted):
stdname -- standard name
nfl -- name/file/line
pcalls -- call count
file -- file name
calls -- call count
time -- internal time
line -- line number
cumulative -- cumulative time
module -- file name
name -- function name

Sort on internal time:

% sort time

And show the top 10 functions:

% stats 10
Thu Jan 21 17:54:33 2010    barbazrr.profile

         18195692 function calls (18084048 primitive calls) in 37.803 CPU seconds

   Ordered by: internal time
   List reduced from 979 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
       22    7.644    0.347    7.645    0.347 arff.py:382(as_dict)
    19964    1.664    0.000    2.483    0.000 backend_ps.py:456(_convert_path)
   124176    1.541    0.000    2.307    0.000 arff.py:191(append)
       73    1.044    0.014    1.044    0.014 {method 'close' of 'file' objects}
   597349    0.978    0.000    1.475    0.000 labelhistogram.py:137(observe)
       22    0.967    0.044    2.275    0.103 labelhistogram.py:40(observe_from_arff)
  1368524    0.959    0.000    1.312    0.000 labelhistogram.py:21(observe)
       22    0.809    0.037    0.809    0.037 {built-in method write_png}
    39700    0.774    0.000    0.774    0.000 numeric.py:1500(identity)
    95736    0.719    0.000    0.719    0.000 {numpy.core.multiarray.array}

Sort on cumulative time and get the top 5:

% sort cumulative
% stats 5
Thu Jan 21 17:54:33 2010    barbazrr.profile

         18195692 function calls (18084048 primitive calls) in 37.803 CPU seconds

   Ordered by: cumulative time
   List reduced from 979 to 5 due to restriction <5>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.097    0.097   37.803   37.803 <string>:1(<module>)
        1    0.002    0.002   37.706   37.706 annotation_shifts.py:301(main)
       11    0.092    0.008   34.746    3.159 annotation_shifts.py:44(analyse)
       22    0.001    0.000   18.514    0.842 htmlreport.py:159(addPylabFigure)
       44    0.001    0.000   18.500    0.420 figure.py:927(savefig)

Very handy indeed.

Playing with the other commands or options are left as an exercise for the reader (I've always wanted to say this).

Some notes

  • Although the cProfile and pstats modules are listed on the Python Standard Library page, they were not installed by default on my setup (Ubuntu). A simple install of the python-profiler package did the trick.
  • Running your application with

    python -m cProfile -o barbazzr.profile barbazzr.py
    

    appears to mess with the __file__ variable: it evaluates to the file name of the cProfile module (/usr/lib/python2.6/cProfile.py in my case), instead of the file name of the script/module where __file__ variable is used. The workaround is to do the profiling with the first approach:

    import cProfile
    cProfile.run('foo()', 'foo.profile')