Creating Excel like interactive Pivot Tables in IPython Notebook using Pandas


This post will focus on using the new widget and interact functionality of IPython 2 to generate Excel like interactive pivot tables. (Well almost you will get the idea)

One idea is to write a plugin that you can call on a file or dataframe with a magic command, and then an interactive Excel Like Pivot Table is produced.

pivots

Lets Create a Dataframe

The code below creates a pandas dataframe that we can use to pivot out some data. You will have to download the notebook and run it to see the final result as NBConvert cannot render it and it has to run against an open kernel. You can download the notebook at here

In [8]:
from pandas import Series, DataFrame, pivot_table
import numpy as np
import numpy

d = { 'Class'   : Series( ['a',  'b', 'b','a','a',  'b', 'b','a','a',  'b', 'b','a','a','b','b','b']),
      'Area'   : Series( ['North','East', 'South', 'West','North','East', 'South', 'West','North','East', 'South', 'West','South', 'West','South', 'West']),
      'Type' : Series( ['square', 'round','square', 'round', 'round', 'square', 'round', 'square', 'round', 'square','round', 'square',]),
      'Web'  : Series( ['Y','N','N','Y','Y','N','N','Y','Y','N','N','Y','Y','N','N','Y']),
      'Agent'   : Series( ['Mike',  'John', 'Pete','Mike',  'John', 'Pete','Mike',  'John', 'Pete','Mike',  'John', 'Pete','John', 'Pete','John', 'Pete']),
      'Income'   : Series( [20., 40., 90., 20.]),
      'Profit' : Series( [1., 2., 3., 4.,1., 2., 3., 4.,1., 2., 3., 4.,1., 2., 3., 4.]),
      'Stock' : Series( [20., 23., 33., 43.,12., 21., 310., 41.,11., 21., 31., 41.,11., 22., 34., 54.] )
     }
df = DataFrame(d)
print df
   Agent   Area Class  Income  Profit  Stock    Type Web
0   Mike  North     a      20       1     20  square   Y
1   John   East     b      40       2     23   round   N
2   Pete  South     b      90       3     33  square   N
3   Mike   West     a      20       4     43   round   Y
4   John  North     a     NaN       1     12   round   Y
5   Pete   East     b     NaN       2     21  square   N
6   Mike  South     b     NaN       3    310   round   N
7   John   West     a     NaN       4     41  square   Y
8   Pete  North     a     NaN       1     11   round   Y
9   Mike   East     b     NaN       2     21  square   N
10  John  South     b     NaN       3     31   round   N
11  Pete   West     a     NaN       4     41  square   Y
12  John  South     a     NaN       1     11     NaN   Y
13  Pete   West     b     NaN       2     22     NaN   N
14  John  South     b     NaN       3     34     NaN   N
15  Pete   West     b     NaN       4     54     NaN   Y

[16 rows x 8 columns]

Now lets just create a nice pivot to see if everything is working. We call the sum function on the values for 'Profit' and 'Stock' for the 'Area' and 'Class' rows.

In [9]:
pivot_table( df, rows=['Area','Class'], values = ['Profit','Stock'], aggfunc=numpy.sum)
Out[9]:
Profit Stock
Area Class
East b 6 65
North a 3 43
South a 1 11
b 12 408
West a 12 125
b 6 76

6 rows × 2 columns

To make widgets to interact with we need to first create a function that would return 'visualy' to the workbook what we want. I included a plot for fun. We should be able to pass a matplotlib function like bar to it using the widget aswell. I struggled to pass the dataframe 'df' to the function so hardcoded it for now.

In [10]:
%pylab inline
def my_pivot( rows, values, aggfunc, plt=False):
    dfp = df
    piv = pivot_table( dfp, rows=rows, values=values, aggfunc=aggfunc)
    print piv
    if plt: plot( piv )
Populating the interactive namespace from numpy and matplotlib

Lets just test this function to see if it works.

In [11]:
my_pivot(rows=['Area','Class'], values = ['Profit','Stock'], aggfunc=numpy.sum, plt=True)
             Profit  Stock
Area  Class               
East  b           6     65
North a           3     43
South a           1     11
      b          12    408
West  a          12    125
      b           6     76

[6 rows x 2 columns]

Lets import some goodies to enable us to use the interactive widget. When you run this code for the first time you will see the selectors and then some error. This is because I did not set the default selector values. Select a column with only text values and then for 'values' a column like 'Income', 'Profit' or 'Stock'. You can also select if you want to sum or average them out. There is also a radio button to show or hide the plot.

In [12]:
from IPython.html.widgets import interact, SelectWidget, CheckboxWidget, RadioButtonsWidget
from IPython.display import display
In [13]:
i = interact( my_pivot,
                 rows    = SelectWidget(values=list(df.columns)), 
                 values  = SelectWidget(values=list(df.columns)),
                 aggfunc = SelectWidget( values={ 'sum' : numpy.sum, 'ave' : numpy.average }),
                 plt     = RadioButtonsWidget( values = { 'Plot': True, 'False' : False } )
             )
Class
a        16
b        24
Name: Profit, dtype: float64


comments powered by Disqus