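Hierarchical indexing is an important feature of pandas: it lets you have multiple (two or more) index levels on a single axis. In effect, it lets you work with higher-dimensional data in a lower-dimensional form.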

Example: create a Series whose index is a list of lists (or arrays):

In [80]: data = Series(np.random.randn(10), index=[['a','a','a','b','b','b','c','c','d','d'], [1,2,3,1,2,3,1,2,2,3]])

In [81]: data
Out[81]: 
a  1    0.659838
   2   -0.530732
   3    0.862788
b  1   -0.678278
   2    1.189240
   3    0.167645
c  1    1.740861
   2    0.144723
d  2    1.153850
   3    0.686639
dtype: float64

This is the formatted output of a Series with a MultiIndex as its index. A "gap" in the index means "use the label directly above":

In [82]: data.index
Out[82]: 
MultiIndex(levels=[['a', 'b', 'c', 'd'], [1, 2, 3]],
           labels=[[0, 0, 0, 1, 1, 1, 2, 2, 3, 3], [0, 1, 2, 0, 1, 2, 0, 1, 1, 2]])
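The repr above was produced by an older pandas release; since pandas 0.24 the integer arrays printed as `labels` are exposed as `MultiIndex.codes`. The sketch below inspects the levels and codes of an index with the same structure as `data`. It assumes deterministic values (`np.arange(10.0)`) in place of `np.random.randn(10)` so the results are repeatable.

```python
import numpy as np
import pandas as pd

# Same index structure as `data` in the text; values are 0.0..9.0
# (an assumption, replacing the random numbers) for repeatable output.
data = pd.Series(np.arange(10.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

idx = data.index
n_levels = idx.nlevels                  # number of index levels
outer_labels = list(idx.levels[0])      # unique labels of the outer level
codes = [list(c) for c in idx.codes]    # per level: positions into idx.levels
print(n_levels, outer_labels)
print(codes)
```

Each entry in `codes` points into the matching entry of `levels`, which is how a MultiIndex stores repeated labels compactly.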

Selecting subsets of a hierarchically indexed object by the outer level:

In [83]: data['b']
Out[83]: 
1   -0.678278
2    1.189240
3    0.167645
dtype: float64

In [84]: data['b':'d']
Out[84]: 
b  1   -0.678278
   2    1.189240
   3    0.167645
c  1    1.740861
   2    0.144723
d  2    1.153850
   3    0.686639
dtype: float64
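The same selections can be written with the label-based `.loc` indexer, which also accepts a list of outer labels. A minimal sketch, again assuming deterministic values instead of random ones:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for `data` (assumed values 0.0..9.0).
data = pd.Series(np.arange(10.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

outer_b = data.loc['b']            # one outer label: the inner level becomes the index
b_to_d = data.loc['b':'d']         # inclusive label slice over the outer level
b_and_d = data.loc[['b', 'd']]     # an explicit list of outer labels
print(outer_b)
print(b_to_d)
print(b_and_d)
```

Label slicing over a MultiIndex generally expects the index to be sorted, which is the case here.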

It is even possible to select on the "inner" level:

In [85]: data[:,2]
Out[85]: 
a   -0.530732
b    1.189240
c    0.144723
d    1.153850
dtype: float64
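A sketch of inner-level selection using `.loc` with a full slice on the outer level, alongside the equivalent `Series.xs` cross-section with `level=1`; the deterministic values are an assumption for reproducibility:

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for `data` (assumed values 0.0..9.0).
data = pd.Series(np.arange(10.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

inner_2 = data.loc[:, 2]            # every outer label whose inner label is 2
inner_2_xs = data.xs(2, level=1)    # same selection as a cross-section
print(inner_2)
```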

Hierarchical indexing plays an important role in reshaping data and in group-based operations such as building a pivot table.

For example, this data can be rearranged into a DataFrame with the unstack method:

In [86]: data.unstack()
Out[86]: 
          1         2         3
a  0.659838 -0.530732  0.862788
b -0.678278  1.189240  0.167645
c  1.740861  0.144723       NaN
d       NaN  1.153850  0.686639

The inverse operation of unstack is stack.
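A round-trip sketch, assuming deterministic values for `data`: `unstack` moves the inner level into the columns (creating NaN for missing combinations such as `('c', 3)`), and `stack` moves it back. Whether `stack` drops those NaN cells by default differs between pandas versions, so the sketch calls `dropna()` explicitly.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for `data` (assumed values 0.0..9.0).
data = pd.Series(np.arange(10.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 2, 3, 1, 2, 2, 3]])

wide = data.unstack()               # outer level -> rows, inner level -> columns
stacked = wide.stack().dropna()     # back to a two-level Series without the NaN holes
print(wide)
print(stacked)
```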

With a DataFrame, either axis can have a hierarchical index:

In [88]: frame = DataFrame(np.arange(12).reshape((4,3)), index=[['a','a','b','b'], [1,2,1,2]], columns=[['Ohio','Ohio','Colorado'],['Green','Red','Green']])

In [89]: frame
Out[89]: 
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11

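Each level can have a name (a string or any other Python object). If names are set, they appear in the console output (take care not to confuse the index names with the axis labels!):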

In [90]: frame.index.names = ['key1', 'key2']
In [93]: frame.columns.names = ['state', 'color']

In [94]: frame
Out[94]: 
state      Ohio     Colorado
color     Green Red    Green
key1 key2                   
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
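Named levels are more than cosmetic: methods that take a level accept the name as well as its integer position. A sketch using `MultiIndex.get_level_values` on a rebuilt copy of `frame`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# A level can be addressed by its name or by its position.
key1_values = list(frame.index.get_level_values('key1'))
key1_by_pos = list(frame.index.get_level_values(0))
color_values = list(frame.columns.get_level_values('color'))
print(key1_values, color_values)
```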

Thanks to the hierarchical column index, selecting groups of columns is easy:

In [95]: frame['Ohio']
Out[95]: 
color      Green  Red
key1 key2            
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
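Selecting with only the outer label returns a sub-DataFrame; passing a full `(state, color)` tuple selects a single column. A sketch on a rebuilt copy of `frame`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

ohio = frame['Ohio']               # outer label only: remaining level becomes the columns
ohio_red = frame[('Ohio', 'Red')]  # full tuple: one column as a Series
print(ohio_red)
```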

A MultiIndex can also be created on its own and then reused:

columns = pd.MultiIndex.from_arrays([['Ohio','Ohio','Colorado'],['Green','Red','Green']], names=['state','color'])
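A sketch of the reuse: the same `columns` object is passed to two hypothetical DataFrames (`frame_a`, `frame_b`), and `MultiIndex.from_tuples` is shown as an alternative way to build an equal index from `(state, color)` pairs.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                                     ['Green', 'Red', 'Green']],
                                    names=['state', 'color'])

# The same MultiIndex object can back several DataFrames.
frame_a = pd.DataFrame(np.arange(12).reshape((4, 3)), columns=columns)
frame_b = pd.DataFrame(np.zeros((2, 3)), columns=columns)

# An equal index built from (state, color) tuples instead of parallel arrays.
same = pd.MultiIndex.from_tuples([('Ohio', 'Green'), ('Ohio', 'Red'),
                                  ('Colorado', 'Green')],
                                 names=['state', 'color'])
```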

Reordering and sorting levels:

Sometimes you need to rearrange the order of the levels on an axis, or sort the data by the values in one specific level.

swaplevel takes two level numbers or names and returns a new object with those levels interchanged (the data itself is unchanged):

In [102]: frame.swaplevel('key1', 'key2')
Out[102]: 
state      Ohio     Colorado
color     Green Red    Green
key2 key1                   
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
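A sketch showing that levels can be swapped by name or by position with the same result, and that `swaplevel` leaves the original object untouched:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

by_name = frame.swaplevel('key1', 'key2')   # levels addressed by name
by_number = frame.swaplevel(0, 1)           # levels addressed by position
print(by_name.index.names)
```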

sort_index, on the other hand, sorts the data (stably) using the values in a single level:

In [103]: frame.sort_index(level=1)
Out[103]: 
state      Ohio     Colorado
color     Green Red    Green
key1 key2                   
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
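`sort_index` also accepts a level name instead of a position. A sketch on a rebuilt copy of `frame`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']

# Sort rows by key2; by default the remaining level (key1) breaks ties.
by_key2 = frame.sort_index(level='key2')
order = list(by_key2.index)
print(order)
```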

In [104]: frame.swaplevel(0,1).sort_index(level=0)
Out[104]: 
state      Ohio     Colorado
color     Green Red    Green
key2 key1                   
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
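Sorting after a swap matters because label-range slicing on a MultiIndex generally expects a lexicographically sorted index. A sketch that checks this with `Index.is_monotonic_increasing`:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame.index.names = ['key1', 'key2']

swapped = frame.swaplevel(0, 1)
print(swapped.index.is_monotonic_increasing)   # False: outer values run 1, 2, 1, 2
tidy = swapped.sort_index(level=0)
print(tidy.index.is_monotonic_increasing)      # True
```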

Summary statistics by level:

Many descriptive and summary statistics on DataFrame and Series have a level option that specifies the level to aggregate by on a particular axis.

In [105]: frame
Out[105]: 
state      Ohio     Colorado
color     Green Red    Green
key1 key2                   
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11

Summing by a level on the rows or on the columns:

In [107]: frame.sum(level='key2')
Out[107]: 
state  Ohio     Colorado
color Green Red    Green
key2                    
1         6   8       10
2        12  14       16

In [110]: frame.sum(level='color', axis=1)
Out[110]: 
color      Green  Red
key1 key2            
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10

Under the hood, this uses pandas's groupby machinery, which is covered later.
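The `level=` argument of reductions such as `sum` was deprecated in pandas 1.3 and removed in 2.0, so on current versions the calls above have to be written with `groupby(level=...)` directly. A sketch reproducing both results; for the column case it transposes instead of using `groupby(axis=1)`, which recent pandas also deprecates.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                                    names=['key1', 'key2']),
    columns=pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'],
                                       ['Green', 'Red', 'Green']],
                                      names=['state', 'color']))

# Equivalent of frame.sum(level='key2'): group the rows by the key2 level.
by_key2 = frame.groupby(level='key2').sum()

# Equivalent of frame.sum(level='color', axis=1): transpose so the column
# levels become row levels, group and sum, then transpose back.
by_color = frame.T.groupby(level='color').sum().T
print(by_key2)
print(by_color)
```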

Using a `DataFrame`'s columns:

It is common to want to use one or more columns of a DataFrame as the row index, or, conversely, to move the row index into the DataFrame's columns:

In [111]: frame = DataFrame({'a':range(7), 'b':range(7,0,-1), 'c':['one', 'one', 'one', 'two', 'two', 'two', 'two'], 'd':[0,1,2,0,1,2,3]})

In [112]: frame
Out[112]: 
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3

The set_index method converts one or more of its columns into the row index and returns a new DataFrame:

In [113]: frame2 = frame.set_index(['c','d'])

In [114]: frame2
Out[114]: 
       a  b
c   d      
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
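Once the columns are in the index, rows can be selected by their former column values. A sketch using `.loc` on `frame2`, with a partial key (outer level only) and a full `(c, d)` key:

```python
import pandas as pd

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})
frame2 = frame.set_index(['c', 'd'])

two_rows = frame2.loc['two']            # all rows whose c value is 'two'
one_1_b = frame2.loc[('one', 1), 'b']   # a single cell addressed by a full key
print(two_rows)
print(one_1_b)
```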

By default those columns are removed from the DataFrame, but you can keep them by passing drop=False:

In [115]: frame.set_index(['c','d'], drop=False)
Out[115]: 
       a  b    c  d
c   d              
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3

reset_index does the opposite of set_index: the levels of the hierarchical index are moved into the columns:

In [116]: frame2.reset_index()
Out[116]: 
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
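A round-trip sketch: `reset_index` puts the former index levels first, so reordering the columns recovers the original `frame`.

```python
import pandas as pd

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})

restored = frame.set_index(['c', 'd']).reset_index()
print(list(restored.columns))                # index levels come back first: c, d, a, b
restored = restored[['a', 'b', 'c', 'd']]    # restore the original column order
```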

Getting Started with pandas (4): Hierarchical Indexing

pandas入门（4）：层次化索引

重新分级排序：

根据级别汇总统计：

使用`DataFrame`的列：

推荐阅读更多精彩内容

pandas入门（4）：层次化索引

重新分级排序：

根据级别汇总统计：

使用DataFrame的列：

推荐阅读更多精彩内容

使用`DataFrame`的列：