Pandas DataFrame: get the first row of each group

I have a pandas DataFrame like the following:

    import pandas as pd

    df = pd.DataFrame({
        'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
        'value': ["first", "second", "second", "first", "second",
                  "first", "third", "fourth", "fifth", "second",
                  "fifth", "first", "first", "second", "third",
                  "fourth", "fifth"]
    })

I want to group this by ["id", "value"] and get the first row of each group.

        id   value
    0    1   first
    1    1  second
    2    1  second
    3    2   first
    4    2  second
    5    3   first
    6    3   third
    7    3  fourth
    8    3   fifth
    9    4  second
    10   4   fifth
    11   5   first
    12   6   first
    13   6  second
    14   6   third
    15   7  fourth
    16   7   fifth

Expected outcome:

    id   value
    1    first
    2    first
    3    first
    4   second
    5    first
    6    first
    7   fourth

I tried the following, which only gives the first row of the DataFrame. Any help with this is appreciated.

    In [25]: for index, row in df.iterrows():
       ....:     df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])

    >>> df.groupby('id').first()
         value
    id
    1    first
    2    first
    3    first
    4   second
    5    first
    6    first
    7   fourth

If you need id as a column:

    >>> df.groupby('id').first().reset_index()
       id   value
    0   1   first
    1   2   first
    2   3   first
    3   4  second
    4   5   first
    5   6   first
    6   7  fourth
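For anyone who wants to run this end to end, here is a minimal self-contained sketch. The variable names `first_rows` and `first_rows_alt` are just illustrative, and passing `as_index=False` is one possible shortcut for the `reset_index()` step, not something the answer above relies on.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
    'value': ["first", "second", "second", "first", "second",
              "first", "third", "fourth", "fifth", "second",
              "fifth", "first", "first", "second", "third",
              "fourth", "fifth"],
})

# as_index=False keeps 'id' as a regular column instead of the index
first_rows = df.groupby('id', as_index=False).first()

# Two-step form from the answer: group, take first, then reset the index
first_rows_alt = df.groupby('id').first().reset_index()

print(first_rows)
```

Both forms should produce one row per id with `id` and `value` as ordinary columns.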

To get the first n records of each group, you can use head():

    >>> df.groupby('id').head(2).reset_index(drop=True)
        id   value
    0    1   first
    1    1  second
    2    2   first
    3    2  second
    4    3   first
    5    3   third
    6    4  second
    7    4   fifth
    8    5   first
    9    6   first
    10   6  second
    11   7  fourth
    12   7   fifth
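The `reset_index(drop=True)` call matters because `head()` on a groupby keeps the original row labels. A small sketch on a made-up six-row frame (the data and the names `top2` and `renumbered` are illustrative only) shows the gap it leaves:

```python
import pandas as pd

# Toy data: id 1 has three rows, id 2 has two, id 3 has one
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3],
                   'value': ['a', 'b', 'c', 'd', 'e', 'f']})

# Up to two rows per id, in original order, with original index labels
top2 = df.groupby('id').head(2)
print(top2.index.tolist())        # label 2 (third row of id 1) is missing

# Renumber the rows 0..n-1 and discard the old labels
renumbered = top2.reset_index(drop=True)
print(renumbered.index.tolist())
```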

This will give you the second row of each group (zero-indexed; nth(0) is the same as first()):

    df.groupby('id').nth(1)
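One caveat worth checking on your own data: as far as I know, `first()` returns the first non-null value per column, while `nth(0)` and `head(1)` pick the first row by position, so the two only agree when the leading rows contain no NaN. A hedged sketch with made-up data (the names `df_nan`, `by_first`, `by_nth` and `by_head` are illustrative):

```python
import numpy as np
import pandas as pd

# Toy data: the first row of id 1 has a missing value
df_nan = pd.DataFrame({'id': [1, 1, 2, 2],
                       'value': [np.nan, 'second', 'first', 'second']})

# first() skips the NaN and reports 'second' for id 1
by_first = df_nan.groupby('id')['value'].first()

# nth(0) and head(1) take the first row by position, NaN included
by_nth = df_nan.groupby('id').nth(0)
by_head = df_nan.groupby('id').head(1)

print(by_first)
print(by_nth)
print(by_head)
```

Note that the index of the `nth()` result differs between pandas versions (group keys in older releases, original row labels in newer ones), so the sketch avoids relying on it.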

Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group

Maybe this is what you want:

    import pandas as pd

    idx = pd.MultiIndex.from_product([['state1', 'state2'],
                                      ['county1', 'county2', 'county3', 'county4']])
    df = pd.DataFrame({'pop': [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

                    pop
    state1 county1   12
           county2   15
           county3   65
           county4   42
    state2 county1   78
           county2   67
           county3   55
           county4   31

    df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)

    Out[29]:
                    pop
    state1 county3   65
           county4   42
           county2   15
    state2 county1   78
           county2   67
           county3   55
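A possible variation, not the approach above: sort the whole frame once and then take `head(3)` per state, which avoids the `apply` step. This is only a sketch; it should select the same six rows, but they come out ordered by `pop` across the whole frame rather than state by state. The name `top3` is illustrative.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([['state1', 'state2'],
                                  ['county1', 'county2', 'county3', 'county4']])
df = pd.DataFrame({'pop': [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# Sort all rows by population (largest first), then keep the first
# three rows encountered for each state (index level 0)
top3 = df.sort_values('pop', ascending=False).groupby(level=0).head(3)

print(top3)
```

If the state-by-state layout matters, the two-step groupby version shown above keeps each state's rows together.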