Tag: pandas

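Getting vertical gridlines to appear in a line plot in matplotlib: I want both horizontal and vertical gridlines on my plot, but only the horizontal gridlines appear by default. I am using a pandas.DataFrame built from an SQL query to generate a line plot with dates on the x-axis. I don't know why the gridlines do not appear on the dates; I searched for an answer but could not find one. All the code I use to draw the graph is the simple code below. data.plot() grid('on') data is the DataFrame that contains the dates and the data from the SQL query. I also tried adding the code below, but I still get the same output with no vertical gridlines. ax = plt.axes() ax.yaxis.grid() # horizontal lines ax.xaxis.grid() # vertical lines Any suggestions?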
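
One possible approach, sketched under assumptions: pandas draws a date index with its own major and minor tick locators, so it may help to enable the grid on the Axes object that data.plot() returns, for both tick levels. The DataFrame below is made-up stand-in data for the SQL result, and the column name value is hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import numpy as np
import pandas as pd

# Stand-in for the SQL result: 60 daily rows (hypothetical data).
dates = pd.date_range("2013-01-01", periods=60, freq="D")
data = pd.DataFrame({"value": np.arange(60.0)}, index=dates)

# DataFrame.plot() returns the matplotlib Axes it drew on.
ax = data.plot()

# Turn the grid on for both axes and for major and minor ticks.
ax.grid(True, which="both", axis="both")

# Render to an in-memory PNG to confirm the figure draws.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
```

Calling plt.axes() after data.plot(), as in the question, may create or select a different Axes than the one pandas drew on, which is why this sketch reuses the returned ax.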

Python pandas max value of selected columns: data = {'name' : ['bill', 'joe', 'steve'], 'test1' : [85, 75, 85], 'test2' : [35, 45, 83], 'test3' : [51, 61, 45]} frame = pd.DataFrame(data) I would like to add a new column that shows the maximum value of each row. Desired output: name test1 test2 test3 HighScore bill 75 75 85 85 joe 35 45 83 83 steve 51 61 45 61 Sometimes frame['HighScore'] = max(data['test1'], data['test2'], data['test3']) works, but most of the time it gives this error: ValueError: The truth value of an array with more than one element is ambiguous. […]
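
A minimal sketch of a row-wise maximum, assuming the goal is the largest of test1, test2 and test3 in each row. It uses DataFrame.max(axis=1) instead of the built-in max, which compares whole lists rather than elements. The values come from the data dict in the question, so the result follows that dict rather than the desired-output table.

```python
import pandas as pd

data = {
    "name": ["bill", "joe", "steve"],
    "test1": [85, 75, 85],
    "test2": [35, 45, 83],
    "test3": [51, 61, 45],
}
frame = pd.DataFrame(data)

# axis=1 takes the maximum across the selected columns, one value per row.
score_columns = ["test1", "test2", "test3"]
frame["HighScore"] = frame[score_columns].max(axis=1)

print(frame)
```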

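How to convert a subset of pandas DataFrame columns and rows into a numpy array?: I am wondering whether there is a simpler, memory-efficient way to select a subset of rows and columns from a pandas DataFrame. For example, given this DataFrame: df = DataFrame(np.random.rand(4,5), columns = list('abcde')) print df a b c d e 0 0.945686 0.000710 0.909158 0.892892 0.326670 1 0.919359 0.667057 0.462478 0.008204 0.473096 2 0.976163 0.621712 0.208423 0.980471 0.048334 3 0.459039 0.788318 0.309892 0.100539 0.753992 I want only those rows in which the value in column 'c' is greater than 0.5, but I only need columns 'b' and 'e' for those rows. This is the method I came up with; perhaps there is a better "pandas" way? locs = [df.columns.get_loc(_) for _ in ['a', 'd']] print df[df.c > 0.5][locs] a d 0 0.945686 0.892892 My final goal is to convert the result into a numpy array to pass to an sklearn regression algorithm, so I will use the code above like this: training_set = array(df[df.c > 0.5][locs]) […]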
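
A sketch of one common way to do this, assuming label-based selection is acceptable: .loc takes the row mask and the column list in one step, and to_numpy() (pandas 0.24 or later) returns the array. The seed is an assumption added here so the example is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(0)  # fixed seed, only so the example repeats
df = pd.DataFrame(rng.rand(4, 5), columns=list("abcde"))

# Rows where column c exceeds 0.5, restricted to columns b and e.
subset = df.loc[df["c"] > 0.5, ["b", "e"]]

# Convert the selection to a plain numpy array, e.g. for scikit-learn.
training_set = subset.to_numpy()
print(training_set.shape)
```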

How to form a tuple column from two columns in pandas: I have a pandas DataFrame and I want to combine the 'lat' and 'long' columns to form a tuple. <class 'pandas.core.frame.DataFrame'> Int64Index: 205482 entries, 0 to 209018 Data columns: Month 205482 non-null values Reported by 205482 non-null values Falls within 205482 non-null values Easting 205482 non-null values Northing 205482 non-null values Location 205482 non-null values Crime type 205482 non-null values long 205482 non-null values lat 205482 non-null values dtypes: float64(4), object(5) The code I tried to use was: def […]
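
A minimal sketch of one way to build such a column, not the asker's truncated code: zip the two columns and store the resulting tuples. The coordinates below are made-up sample values, and the column name lat_long is hypothetical.

```python
import pandas as pd

# Made-up coordinates standing in for the real 'lat' and 'long' columns.
df = pd.DataFrame({
    "lat": [51.5, 53.4, 55.9],
    "long": [-0.1, -2.2, -3.2],
})

# zip pairs the values row by row; list() materialises one tuple per row.
df["lat_long"] = list(zip(df["lat"], df["long"]))

print(df)
```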

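groupby columns with NaN (missing) values: I have a DataFrame with many missing values in the columns that I want to group by: import pandas as pd import numpy as np df = pd.DataFrame({'a': ['1', '2', '3'], 'b': ['4', np.nan, '6']}) In [4]: df.groupby('b').groups Out[4]: {'4': [0], '6': [2]} Notice that pandas has dropped the rows whose target value is NaN. (I want to include these rows!) Since I need many such operations (many columns have missing values) and use functions more complicated than medians (typically random forests), I want to avoid writing overly complicated pieces of code. Any suggestions? Should I write a function for this, or is there a simple solution?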
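
One option, sketched on the assumption that pandas 1.1 or later is available: groupby accepts dropna=False, which keeps NaN as its own group key. On older versions a placeholder via fillna would be needed instead.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["1", "2", "3"], "b": ["4", np.nan, "6"]})

# dropna=False keeps rows whose key is NaN in a separate NaN group.
sizes = df.groupby("b", dropna=False).size()

print(sizes)
```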

pandas sort by group aggregate and column: Given the following DataFrame In [31]: rand = np.random.RandomState(1) df = pd.DataFrame({'A': ['foo', 'bar', 'baz'] * 2, 'B': rand.randn(6), 'C': rand.rand(6) > .5}) In [32]: df Out[32]: A B C 0 foo 1.624345 False 1 bar -0.611756 True 2 baz -0.528172 False 3 foo -1.072969 True 4 bar 0.865408 False 5 baz -2.301539 True I would like to sort the rows by the sum of B aggregated over the groups in A, and then by the value of C (not aggregated). So basically, get the order of the A groups with In […]
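
A sketch under one reading of the question (the excerpt is cut off, so the exact desired output is unknown): attach each row's group sum of B with transform, then sort by that sum and by C. The values are copied from the frame printed in the question, and the helper column name B_sum is hypothetical.

```python
import pandas as pd

# Values copied from the frame displayed in the question.
df = pd.DataFrame({
    "A": ["foo", "bar", "baz", "foo", "bar", "baz"],
    "B": [1.624345, -0.611756, -0.528172, -1.072969, 0.865408, -2.301539],
    "C": [False, True, False, True, False, True],
})

# transform("sum") repeats each group's total of B on every row of the group.
df["B_sum"] = df.groupby("A")["B"].transform("sum")

# Sort by the group total first, then by C within each group.
result = df.sort_values(["B_sum", "C"]).drop(columns="B_sum")

print(result)
```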

In pandas, how do I convert a series of date strings to datetime objects and put them in a DataFrame?: import pandas as pd date_stngs = ('2008-12-20','2008-12-21','2008-12-22','2008-12-23') a = pd.Series(range(4),index = (range(4))) for idx, date in enumerate(date_stngs): a[idx]= pd.to_datetime(date) This bit of code produces the error: TypeError: 'int' object is not iterable Can anyone tell me how to get this series of date strings into a DataFrame as DateTime objects?
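
A minimal sketch that avoids the loop, assuming the strings all share the ISO format shown: pd.to_datetime converts the whole sequence at once, and the result can go straight into a DataFrame. The column name date is hypothetical.

```python
import pandas as pd

date_stngs = ("2008-12-20", "2008-12-21", "2008-12-22", "2008-12-23")

# Convert every string in one vectorised call instead of assigning per element.
dates = pd.Series(pd.to_datetime(list(date_stngs)))

# Put the datetime values into a DataFrame column.
df = pd.DataFrame({"date": dates})

print(df.dtypes)
```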

How to save a Seaborn plot into a file: I tried the following code ( test_seaborn.py ): import matplotlib matplotlib.use('Agg') import matplotlib.pyplot as plt matplotlib.style.use('ggplot') import seaborn as sns sns.set() df = sns.load_dataset('iris') sns_plot = sns.pairplot(df, hue='species', size=2.5) fig = sns_plot.get_figure() fig.savefig("output.png") #sns.plt.show() But I get this error: Traceback (most recent call last): File "test_searborn.py", line 11, in <module> fig = sns_plot.get_figure() AttributeError: 'PairGrid' object has no attribute 'get_figure' I expect the final output.png to exist. How can I fix this problem?

Combining string and int columns in pandas: I have the following DataFrame: from pandas import * df = DataFrame({'foo':['a','b','c'], 'bar':[1, 2, 3]}) It looks like this: bar foo 0 1 a 1 2 b 2 3 c Now I want to have something like this: bar 0 1 is a 1 2 is b 2 3 is c How can I achieve this? I tried the following: df['foo'] = '%s is %s' % (df['bar'], df['foo']) But it gives me a wrong result: >>>print df.ix[0] bar a foo 0 […]
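
A minimal sketch of element-wise concatenation, assuming the goal is one "N is x" string per row: cast the integer column to str, then join the pieces with +. The % operator in the question formats each whole Series as a single string, which is why it gives an unexpected result.

```python
import pandas as pd

df = pd.DataFrame({"foo": ["a", "b", "c"], "bar": [1, 2, 3]})

# astype(str) converts each integer to text so + concatenates row by row.
df["bar"] = df["bar"].astype(str) + " is " + df["foo"]

print(df["bar"])
```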