Tag: pandas

Is this the fastest way to group in pandas?: The following code runs fine. Just checking: am I using pandas correctly, and is there a faster way? Thanks. $ python3 Python 3.4.0 (default, Apr 11 2014, 13:05:11) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> import pandas as pd >>> import numpy as np >>> import timeit >>> pd.__version__ '0.14.1' def randChar(f, numGrp, N) : things = [f%x for x in range(numGrp)] return [things[x] […]

JSON to pandas DataFrame: What I want to do is extract elevation data from the Google Maps API along a path specified by latitude and longitude coordinates, as follows: from urllib2 import Request, urlopen import json path1 = '42.974049,-81.205203|42.974298,-81.195755' request=Request('http://maps.googleapis.com/maps/api/elevation/json?locations='+path1+'&sensor=false') response = urlopen(request) elevations = response.read() This gives me data that looks like this: elevations.splitlines() ['{', ' "results" : [', ' {', ' "elevation" : 243.3462677001953,', ' "location" : {', ' "lat" : 42.974049,', ' "lng" : -81.205203', ' },', ' "resolution" : 19.08790397644043', ' },', ' {', ' "elevation" […]

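python dataframe pandas drop column using int: I understand that to drop a column you use df.drop('column name', axis=1). Is there a way to drop a column using a numerical index instead of the column name?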
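
One possible approach, sketched under the assumption that column labels are unique: look up the label at the given position with `df.columns[i]` and pass that label to `df.drop`. The sample frame and the names `dropped` and `dropped_many` are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample frame; the question does not show its data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Translate position 1 into its label, then drop by label.
dropped = df.drop(df.columns[1], axis=1)

# Several positions at once: index the column Index with a list.
dropped_many = df.drop(df.columns[[0, 2]], axis=1)

print(list(dropped.columns))
print(list(dropped_many.columns))
```

Because `drop` works by label, a frame with duplicate column labels would lose every column sharing the selected label, so this sketch only suits frames whose labels are unique.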

Pandas DataFrame Groupby two columns and get counts: I have a pandas DataFrame in the following format: df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T df.columns = ['col1','col2','col3','col4','col5'] df: col1 col2 col3 col4 col5 0 1.1 A 1.1 x/y/z 1 1 1.1 A 1.7 x/y 3 2 1.1 A 2.5 x/y/z/n 3 3 2.6 B 2.6 x/u 2 4 2.5 B 3.3 x […]

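How do you plot a vertical line on a time series plot in pandas?: How do you plot vertical lines (vlines) on a pandas Series plot? I am using pandas to plot rolling means and the like, and would like to mark important positions with vertical lines. Is it possible to use vlines or something similar to accomplish this? If so, could someone provide an example? In this case, the x axis is datetime.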

How to import data from mongodb to pandas?: I need to analyze a large amount of data in MongoDB. How can I import that data into pandas? I am new to pandas and numpy. EDIT: The mongodb collection contains sensor values tagged with date and time. The sensor values are of float datatype. Sample data: { "_cls" : "SensorReport", "_id" : ObjectId("515a963b78f6a035d9fa531b"), "_types" : [ "SensorReport" ], "Readings" : [ { "a" : 0.958069536790466, "_types" : [ "Reading" ], "ReadingUpdatedDate" : ISODate("2013-04-02T08:26:35.297Z"), "b" : 6.296118156595, "_cls" : "Reading" }, { "a" : 0.95574014778624, "_types" : [ "Reading" ], "ReadingUpdatedDate" : ISODate("2013-04-02T08:27:09.963Z"), "b" : 6.29651468650064, […]

Pandas dataframe get first row of each group: I have a pandas DataFrame like the following. df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7], 'value' : ["first","second","second","first", "second","first","third","fourth", "fifth","second","fifth","first", "first","second","third","fourth","fifth"]}) I want to group this by ["id","value"] and get the first row of each group. id value 0 1 first 1 1 second 2 1 second 3 2 first 4 2 second 5 3 first 6 3 third 7 3 fourth 8 3 fifth 9 4 second 10 4 fifth 11 5 first 12 6 first […]

Rename Pandas DataFrame index: I have a csv file without a header, with a DateTime index. I want to rename the index and the column name, but df.rename() only renames the column name. Is this a bug? I'm on version 0.12.0 In [2]: df = pd.read_csv(r'D:\Data\DataTimeSeries_csv//seriesSM.csv', header=None, parse_dates=[[0]], index_col=[0] ) In [3]: df.head() Out[3]: 1 0 2002-06-18 0.112000 2002-06-22 0.190333 2002-06-26 0.134000 2002-06-30 0.093000 2002-07-04 0.098667 In [4]: df.rename(index={0:'Date'}, columns={1:'SM'}, inplace=True) In [5]: df.head() Out[5]: SM 0 2002-06-18 0.112000 2002-06-22 0.190333 2002-06-26 0.134000 2002-06-30 0.093000 2002-07-04 0.098667
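
A minimal sketch of one way to get the intended result. The `index=` argument of `df.rename` maps index labels (here the dates), not the name of the index, which would explain why the `0` header stays. Setting `df.index.name` changes the name itself. The frame below is rebuilt by hand from the first three rows of the question's output rather than read from the CSV, so its construction is an assumption.

```python
import pandas as pd

# Rebuild a small frame shaped like the one in the question:
# one column labelled 1, a datetime index whose name is 0.
df = pd.DataFrame(
    {1: [0.112000, 0.190333, 0.134000]},
    index=pd.to_datetime(["2002-06-18", "2002-06-22", "2002-06-26"]),
)
df.index.name = 0

# Rename the column by label, then set the index *name* directly.
df = df.rename(columns={1: "SM"})
df.index.name = "Date"

print(df)
```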

Pandas DataFrame to list of dictionaries: I have the following DataFrame: customer item1 item2 item3 1 apple milk tomato 2 water orange potato 3 juice mango chips I want to translate it to a list of dictionaries, one per row rows = [{'customer': 1, 'item1': 'apple', 'item2': 'milk', 'item3': 'tomato'}, {'customer': 2, 'item1': 'water', 'item2': 'orange', 'item3': 'potato'}, {'customer': 3, 'item1': 'juice', 'item2': 'mango', 'item3': 'chips'}]
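
A short sketch of one way to do this, using `DataFrame.to_dict` with `orient="records"`, which returns one dictionary per row. The frame is rebuilt from the values shown in the question.

```python
import pandas as pd

# Frame rebuilt from the table in the question.
df = pd.DataFrame({
    "customer": [1, 2, 3],
    "item1": ["apple", "water", "juice"],
    "item2": ["milk", "orange", "mango"],
    "item3": ["tomato", "potato", "chips"],
})

# One dict per row, keyed by column name.
rows = df.to_dict(orient="records")

for row in rows:
    print(row)
```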

Pandas join issue: columns overlap but no suffix specified: I have the following 2 dataframes: df_a = mukey DI PI 0 100000 35 14 1 1000005 44 14 2 1000006 44 14 3 1000007 43 13 4 1000008 43 13 df_b = mukey niccdcd 0 190236 4 1 190237 6 2 190238 7 3 190239 4 4 190240 7 When I try to join these 2 dataframes: join_df = df_a.join(df_b,on='mukey',how='left') I get the error: *** ValueError: columns overlap but […]
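
A hedged sketch of two ways around this error, assuming the goal is a left join keyed on `mukey`. `DataFrame.join` matches the `on` column of the caller against the index of the other frame, so with `mukey` still a regular column in `df_b` both frames carry a column of that name and pandas asks for suffixes. The frames are rebuilt from the values in the question. The names `merged` and `joined` are illustrative.

```python
import pandas as pd

# Frames rebuilt from the question.
df_a = pd.DataFrame({
    "mukey": [100000, 1000005, 1000006, 1000007, 1000008],
    "DI": [35, 44, 44, 43, 43],
    "PI": [14, 14, 14, 13, 13],
})
df_b = pd.DataFrame({
    "mukey": [190236, 190237, 190238, 190239, 190240],
    "niccdcd": [4, 6, 7, 4, 7],
})

# Option 1: merge on the shared column.
merged = pd.merge(df_a, df_b, on="mukey", how="left")

# Option 2: keep join(), but make mukey the index of the right frame first.
joined = df_a.join(df_b.set_index("mukey"), on="mukey", how="left")

print(merged)
```

With the sample values shown, no `mukey` appears in both frames, so `niccdcd` comes out as NaN for every row of the left join.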