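pandas DataFrame to list

I am pulling a subset of data from one column based on a condition being met in another column.

I can get the correct values back, but they are in a pandas.core.frame.DataFrame. How do I convert that to a list?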

    import pandas as pd

    tst = pd.read_csv('C:\\SomeCSV.csv')

    lookupValue = tst['SomeCol'] == "SomeValue"
    ID = tst[lookupValue][['SomeCol']]
    # How to convert ID to a list

Use .values to get a numpy.array, then .tolist() to get a list.

For example:

    import pandas as pd
    df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                       'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

Result:

    >>> df['a'].values.tolist()
    [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

Or you can just use

    >>> df['a'].tolist()
    [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9]

To drop duplicates, you can do either of the following:

    >>> df['a'].drop_duplicates().values.tolist()
    [1, 3, 5, 7, 4, 6, 8, 9]
    >>> list(set(df['a']))  # as pointed out by EdChum
    [1, 3, 4, 5, 6, 7, 8, 9]
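The two de-duplication approaches differ in one way that is easy to miss: drop_duplicates() keeps the first occurrence of each value in its original order, while a Python set makes no promise about order. A minimal sketch using the same df as above; Series.unique() is included as a third option that the answer does not mention, and the variable names are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 3, 5, 7, 4, 5, 6, 4, 7, 8, 9],
                   'b': [3, 5, 6, 2, 4, 6, 7, 8, 7, 8, 9]})

# Keeps the first occurrence of each value, in original order.
ordered = df['a'].drop_duplicates().tolist()

# Series.unique() also preserves the order of first appearance.
ordered_unique = df['a'].unique().tolist()

# A set has no guaranteed order; sort explicitly if order matters.
sorted_unique = sorted(set(df['a']))

print(ordered)
print(ordered_unique)
print(sorted_unique)
```

If the order of first appearance matters, one of the first two forms is probably the safer choice.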

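I'd like to clarify a few things:

1. As other answers have pointed out, the simplest thing to do is use pandas.Series.tolist(). I'm not sure why the top-voted answer leads with pandas.Series.values.tolist(), since as far as I can tell it adds syntax and confusion with no added benefit.
2. tst[lookupValue][['SomeCol']] is a dataframe (as stated in the question), not a series (as stated in a comment on the question). This is because tst[lookupValue] is a dataframe, and slicing it with [['SomeCol']] asks for a list of columns (a list that happens to have length 1), so a dataframe is returned. If you remove the extra set of brackets, as in tst[lookupValue]['SomeCol'], you are asking for just that one column rather than a list of columns, and you get a series back.
3. You need a series to use pandas.Series.tolist(), so you should skip the second set of brackets in this case. FYI, if you ever end up with a one-column dataframe that is not as easy to avoid as this one, you can use pandas.DataFrame.squeeze() to convert it to a series.
4. tst[lookupValue]['SomeCol'] gets a subset of a particular column via chained slicing. It slices once to get a dataframe with only certain rows left, and then slices again to get a certain column. You can get away with it here because you are only reading, not writing, but the proper way to do it is tst.loc[lookupValue, 'SomeCol'] (which returns a series).
5. Using the syntax from #4, you could reasonably do everything in one line: ID = tst.loc[tst['SomeCol'] == 'SomeValue', 'SomeCol'].tolist()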
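One caveat on the squeeze() suggestion in point 3: called without arguments it squeezes every length-1 axis, so a frame that happens to have a single row as well as a single column comes back as a scalar rather than a series. A small sketch of that edge case, using made-up columns colA and colB; passing axis=1 restricts the squeeze to the column axis.

```python
import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})

# Two matching rows: a 2x1 DataFrame squeezes to a Series.
two_rows = df.loc[df['colA'] == 1, ['colB']]
print(type(two_rows.squeeze()))

# One matching row: a 1x1 DataFrame squeezes all the way to a scalar.
one_row = df.loc[df['colA'] == 2, ['colB']]
print(type(one_row.squeeze()))

# Squeezing only the column axis yields a Series even for one row.
as_series = one_row.squeeze(axis=1)
print(as_series.tolist())
```

When the number of matching rows is not known in advance, squeeze(axis=1) or selecting the column by name directly is likely the more predictable route.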

Demo code:

    import pandas as pd

    df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})
    filter_value = 1

    print("df")
    print(df)
    print(type(df))

    rows_to_keep = df['colA'] == filter_value
    print("\ndf['colA'] == filter_value")
    print(rows_to_keep)
    print(type(rows_to_keep))

    result = df[rows_to_keep]['colB']
    print("\ndf[rows_to_keep]['colB']")
    print(result)
    print(type(result))

    result = df[rows_to_keep][['colB']]
    print("\ndf[rows_to_keep][['colB']]")
    print(result)
    print(type(result))

    result = df[rows_to_keep][['colB']].squeeze()
    print("\ndf[rows_to_keep][['colB']].squeeze()")
    print(result)
    print(type(result))

    result = df.loc[rows_to_keep, 'colB']
    print("\ndf.loc[rows_to_keep, 'colB']")
    print(result)
    print(type(result))

    result = df.loc[df['colA'] == filter_value, 'colB']
    print("\ndf.loc[df['colA'] == filter_value, 'colB']")
    print(result)
    print(type(result))

    ID = df.loc[rows_to_keep, 'colB'].tolist()
    print("\ndf.loc[rows_to_keep, 'colB'].tolist()")
    print(ID)
    print(type(ID))

    ID = df.loc[df['colA'] == filter_value, 'colB'].tolist()
    print("\ndf.loc[df['colA'] == filter_value, 'colB'].tolist()")
    print(ID)
    print(type(ID))

Result:

    df
       colA  colB
    0     1     4
    1     2     5
    2     1     6
    <class 'pandas.core.frame.DataFrame'>

    df['colA'] == filter_value
    0     True
    1    False
    2     True
    Name: colA, dtype: bool
    <class 'pandas.core.series.Series'>

    df[rows_to_keep]['colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df[rows_to_keep][['colB']]
       colB
    0     4
    2     6
    <class 'pandas.core.frame.DataFrame'>

    df[rows_to_keep][['colB']].squeeze()
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[rows_to_keep, 'colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[df['colA'] == filter_value, 'colB']
    0    4
    2    6
    Name: colB, dtype: int64
    <class 'pandas.core.series.Series'>

    df.loc[rows_to_keep, 'colB'].tolist()
    [4, 6]
    <class 'list'>

    df.loc[df['colA'] == filter_value, 'colB'].tolist()
    [4, 6]
    <class 'list'>
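To illustrate why the answer separates reading from writing when it comes to chained slicing, here is a rough sketch on the same made-up frame. It assumes nothing beyond pandas itself; the names after_chained and after_loc are only illustrative, and the warning pandas issues for the chained write is silenced purely to keep the output short.

```python
import warnings

import pandas as pd

df = pd.DataFrame({'colA': [1, 2, 1], 'colB': [4, 5, 6]})
rows_to_keep = df['colA'] == 1

# Chained write: df[rows_to_keep] is a temporary copy, so the
# assignment lands on that copy and is then thrown away.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df[rows_to_keep]['colB'] = 0
after_chained = df['colB'].tolist()
print(after_chained)

# Single .loc call: rows and column are selected in one step, so the
# write reaches the original frame.
df.loc[rows_to_keep, 'colB'] = 0
after_loc = df['colB'].tolist()
print(after_loc)
```

For reads, both forms return the same series, which is why the chained version works for the purpose of building a list.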

You can use pandas.Series.tolist.

For example:

    import pandas as pd
    df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

Run:

    >>> df['a'].tolist()

You will get:

    [1, 2, 3]

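The above solutions are fine if all the data is of the same dtype. NumPy arrays are homogeneous containers, and df.values returns a numpy array. So if the data contains both int and float columns, the whole array is converted to a single common dtype (float in this case) and the columns lose their original dtypes. Consider df:

       a  b
    0  1  4
    1  2  5
    2  3  6

    a    float64
    b      int64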
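The upcasting can be checked directly. A minimal sketch, assuming a frame built to match the dtypes listed above (float64 for a, int64 for b):

```python
import pandas as pd

# Column 'a' is float64 and column 'b' is int64, as in the frame above.
df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

print(df.dtypes)        # per-column dtypes are still distinct here
print(df.values.dtype)  # a single common dtype for the whole array

rows = df.values.tolist()
print(rows)             # the integers from 'b' come back as floats
```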

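So if you want to preserve the original dtypes, you can do something like

    row_list = df.to_csv(None, header=False, index=False).split('\n')

This returns each row as a string.

    ['1.0,4', '2.0,5', '3.0,6', '']

Then split each row to get a list of lists. Each element after splitting is a string, so it has to be converted to the required data type.

    def f(row_str):
        row_list = row_str.split(',')
        return [float(row_list[0]), int(row_list[1])]

    # list() is needed on Python 3, where map() returns a lazy iterator
    df_list_of_list = list(map(f, row_list[:-1]))

    [[1.0, 4], [2.0, 5], [3.0, 6]]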
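As an alternative to the CSV round trip, and not the method this answer proposes, DataFrame.itertuples() can build the same list of lists without parsing strings. This is a sketch under the assumption that the frame looks like the one above; the result name df_list_of_list is reused only for comparison.

```python
import pandas as pd

df = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4, 5, 6]})

# itertuples walks the columns in parallel, so each value keeps the
# type of its own column instead of being upcast to a shared dtype.
df_list_of_list = [list(row) for row in df.itertuples(index=False, name=None)]

print(df_list_of_list)
print([type(v).__name__ for v in df_list_of_list[0]])
```

This avoids hard-coding a converter per column, at the cost of iterating row by row in Python.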
