How to create a pandas DataFrame from a string

To test some functionality, I would like to create a DataFrame from a string. Suppose my test data looks like this:

    TESTDATA = """col1;col2;col3
    1;4.4;99
    2;4.5;200
    3;4.7;65
    4;3.2;140
    """

What is the simplest way to read that data into a pandas DataFrame?

A simple way is to wrap the string in a StringIO object and pass it to the pandas.read_csv function. For example:

    import sys

    # StringIO lives in a different module on Python 2 and Python 3.
    if sys.version_info[0] < 3:
        from StringIO import StringIO
    else:
        from io import StringIO

    import pandas as pd

    TESTDATA = StringIO("""col1;col2;col3
    1;4.4;99
    2;4.5;200
    3;4.7;65
    4;3.2;140
    """)

    df = pd.read_csv(TESTDATA, sep=";")
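If only Python 3 needs to be supported, the version check can be dropped. The following is a minimal sketch under that assumption; it reuses the sample data from the question and prints the shape and inferred column types so the result can be checked.

```python
import io

import pandas as pd

# Same sample data as in the question, one record per line.
TESTDATA = """col1;col2;col3
1;4.4;99
2;4.5;200
3;4.7;65
4;3.2;140
"""

# io.StringIO wraps the string in a file-like object that read_csv can consume.
df = pd.read_csv(io.StringIO(TESTDATA), sep=";")

print(df.shape)   # number of rows and columns
print(df.dtypes)  # column types inferred by read_csv
```

Because read_csv infers types from the text, col1 and col3 should come back as integers and col2 as floats, with no manual conversion.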

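Traditional CSV data can be hard to read when it is stored in a string variable. Consider pipe-separated data instead. Various IDEs and editors may have a plugin that formats pipe-separated text into a neat table.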

The following works for me. To use it, save it to a file named pandas_util.py and call read_pipe_separated_str(str_input). The function's docstring contains an example.

    import io
    import re

    import pandas as pd


    def _prepare_pipe_separated_str(str_input):
        substitutions = [
            ('^ *', ''),          # Remove leading spaces
            (' *$', ''),          # Remove trailing spaces
            (r' *\| *', '|'),     # Remove spaces between columns
        ]
        if all(line.lstrip().startswith('|') and line.rstrip().endswith('|')
               for line in str_input.strip().split('\n')):
            substitutions.extend([
                (r'^\|', ''),     # Remove redundant leading delimiter
                (r'\|$', ''),     # Remove redundant trailing delimiter
            ])
        for pattern, replacement in substitutions:
            str_input = re.sub(pattern, replacement, str_input, flags=re.MULTILINE)
        return str_input


    def read_pipe_separated_str(str_input):
        """Read a Pandas object from a pipe-separated table contained within a string.

        Example:
            | int_score | ext_score | automation_eligible |
            |           |           | True                |
            | 221.3     | 0         | False               |
            |           | 576       | True                |
            | 300       | 600       | True                |

        The leading and trailing pipes are optional, but if one is present,
        so must be the other.

        In PyCharm, the "Pipe Table Formatter" plugin has a "Format" feature
        that can be used to neatly format a table.
        """
        str_input = _prepare_pipe_separated_str(str_input)
        return pd.read_csv(io.StringIO(str_input), sep='|')
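To show what kind of result to expect, here is a condensed, self-contained sketch of the same idea applied to the table from the docstring. The helper name pipe_table_to_df is hypothetical and is not part of pandas_util.py; it works line by line instead of using multiline regular expressions, and it assumes no cell contains a literal pipe character.

```python
import io
import re

import pandas as pd

TABLE = """
    | int_score | ext_score | automation_eligible |
    |           |           | True                |
    | 221.3     | 0         | False               |
    |           | 576       | True                |
    | 300       | 600       | True                |
"""


def pipe_table_to_df(text):
    """Hypothetical helper: parse a pipe-formatted table into a DataFrame."""
    rows = []
    for line in text.strip().splitlines():
        line = line.strip()
        # Drop the outer pipes only when both are present on the line.
        if line.startswith("|") and line.endswith("|"):
            line = line[1:-1]
        # Collapse the padding around each column delimiter.
        rows.append(re.sub(r" *\| *", "|", line).strip())
    return pd.read_csv(io.StringIO("\n".join(rows)), sep="|")


df = pipe_table_to_df(TABLE)
print(df)
print(df.dtypes)
```

Empty cells are read as missing values (NaN), so a numeric column that contains blanks is loaded as floating point rather than integer.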
