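Specifying dtype float32 with pandas.read_csv on pandas 0.10.1

I am trying to read a simple space-separated file with the pandas read_csv method. However, pandas doesn't seem to be obeying my dtype argument. Maybe I'm specifying it incorrectly?

I've distilled my call to read_csv down to this simple test case. I am actually using the converters argument in my "real" scenario, but I removed it for simplicity.

Below is my ipython session: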

    >>> cat test.out
    a b
    0.76398 0.81394
    0.32136 0.91063
    >>> import pandas
    >>> import numpy
    >>> x = pandas.read_csv('test.out', dtype={'a': numpy.float32}, delim_whitespace=True)
    >>> x
             a        b
    0  0.76398  0.81394
    1  0.32136  0.91063
    >>> x.a.dtype
    dtype('float64')

I have also tried this with a dtype of numpy.int32 or numpy.int64. These choices result in an exception:

 AttributeError: 'NoneType' object has no attribute 'dtype'

I assume the AttributeError is raised because pandas will not automatically try to convert/truncate the float values into an integer?
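As a point of reference for the integer case, here is a minimal sketch of doing the truncation explicitly after reading, rather than through the dtype argument. It assumes a recent pandas release (where `sep=r"\s+"` is used instead of `delim_whitespace`) and inlines the sample data through `io.StringIO` instead of reading test.out:

```python
import io

import numpy as np
import pandas as pd

# Same two rows as test.out, inlined so the example is self-contained.
data = "a b\n0.76398 0.81394\n0.32136 0.91063\n"

# Let pandas infer float64 for both columns first.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# Explicit cast: astype on a float column drops the fractional part.
df["a"] = df["a"].astype(np.int32)
```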

I am running a 32-bit version of Python on a 32-bit machine.

    >>> !uname -a
    Linux ubuntu 3.0.0-13-generic #22-Ubuntu SMP Wed Nov 2 13:25:36 UTC 2011 i686 i686 i386 GNU/Linux
    >>> import platform
    >>> platform.architecture()
    ('32bit', 'ELF')
    >>> pandas.__version__
    '0.10.1'

0.10.1 doesn't really support float32 very well.

See http://pandas.pydata.org/pandas-docs/dev/whatsnew.html#dtype-specification

In 0.11 you can do it like this:

    # don't use dtype converters explicitly for the columns you care about
    # they will be converted to float64 if possible, or object if they cannot
    df = pd.read_csv('test.csv'.....)

    #### this is optional and related to the issue you posted ####
    # force anything that is not a numeric to nan
    # columns are the list of columns that you are interested in
    df[columns] = df[columns].convert_objects(convert_numeric=True)

    # astype
    df[columns] = df[columns].astype('float32')

See http://pandas.pydata.org/pandas-docs/dev/basics.html#object-conversion

It's not as efficient as doing it directly in read_csv (but that requires some low-level changes).

I have confirmed that with 0.11-dev this does work (on 32-bit and 64-bit, the results are the same):

    In [5]: x = pd.read_csv(StringIO.StringIO(data), dtype={'a': np.float32}, delim_whitespace=True)

    In [6]: x
    Out[6]:
             a        b
    0  0.76398  0.81394
    1  0.32136  0.91063

    In [7]: x.dtypes
    Out[7]:
    a    float32
    b    float64
    dtype: object

    In [8]: pd.__version__
    Out[8]: '0.11.0.dev-385ff82'

    In [9]: quit()
    vagrant@precise32:~/pandas$ uname -a
    Linux precise32 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686 i686 i386 GNU/Linux
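On current pandas releases, where `convert_objects` is no longer available, the same two-step idea (coerce to numeric, then cast) can be sketched with `pd.to_numeric` followed by `astype`. This is an illustration under assumptions, not the code from this answer: the inline data, the deliberately malformed value `bad`, and the use of `sep=r"\s+"` in place of `delim_whitespace` are all made up for the example.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical sample: the second row of column "a" is not numeric.
data = "a b\n0.76398 0.81394\nbad 0.91063\n"

# Read without a dtype; column "a" cannot be parsed as float, so it is not float64.
df = pd.read_csv(io.StringIO(data), sep=r"\s+")

# columns is the list of columns you are interested in.
columns = ["a"]
for col in columns:
    # Force anything that is not numeric to NaN, then downcast to float32.
    df[col] = pd.to_numeric(df[col], errors="coerce").astype(np.float32)
```

Column `b` is left untouched and stays float64. When the column is known to be clean, passing `dtype={'a': np.float32}` directly to read_csv is the shorter option.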

    In [22]: df.a.dtype = pd.np.float32

    In [23]: df.a.dtype
    Out[23]: dtype('float32')

The above works fine for me under pandas 0.10.1.
