How do I read a fixed-width-format text file in pandas?

Posted 2024-05-15 06:15:28


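I just got my hands on pandas and am trying to figure out how to read a file. The file comes from the WRDS database and is the SP500 constituent list going all the way back to the 1960s. I have checked the file, but no matter how I import it with read_csv, I still cannot get the data to display correctly.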

df = read_csv('sp500-sb.txt')

df

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1231 entries, 0 to 1230
Data columns: gvkeyx      from      thru     conm
                                        gvkey      co_conm
...(the column names)
dtypes: object(1)

What does the output block above mean? Any help would be appreciated.


3 Answers

Wes sent me an email. Cheers.

This is a fixed-width-format file (not delimited by commas or tabs as usual). I realize that pandas does not have a fixed-width reader like R does, though one can be fashioned very easily. I'll see what I can do. In the meantime if you can export the data in another format (like csv--truly comma separated) you'll be able to read it with read_csv. I suspect with some unix magic you can transform a FWF file into a CSV file.

I recommend following the issue on github as your e-mail is about to disappear from my inbox :)

https://github.com/pydata/pandas/issues/920

best, Wes
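The conversion the email alludes to can also be done in a few lines of Python instead of Unix tools. The following is only a minimal sketch, not Wes's method: the helper name `fwf_to_csv`, the sample rows, and the column offsets are all hypothetical, and the real offsets would have to be read off the actual WRDS file.

```python
import csv
import io


def fwf_to_csv(lines, colspecs):
    """Slice each fixed-width line at (start, end) offsets and return CSV text.

    Hypothetical helper for illustration; colspecs uses half-open intervals.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        # Cut out each field and drop the padding spaces around it.
        writer.writerow([line[start:end].strip() for start, end in colspecs])
    return out.getvalue()


# Invented sample rows; the real file's layout may differ.
sample_lines = [
    "000003  19640331  19941231  Example Corp A",
    "000010  19570301  19991231  Example Corp B",
]
# Assumed offsets for the columns gvkeyx, from, thru, conm.
colspecs = [(0, 6), (8, 16), (18, 26), (28, 48)]

csv_text = fwf_to_csv(sample_lines, colspecs)
print(csv_text)
```

The resulting text is truly comma separated, so it could then be handed to read_csv as the email suggests.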

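What do you mean by "display"? Doesn't df['gvkey'] give you the data in the gvkey column?

If what you want is to print the whole DataFrame to the console, take a look at df.to_string(), although the result will be hard to read if there are too many columns. By default, pandas does not print the entire contents when there are too many columns: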

import numpy
import pandas

# A narrow frame (3 columns) and a wide one (30 columns) filled with random numbers
df1 = pandas.DataFrame(numpy.random.randn(10, 3), columns=['col%d' % d for d in range(3)])
df2 = pandas.DataFrame(numpy.random.randn(10, 30), columns=['col%d' % d for d in range(30)])

print(df1)   # <--- substitute df2 to see the difference
print()
print(df1['col1'])   # a single column
print()
print(df1.to_string())   # the full frame rendered as text

pandas.read_fwf() was added in pandas 0.7.3 (April 2012) to handle fixed-width files.

  1. API reference

  2. An example from another question
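
As a minimal sketch of how pandas.read_fwf() might be applied to a file like the one in the question: the sample rows and column offsets below are invented for illustration (only the column names come from the question), so the real `colspecs` would have to match the actual layout of sp500-sb.txt.

```python
import io

import pandas as pd

# Invented fixed-width sample; the real WRDS file's layout may differ.
sample = (
    "000003  19640331  19941231  Example Corp A\n"
    "000010  19570301  19991231  Example Corp B\n"
)

# Assumed half-open (start, end) character offsets for each column.
colspecs = [(0, 6), (8, 16), (18, 26), (28, 48)]

df = pd.read_fwf(
    io.StringIO(sample),           # a file path such as 'sp500-sb.txt' also works here
    colspecs=colspecs,
    header=None,                   # the sample has no header row
    names=["gvkeyx", "from", "thru", "conm"],
    dtype={"gvkeyx": str},         # keep leading zeros in the key
)

print(df)
```

Passing `widths` (a list of field widths) instead of `colspecs` is an alternative when the fields are contiguous with no gaps between them.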
