ValueError: importing data in chunks with pandas.read_csv()

Posted 2024-03-29 14:09:19


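I have a large gzip file that I want to import into a pandas DataFrame. Unfortunately, the file has an uneven number of columns per row. The data looks roughly like this: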

.... Col_20: 25    Col_21: 23432    Col22: 639142
.... Col_20: 25    Col_22: 25134    Col23: 243344
.... Col_21: 75    Col_23: 79876    Col25: 634534    Col22: 5    Col24: 73453
.... Col_20: 25    Col_21: 32425    Col23: 989423
.... Col_20: 25    Col_21: 23424    Col22: 342421    Col23: 7    Col24: 13424    Col 25: 67
.... Col_20: 95    Col_21: 32121    Col25: 111231

As a test, I tried this:

[code snippet missing from the source page]

This is what I got back:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 795, in __next__
    return self.get_chunk()
  File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 836, in get_chunk
    return self.read(nrows=size)
  File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 815, in read
    ret = self._engine.read(nrows)
  File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 1761, in read
    alldata = self._rows_to_cols(content)
  File "/nfs/sw/python/python-3.5.1/lib/python3.5/site-packages/pandas/io/parsers.py", line 2166, in _rows_to_cols
    raise ValueError(msg)
ValueError: Expected 18 fields in line 28, saw 22

How can I allocate a fixed number of columns for pandas.read_csv()?


1 answer
User
#1 · Posted 2024-03-29 14:09:19

You could also try this:

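import pandas as pd
# error_bad_lines was deprecated in pandas 1.3 and removed in 2.0; newer versions use on_bad_lines='skip' instead.
for chunk in pd.read_csv(filename, sep='\t', chunksize=10**5, engine='python', error_bad_lines=False):
    print(chunk)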

error_bad_lines=False skips the malformed rows, that is, rows with more fields than expected. I'll see if I can find a better option.
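If dropping rows is not acceptable, one alternative is to declare the column count up front, which is what the question asks for. The following is a minimal sketch, assuming the widest row can be found with a first pass over the file. The sample data and the name `max_cols` are made up for illustration, and an in-memory buffer stands in for the real gzip file.

```python
import io
import pandas as pd

# Hypothetical sample: tab-separated rows with 3, 2 and 4 fields.
raw = "a\tb\tc\nd\te\nf\tg\th\ti\n"

# First pass: find the widest row by counting separators.
max_cols = max(line.count("\t") + 1 for line in io.StringIO(raw))

# Second pass: declare that many columns up front.
# Rows with fewer fields are padded with NaN instead of raising ValueError.
chunks = pd.read_csv(
    io.StringIO(raw),
    sep="\t",
    header=None,
    names=list(range(max_cols)),
    chunksize=2,
)
df = pd.concat(chunks, ignore_index=True)
print(df)
```

With a real file the first pass would read the gzip file line by line (for example via `gzip.open(filename, "rt")`), so it costs one extra scan but keeps every row.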

Edit: to keep the rows that error_bad_lines skips, we can iterate over the errors and add those rows back to the DataFrame:

[code snippet missing from the source page]
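A minimal sketch of this idea, not the answerer's original code: assuming pandas 1.4 or later, `on_bad_lines` accepts a callable when `engine="python"`, so the over-long rows can be collected instead of discarded. The sample data, the `collect` helper, and the `extra_N` column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample: the third line has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

bad_rows = []

def collect(bad_line):
    """Remember the offending row (a list of strings) and skip it for now."""
    bad_rows.append(bad_line)
    return None  # returning None tells pandas to drop the line

# Callable on_bad_lines requires the python engine (pandas >= 1.4).
df = pd.read_csv(io.StringIO(raw), engine="python", on_bad_lines=collect)

# Add the collected rows back, giving the surplus fields their own columns.
extra = pd.DataFrame(bad_rows)
n_surplus = extra.shape[1] - len(df.columns)
extra.columns = list(df.columns) + [f"extra_{i}" for i in range(n_surplus)]

# Note: the re-added values are still strings, so columns become mixed-type.
combined = pd.concat([df, extra], ignore_index=True)
print(combined)
```

The re-added rows land at the end of the frame and are not type-converted, so a follow-up `pd.to_numeric` or `astype` step may be needed depending on the data.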
