Pandas can't read a CSV file exported from Numbers?

Posted 2024-06-07 09:20:59


I have a comma-separated CSV file exported from Numbers on a Mac. I am trying to read it into a DataFrame, but I get an error:

df = pd.read_csv('game.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')

The error message is:

Traceback (most recent call last):
  File "/Users/congminmin/nlp/data_collection/crawler/data/game/test.py", line 5, in <module>
    df = pd.read_csv('game_app_apple.missing.url.csv', dtype={"rating": str}, error_bad_lines='ignore', encoding='utf8', sep=',')
  File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 676, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 448, in _read
    parser = TextFileReader(fp_or_buf, **kwds)
  File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 880, in __init__
    self._make_engine(self.engine)
  File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1114, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/Users/congminmin/.venv/data_collection/lib/python3.7/site-packages/pandas/io/parsers.py", line 1891, in __init__
    self._reader = parsers.TextReader(src, **kwds)
  File "pandas/_libs/parsers.pyx", line 426, in pandas._libs.parsers.TextReader.__cinit__
ValueError: invalid literal for int() with base 10: 'ignore'
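A note on the traceback above: the last line suggests the parser tried to convert the string 'ignore' to an integer, which fits error_bad_lines being a boolean flag (True or False) rather than a string option. Here is a minimal sketch of skipping malformed rows, assuming the goal was to drop them. The sample data is made up for illustration, and the version check is there because, as far as I know, error_bad_lines was deprecated in pandas 1.3 in favour of on_bad_lines and removed in 2.0.

```python
import io

import pandas as pd

# Hypothetical sample: the third line has one field too many.
sample = "name,rating\nA,4.5\nB,3.0,extra\nC,5.0\n"

# Pick the keyword that matches the installed pandas version.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
if (major, minor) >= (1, 3):
    skip_kwargs = {"on_bad_lines": "skip"}
else:
    skip_kwargs = {"error_bad_lines": False}  # a boolean, not a string

df = pd.read_csv(io.StringIO(sample), dtype={"rating": str}, **skip_kwargs)
print(df)
```

Skipping rows only hides the symptom. If most rows are reported as bad, the file layout itself probably needs a look first.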

Is my CSV invalid? But it was produced by Numbers. Removing the dtype argument gives the same error. If I remove error_bad_lines='ignore', I get the following error instead:

File "pandas/_libs/parsers.pyx", line 860, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 875, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 929, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 916, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2071, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 1 fields in line 3, saw 2
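The ParserError above says the parser settled on one column from the first lines and then met two fields on line 3. One way to see where the field count changes is to count the fields per line with the standard csv module. This is only a diagnostic sketch: field_counts is a helper name made up here, and the sample text is hypothetical, not the real file.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return (line number, number of fields) for every record in the text."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    counts = []
    for row in reader:
        # reader.line_num is the number of physical lines read so far.
        counts.append((reader.line_num, len(row)))
    return counts


# Hypothetical sample: two single-field lines before the real header.
sample = "Table 1\nnote\nname,rating\nA,4.5\n"

for line_no, n_fields in field_counts(sample):
    print(line_no, n_fields)
```

For a real file, the text could be read with open(path, encoding='utf8', newline='').read(). If the extra lines all sit at the top, the skiprows argument of pd.read_csv may be enough to get past them.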

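The CSV exported from Numbers is comma-separated. I want to read it into a DataFrame and write it back out tab-separated, but I run into the problems above.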
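For reference, the intended round trip could look roughly like the sketch below once the file parses cleanly. The sample data and column names are placeholders rather than the real file, and an in-memory buffer stands in for the file path.

```python
import io

import pandas as pd

# Placeholder data standing in for the exported CSV.
sample_csv = "name,rating\nGame A,4.5\nGame B,3.0\n"

# Read comma-separated input, keeping the rating column as text.
df = pd.read_csv(io.StringIO(sample_csv), sep=",", dtype={"rating": str})

# With no path given, to_csv returns the output as a string.
tsv_text = df.to_csv(sep="\t", index=False)
print(tsv_text)
```

With a real file, pd.read_csv('game.csv', ...) and df.to_csv('game.tsv', sep='\t', index=False) would take the place of the in-memory buffers. The output name game.tsv is only an example.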

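Additional information: the original data is in Chinese, and the "rating" column in the code above is actually named 评分. A translation of the actual data follows: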

I have to post a screenshot because Stack Overflow flags the text as spam:

[Image: screenshot of the data]


Tags: csv, in, py, self, pandas, read, data, line
