Reading the MSR Paraphrase Corpus into Pandas

Posted 2024-04-19 19:54:31


I downloaded the MSR Paraphrase Corpus from MSR and tried to load it into a DataFrame, but I get the error shown after this code:

import pandas as pd
df = pd.read_csv(r'C:\MSRParaphraseCorpus\msr_paraphrase_test.txt', sep = '\t' )

Error:

^{pr2}$

I looked at line 34, and it looks fine.

fname = r'C:\MSRParaphraseCorpus\msr_paraphrase_test.txt'
with open(fname, encoding="utf8") as f:
    content = f.readlines()

content[34]

Output:

'0\t1268500\t1268733\tAgainst the Japanese currency, the euro was at 135.92/6.04 yen against the late New York level of 136.03/14.\tThe dollar was at 117.85 yen against the Japanese currency, up 0.1 percent.\n'

1 answer

Answer 1 · posted 2024-04-19 19:54:31

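The problem is an unclosed quotation mark on line 34 (as I mentioned in the comments). Disable the CSV reader's quote handling by passing quoting=csv.QUOTE_NONE. Try: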

import csv
import pandas as pd

df = pd.read_csv(r'C:\MSRParaphraseCorpus\msr_paraphrase_test.txt', sep = '\t', quoting=csv.QUOTE_NONE)
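To see why this helps, here is a minimal sketch on an in-memory sample rather than the real file. The rows in `sample` and the helper `lines_with_unbalanced_quotes` are made up for illustration and are not part of the corpus or of pandas. With default quoting, a lone double quote opens a quoted field that swallows the tabs and newlines after it, so the parser either fails or merges lines; with `csv.QUOTE_NONE` the quote is treated as an ordinary character. Note also that `content[34]` is zero-based, so it shows the 35th physical line of the file, which may be why the line you inspected looked fine.

```python
import csv
import io

import pandas as pd

# Hypothetical sample in the corpus's tab-separated layout (not real corpus rows).
# The second data row opens a double quote that is never closed.
sample = (
    "Quality\t#1 ID\t#2 ID\t#1 String\t#2 String\n"
    "1\t101\t102\tHe said hello.\tHe greeted them.\n"
    "0\t103\t104\t\"I disagree, he said.\tHe did not agree.\n"
    "1\t105\t106\tPrices rose.\tPrices went up.\n"
)

# Default quoting: the lone quote starts a quoted field that never ends,
# so the parser may raise or merge the following lines into one field.
try:
    default_rows = len(pd.read_csv(io.StringIO(sample), sep="\t"))
except pd.errors.ParserError:
    default_rows = None  # parsing failed outright

# QUOTE_NONE: double quotes are ordinary characters, one row per line.
df = pd.read_csv(io.StringIO(sample), sep="\t", quoting=csv.QUOTE_NONE)


def lines_with_unbalanced_quotes(text):
    """Return 1-based numbers of lines with an odd count of double quotes."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if line.count('"') % 2 == 1
    ]


print("rows with default quoting:", default_rows)
print("rows with QUOTE_NONE:", len(df))
print("suspect lines:", lines_with_unbalanced_quotes(sample))
```

Keep in mind that with `csv.QUOTE_NONE` the quote characters stay in the strings as literal text, so strip them afterwards if you do not want them. Running the same odd-quote scan over the text of the real file should point at the lines that confuse the default parser.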
