Pandas read_csv: reading quoted fields that contain commas

Published 2024-04-23 23:05:21


I have read this, this and this, but I still can't see why quotechar doesn't work in pd.read_csv() (Python 3, pandas 0.18.0 and 0.18.1). How can I read a data frame like this:

"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"

I would like to get the following result:

Out[116]: 
  column1  column2 column3    column4       column5        column6
0      AM        7       1         SD            SD             CR
1      AM        8  1,2 ,3  PR, SD,SD  PR ; , SD,SD  PR , ,, SD,SD
2      AM        1       2         SD            SD             SD

Thanks!


2 answers

From the pandas documentation for the sep parameter of read_csv():

Separators longer than 1 character and different from '\s+' will be interpreted as regular expressions, will force use of the python parsing engine and will ignore quotes in the data.
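To see what "ignore quotes in the data" means in practice, here is a small sketch on a made-up two-column input (not the question's data). With the two-character separator ', ' and engine='python' (set explicitly to avoid the fallback warning), the double quotes are expected to stay inside the column names and values instead of being removed.

```python
import io

import pandas as pd

# Hypothetical two-column input with a space after each comma.
data = '"a", "b"\n"x", "y"\n'

# ', ' is longer than one character, so pandas treats it as a regex and uses
# the python engine, which splits each line without processing quotes.
df = pd.read_csv(io.StringIO(data), sep=", ", engine="python")

print(df.columns.tolist())  # column names still carry their double quotes
print(df.iloc[0].tolist())  # and so do the values
```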

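Try this instead (sep already defaults to a comma). skipinitialspace=True skips the space after each delimiter, so every field starts with the quote character and quotechar takes effect again: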

import pandas as pd
df = pd.read_csv(file, skipinitialspace=True, quotechar='"')
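A self-contained sketch of this answer, with the question's sample input inlined through io.StringIO (the DATA name is only for illustration). With the default settings, the space after each comma means a field such as ' "1,2 ,3"' does not start with the quote character, so the C parser treats the quote as a literal and splits on the inner commas, which should raise a ParserError. With skipinitialspace=True the quoted fields come through intact.

```python
import io

import pandas as pd

# The question's sample input, pasted verbatim.
DATA = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"
'''

# Default settings: the leading space hides the opening quote, so the commas
# inside the quoted fields produce too many fields on the third line.
try:
    pd.read_csv(io.StringIO(DATA))
    default_failed = False
except pd.errors.ParserError as exc:
    default_failed = True
    print("default read_csv failed:", exc)

# Skipping the space after each comma lets quotechar be recognized again.
df = pd.read_csv(io.StringIO(DATA), skipinitialspace=True, quotechar='"')
print(df)
```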

Another solution is to use a proper regular expression rather than a simple \s+. We need to match the commas (,) that are not inside quotes:

import pandas as pd
df = pd.read_csv(file,
                 sep=', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))',
                 engine='python')

The expression is taken from here.
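A hedged alternative sketch of the regex approach. The pattern above only splits on ', ', so it misses the bare comma between "column1" and "column2" in the header, and its second branch can also match inside a quoted field such as "PR, SD,SD". A commonly used regex for "comma outside quotes" instead requires an even number of double quotes between the comma and the end of the line. Because a regex separator makes pandas ignore quotes, the quote characters stay in the header and values and are stripped afterwards. The SEP pattern and the strip step are assumptions of this sketch, not part of the answer above.

```python
import io

import pandas as pd

# The question's sample input, pasted verbatim.
DATA = '''"column1","column2", "column3", "column4", "column5", "column6"
"AM", 7, "1", "SD", "SD", "CR"
"AM", 8, "1,2 ,3", "PR, SD,SD", "PR ; , SD,SD", "PR , ,, SD ,SD"
"AM", 1, "2", "SD", "SD", "SD"
'''

# Split on a comma (with optional surrounding spaces) only when an even number
# of double quotes follows it up to the end of the line, i.e. outside quotes.
SEP = r'\s*,\s*(?=(?:[^"]*"[^"]*")*[^"]*$)'

# A regex separator requires the python engine, which does not process quotes.
df = pd.read_csv(io.StringIO(DATA), sep=SEP, engine="python")

# The double quotes are still part of the header and the string values.
df.columns = df.columns.str.strip('"')
for name in df.columns:
    if pd.api.types.is_string_dtype(df[name]):
        df[name] = df[name].str.strip('"')

print(df)
```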
