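Python Pandas: all text columns are read as NaN
I have a series of tab-delimited files that I want to read in a Python script. For some reason, when I import a file, every text column comes back as NaN (missing).
Here is a sample input file: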
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Period: Oct 28 2013 - Apr 27 2014
Note:
Brand Variant Industry Major Category Market Media Type Parent Company Product Category Report Period (multiple) PCC Sub Group Subsidiary Units $$$ (000)
3 LADIES HAND-DIPPED CANDIES CANDY CONFECT., SNACKS & SOFT DRINKS CONFECTIONERY & SNACKS Columbus Combo Local Newspaper COTTAGE FOOD PRODUCTION OPERATION CANDY 11/18/13 - 11/24/13 F211 CANDY & GUM COTTAGE FOOD PRODUCTION OPERATION 1 0.286
3 MUSKETEERS CANDY BAR CONFECT., SNACKS & SOFT DRINKS CONFECTIONERY & SNACKS Atlanta Combo Spot Radio MARS INC CANDY BAR 11/04/13 - 11/10/13 F211 CANDY & GUM MARS SNACKFOOD US LLC 22 1.403
Here is part of my Python (3.3) code:
df = read_csv(csvFile, delimiter='\t', header=[9])
print(df)
The output looks like this:
Brand Variant \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Industry \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Major Category \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Market \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Media Type \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Parent Company \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Product Category \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Report Period (multiple) \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
PCC Sub Group \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Subsidiary \
3 LADIES HAND-DIPPED CANDIES CANDY NaN
3 MUSKETEERS CANDY BAR NaN
Units $$$ (000)
3 LADIES HAND-DIPPED CANDIES CANDY NaN NaN
3 MUSKETEERS CANDY BAR NaN NaN
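I noticed that my first column seems to be used as the DataFrame's index, but if I set index_col = False, I get a ValueError because it expects a column number. I also tried setting the dtype to string, with no luck. Finally, with a different, comma-delimited file, I was able to read the rows containing text data without any problem. Now I don't know what to try next...
I have also noticed that the fields look like they are separated by a mix of tabs and spaces.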
1 Answer
If you want to skip the "Blah Blah" lines at the top of the file, use the skiprows= parameter rather than header=. Try this:
df = pd.read_csv(csvFile, sep='\t', skiprows=9, index_col=False)
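Since I can't run this against your file, here is a small self-contained sketch of the same call on made-up data. The three preamble lines, the column names, and the skiprows=3 value are stand-ins I invented for the example; in your file the count would be however many lines come before the header row.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real file: 3 preamble lines,
# then a tab-separated header row and two data rows.
raw = (
    "Blah Blah\n"
    "Period: Oct 28 2013 - Apr 27 2014\n"
    "Note:\n"
    "Brand\tMarket\tUnits\n"
    "3 MUSKETEERS\tAtlanta Combo\t22\n"
    "3 LADIES\tColumbus Combo\t1\n"
)

# skiprows=3 discards the preamble; the first remaining line is used as the header.
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=3, index_col=False)
print(df)
```

With header= left at its default, the first line that survives skiprows is treated as the column names, so the two usually do not need to be combined.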
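My guess is that "my first column seems to be used as the DataFrame's index" happens because your file may contain extra delimiters, for example a trailing tab at the end of each data row. If that is the case, index_col=False may help. For more information, see: Dealing with extra delimiters in read_csv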
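To illustrate what I mean by extra delimiters, here is a minimal sketch on invented data where every data row ends with one trailing tab, so it has one more field than the header. Whether your file actually looks like this is an assumption on my part.

```python
import io
import pandas as pd

# Hypothetical data: each data row has a trailing tab,
# i.e. one more field than the header row.
raw = (
    "Brand\tMarket\tUnits\n"
    "3 MUSKETEERS\tAtlanta Combo\t22\t\n"
    "3 LADIES\tColumbus Combo\t1\t\n"
)

# Default behaviour: the extra field makes pandas treat the first column
# as the index, so every value shifts one column to the left.
shifted = pd.read_csv(io.StringIO(raw), sep="\t")

# index_col=False tells pandas not to use the first column as the index.
fixed = pd.read_csv(io.StringIO(raw), sep="\t", index_col=False)

print(shifted)
print(fixed)
```

In the first frame the brand names end up in the index and the last column is all NaN, which resembles part of what you describe; in the second the values line up with their headers.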
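I can't test this, because I don't have your input file and the text you pasted seems to have lost its tabs (all the whitespace came through as plain spaces). Please let us know how it goes.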
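On the note in the question that the fields look like they are separated by a mix of tabs and spaces: if each separator is a tab with stray spaces around it, one thing that might be worth trying is a regular-expression separator. This is only a sketch on invented data, and the pattern is my assumption about what the file contains; regex separators require engine="python".

```python
import io
import pandas as pd

# Hypothetical data: fields separated by a tab with optional spaces around it.
raw = (
    "Brand \t Market\tUnits\n"
    "3 MUSKETEERS\t  Atlanta Combo \t22\n"
    "3 LADIES \tColumbus Combo\t 1\n"
)

# r"\s*\t\s*" matches one tab plus any surrounding whitespace, so spaces
# inside a value such as "Atlanta Combo" are left alone.
df_mixed = pd.read_csv(io.StringIO(raw), sep=r"\s*\t\s*", engine="python")
print(df_mixed)
```

If some fields turn out to be separated by spaces only, with no tab at all, this pattern would not split them and a different one would be needed.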