Python Pandas shows all text columns as NaN

Posted 2024-05-15 05:39:18


I have a series of tab-delimited files that I want to read in a Python script. For some reason, when I import a file, all of my text columns come back as NaN.

Sample input file:

Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Blah Blah
Period: Oct 28 2013 - Apr 27 2014
Note:
Brand Variant                               Industry                                    Major Category                              Market                                      Media Type                                  Parent Company                              Product Category                            Report Period (multiple)                    PCC Sub Group                               Subsidiary                                  Units   $$$ (000)
3 LADIES HAND-DIPPED CANDIES CANDY  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Columbus Combo  Local Newspaper     COTTAGE FOOD PRODUCTION OPERATION   CANDY   11/18/13 - 11/24/13     F211 CANDY & GUM    COTTAGE FOOD PRODUCTION OPERATION   1   0.286   
3 MUSKETEERS CANDY BAR  CONFECT., SNACKS & SOFT DRINKS  CONFECTIONERY & SNACKS  Atlanta Combo   Spot Radio  MARS INC    CANDY BAR   11/04/13 - 11/10/13     F211 CANDY & GUM    MARS SNACKFOOD US LLC   22  1.403   

Here is a snippet of my Python (3.3) code:

^{pr2}$

The output is as follows:

Brand Variant                             \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Industry                                  \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Major Category                            \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Market                                    \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Media Type                                \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Parent Company                            \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Product Category                          \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Report Period (multiple)                  \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    PCC Sub Group                             \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Subsidiary                                \
3 LADIES HAND-DIPPED CANDIES CANDY                                       NaN   
3 MUSKETEERS CANDY BAR                                                   NaN   

                                    Units $$$ (000)  
3 LADIES HAND-DIPPED CANDIES CANDY    NaN       NaN  
3 MUSKETEERS CANDY BAR                NaN       NaN  

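I noticed that my first column seems to be set as the index of the dataframe, but index_col=False only raises a ValueError because it expects a column number. I also tried setting dtype to str, with no luck. Finally, with another file that is comma-delimited, I do get the rows back with their text data. I don't know what to do...

One thing I noticed is that the fields seem to be separated by a mix of tabs and spaces.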


1 answer
User
#1 · Posted 2024-05-15 05:39:18

If you want to skip the first few "Blah Blah" lines, use skiprows= instead of {}. Try this:

df = pd.read_csv(csvFile, sep='\t', skiprows=9, index_col=False)
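To see what skiprows=9 does with a file shaped like the sample, here is a minimal sketch. The in-memory text `raw` and the reduced three-column table are assumptions made for illustration; they stand in for the real file, which has nine preamble lines (seven "Blah Blah" lines, the "Period:" line and the "Note:" line) before the header.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file: 9 preamble lines, then a
# tab-separated header and two data rows (no trailing tabs in this sketch).
preamble = ["Blah Blah"] * 7 + ["Period: Oct 28 2013 - Apr 27 2014", "Note:"]
table = [
    "Brand Variant\tMarket\tUnits",
    "3 LADIES HAND-DIPPED CANDIES CANDY\tColumbus Combo\t1",
    "3 MUSKETEERS CANDY BAR\tAtlanta Combo\t22",
]
raw = "\n".join(preamble + table) + "\n"

# skiprows=9 drops the first nine lines, so the tenth line becomes the header.
df = pd.read_csv(io.StringIO(raw), sep="\t", skiprows=9)

print(df.columns.tolist())
print(df)
```

With the preamble skipped, the header row is the first line the parser sees, so the column names come from the file itself.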

Why

"first column seems to be set as the index for the dataframe"

My guess is that your file has trailing delimiters. {cd3> if that helps. See Handling of trailing delimiters in read_csv
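The trailing-delimiter guess can be checked on a small example. The sketch below assumes, hypothetically, that every data row ends with one extra tab while the header row does not; the data in `raw` is made up to resemble the sample and is not the asker's file. When the data rows have one more field than the header, read_csv treats the first field as the index, and index_col=False tells it not to.

```python
import io

import pandas as pd

# Hypothetical data: each data row ends with an extra tab, the header does not.
raw = (
    "Brand Variant\tMarket\tUnits\n"
    "3 LADIES HAND-DIPPED CANDIES CANDY\tColumbus Combo\t1\t\n"
    "3 MUSKETEERS CANDY BAR\tAtlanta Combo\t22\t\n"
)

# Default behaviour: one more data field than header fields, so the first
# field becomes the index and every column shifts left by one position.
shifted = pd.read_csv(io.StringIO(raw), sep="\t")

# index_col=False keeps the first field as a regular column and discards
# the empty field created by the trailing tab.
fixed = pd.read_csv(io.StringIO(raw), sep="\t", index_col=False)

print(shifted)
print(fixed)
```

In `shifted` the brand names end up in the index and the last column is empty, which matches the "first column seems to be set as the index" symptom; whether it also explains every NaN in the original output depends on the code that was actually run.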

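Since I don't have your input file, and your copy-pasted text evidently lost the tabs (everything in the text is spaces), I can't test it. But let us know how it goes.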
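Because pasted text tends to lose tabs, one way to check the real delimiters is to look at the raw lines with repr(). This is only a sketch: the `lines` list is hypothetical sample data and `describe` is a helper name invented here; in practice the lines would be read from the actual file.

```python
# Hypothetical sample lines; with a real file, read the header line and a
# data line with open(path, newline="") instead.
lines = [
    "Brand Variant\tMarket\tUnits\n",
    "3 MUSKETEERS CANDY BAR\tAtlanta Combo\t22\t\n",
]


def describe(line):
    """Return (number of tabs, whether the line ends with a tab before the newline)."""
    body = line.rstrip("\r\n")
    return body.count("\t"), body.endswith("\t")


for line in lines:
    # repr() shows tabs as \t, so they can be told apart from runs of spaces.
    print(repr(line), describe(line))
```

If the data lines report one more tab than the header line and end with a tab, the file has trailing delimiters.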
