Pandas file loading error in Python (e.g. a Pandas parsing error in Python)

2 Answers

User

Answer 1 · edited 2024-06-11 04:20:17

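As the other answer already hints, your CSV is malformed: each data row ends with a trailing comma. Every data row therefore has one more field than the header, which leads pandas to treat the first column as the index column.

To fix this, pass the index_col=False argument to the read_csv function. Example: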

In [24]: s = io.StringIO("""cmte_id,cand_id,cand_nm,contbr_nm,contbr_city,contbr_st,contbr_zip,contbr_employer,contbr_occupation,contb_receipt_amt,contb_receipt_dt,receipt_desc,memo_cd,memo_text,form_tp,file_num,tran_id,election_tp
   ....: C00579458,"P60008059","Bush, Jeb","EASTON, AMY KELLY MRS.","KEY BISCAYNE","FL","331491716","HOMEMAKER","HOMEMAKER",2700,26-JUN-15,"","","","SA17A","1024106","SA17.114991","P2016",""")

In [25]: df = pd.read_csv(s)  #Issue

In [26]: df
Out[26]:
             cmte_id    cand_id                 cand_nm     contbr_nm  \
C00579458  P60008059  Bush, Jeb  EASTON, AMY KELLY MRS.  KEY BISCAYNE

          contbr_city  contbr_st contbr_zip contbr_employer  \
C00579458          FL  331491716  HOMEMAKER       HOMEMAKER

           contbr_occupation contb_receipt_amt  contb_receipt_dt  \
C00579458               2700         26-JUN-15               NaN

           receipt_desc  memo_cd memo_text  form_tp     file_num tran_id  \
C00579458           NaN      NaN     SA17A  1024106  SA17.114991   P2016

           election_tp
C00579458          NaN

In [29]: df = pd.read_csv(s,index_col=False)  #No issue

In [30]: df
Out[30]:
     cmte_id    cand_id    cand_nm               contbr_nm   contbr_city  \
0  C00579458  P60008059  Bush, Jeb  EASTON, AMY KELLY MRS.  KEY BISCAYNE

  contbr_st  contbr_zip contbr_employer contbr_occupation  contb_receipt_amt  \
0        FL   331491716       HOMEMAKER         HOMEMAKER               2700

  contb_receipt_dt  receipt_desc  memo_cd  memo_text form_tp  file_num  \
0        26-JUN-15           NaN      NaN        NaN   SA17A   1024106

       tran_id election_tp
0  SA17.114991       P2016

This is described in the documentation:

index_col : int or sequence or False, default None
Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to not use the first column as the index (row names)

(Emphasis mine)
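The same behaviour can be reproduced outside an interactive session with a much smaller file. This is a minimal sketch, not the asker's data: the three-column sample and the names RAW, shifted and fixed are made up for illustration. A fresh StringIO is built for each call because a buffer that has already been read is exhausted.

```python
import io

import pandas as pd

# Hypothetical three-column sample; every data row ends with a stray comma.
RAW = "a,b,c\n1,2,3,\n4,5,6,\n"

# Default behaviour: each row has one more field than the header,
# so pandas takes the first field as the row label and the rest shift left.
shifted = pd.read_csv(io.StringIO(RAW))

# index_col=False keeps the first field as a regular column and
# discards the empty trailing field.
fixed = pd.read_csv(io.StringIO(RAW), index_col=False)

print(shifted)
print(fixed)
```

In shifted the values 1 and 4 end up as row labels and column c is empty; in fixed they are back under column a.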

User

Answer 2 · edited 2024-06-11 04:20:17

Every line in the original data has an extra comma at the end.

C00458844,"P60006723","Rubio, Marco","HEFFERNAN, MICHAEL","APO","AE","090960009","INFORMATION REQUESTED PER BEST EFFORTS","INFORMATION REQUESTED PER BEST EFFORTS",210,27-JUN-15,"","","","SA17A","1015697","SA17.796904","P2016",

If there were two trailing commas, every row would be shifted by two columns.
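The two-comma case can be sketched as follows, along with one possible way to clean such a file before parsing. The sample data and the helper strip_trailing_commas are hypothetical and not part of the original answers. Note the assumption in the helper: it removes every trailing comma, so it is only safe when the last real field is never empty, as in the data shown above.

```python
import io

import pandas as pd

# Hypothetical sample where each data row ends with two stray commas.
RAW = "a,b,c\n1,2,3,,\n4,5,6,,\n"

# Each row has two more fields than the header, so the first two
# fields become row labels and the remaining values shift by two.
shifted = pd.read_csv(io.StringIO(RAW))


def strip_trailing_commas(text):
    """Drop trailing commas from every line (assumes the last real field is never empty)."""
    return "\n".join(line.rstrip(",") for line in text.splitlines())


# After cleaning, the rows line up with the header again.
cleaned = pd.read_csv(io.StringIO(strip_trailing_commas(RAW)))

print(shifted)
print(cleaned)
```

For a file on disk, the same cleaning could be applied to the text read from the file before it is handed to read_csv.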
