如何使用pandas/python处理excel文件头

2024-05-15 22:15:01 发布

您现在位置:Python中文网/ 问答频道 /正文

我正试着用熊猫阅读https://www.whatdotheyknow.com/request/193811/response/480664/attach/3/GCSE%20IGCSE%20results%20v3.xlsx

我的剧本是

import sys
import pandas as pd
inputfile = sys.argv[1]
xl = pd.ExcelFile(inputfile)
#    print xl.sheet_names
df = xl.parse(xl.sheet_names[0])
print df.head()

然而,这似乎并没有像它给出的那样正确地处理头文件

  GCSE and IGCSE1 results2,3 in selected subjects4 of pupils at the end of key stage 4 Unnamed: 1 Unnamed: 2 Unnamed: 3 Unnamed: 4 Unnamed: 5 Unnamed: 6 Unnamed: 7 Unnamed: 8 Unnamed: 9 Unnamed: 10
0                              Year: 2010/11 (Final)                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
1                                  Coverage: England                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
2                                                NaN                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
3  1. Includes International GCSE, Cambridge Inte...                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN
4  2. Includes attempts and achievements by these...                                          NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN        NaN         NaN

所有这些都应视为评论。

例如,如果将电子表格加载到libreoffice中,则可以看到列标题已正确解析,并显示在第15行中,并带有下拉菜单,以便选择所需的项。

如何让pandas像libreoffice一样自动检测列标题的位置?


Tags: andofimportpandasdfnamessysnan
1条回答
网友
1楼 · 发布于 2024-05-15 22:15:01

pandas是(是?)正确地处理文件,并且完全按照您的要求(他们?)到。您没有指定header值,这意味着它默认从第0行提取列名。前几行单元格在某些基本方面不是注释,它们只是不是您感兴趣的单元格。

只要告诉parse您想跳过一些行:

>>> xl = pd.ExcelFile("GCSE IGCSE results v3.xlsx")
>>> df = xl.parse(xl.sheet_names[0], skiprows=14)
>>> df.columns
Index([u'Local Authority Number', u'Local Authority Name', u'Local Authority Establishment Number', u'Unique Reference Number', u'School Name', u'Town', u'Number of pupils at the end of key stage 4', u'Number of pupils attempting a GCSE or an IGCSE', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-G', u'Number of students achieving 8 or more GCSE or IGCSE passes at A*-A', u'Number of students achieving 5 A*-A grades or more at GCSE or IGCSE'], dtype='object')
>>> df.head()
   Local Authority Number Local Authority Name  \
0                     201       City of london   
1                     201       City of london   
2                     202               Camden   
3                     202               Camden   
4                     202               Camden   

   Local Authority Establishment Number  Unique Reference Number  \
0                               2016005                   100001   
1                               2016007                   100003   
2                               2024104                   100049   
3                               2024166                   100050   
4                               2024196                   100051   

                       School Name    Town  \
0  City of London School for Girls  London   
1            City of London School  London   
2                Haverstock School  London   
3           Parliament Hill School  London   
4               Regent High School  London   

  Number of pupils at the end of key stage 4  \
0                                        105   
1                                        140   
2                                        200   
3                                        172   
4                                        174   

  Number of pupils attempting a GCSE or an IGCSE  \
0                                            104   
1                                            140   
2                                            194   
3                                            169   
4                                            171   

  Number of students achieving 8 or more GCSE or IGCSE passes at A*-G  \
0                                                100                    
1                                                108                    
2                                               SUPP                    
3                                                 22                    
4                                                  0                    

  Number of students achieving 8 or more GCSE or IGCSE passes at A*-A  \
0                                                 87                    
1                                                 75                    
2                                                  0                    
3                                                  7                    
4                                                  0                    

  Number of students achieving 5 A*-A grades or more at GCSE or IGCSE  
0                                                100                   
1                                                123                   
2                                                  0                   
3                                                 34                   
4                                               SUPP                    

[5 rows x 11 columns]

相关问题 更多 >