How can I scrape this table from HackerRank, filter it by country of origin and score, and then export it to a CSV file?

Posted 2024-04-23 07:58:24


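I'm learning web scraping in Python, and I decided to test my skills on the HackerRank Leaderboard page, so I wrote the code below. I hoped it would run without errors and export my CSV file successfully, so that I could then add a country restriction to the tester function.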

But the Python console responded with:

AttributeError: 'NoneType' object has no attribute 'find_all'

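The error above points to line 29 of my code (for i in table.find_all({'class':'ellipsis'}):), so I decided to come here for help. I'm worried there may be more syntax or logic errors as well, so feedback from experts would help clear up my doubts.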
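To see where this error comes from, here is a minimal sketch that reproduces it offline. It assumes the raw HTML returned by requests.get() contains only a page shell without the leaderboard markup (an assumption, not something verified against HackerRank). BeautifulSoup's find() returns None when nothing matches, and calling find_all() on None raises exactly this AttributeError. The helper find_or_fail is hypothetical; it just turns the failure into a clearer message.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a client-side-rendered page: the shell is there,
# but the table markup is not part of the raw HTML response.
html = "<html><body><div id='root'></div></body></html>"
soup = BeautifulSoup(html, "html.parser")

# find() returns None when no tag matches, so table.find_all(...) would raise
# AttributeError: 'NoneType' object has no attribute 'find_all'.
table = soup.find("header", {"class": "table-header flex"})
print(table)  # None


def find_or_fail(soup, name, css_class):
    """Return the first <name> tag carrying css_class, or raise a descriptive
    error instead of letting a later .find_all() call fail on None."""
    tag = soup.find(name, class_=css_class)
    if tag is None:
        raise LookupError(
            f"No <{name} class='{css_class}'> found; the content may be "
            "rendered by JavaScript and absent from the raw response."
        )
    return tag
```

Checking for None right after each find() call makes the real cause (missing markup) visible at the point where it happens.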

from bs4 import BeautifulSoup
import requests
import pandas as pd
import numpy as np
from time import sleep
from random import randint

pd.set_option('display.max_columns', None) 

#Declaring a variable for looping over all the pages
pages = np.arange(1, 93, 1)

a = pd.DataFrame()
#loop cycle

for url in pages:      

    #get html for each new page
    url ='https://www.hackerrank.com/leaderboard?page='+str(url)
    page = requests.get(url)
    sleep(randint(3,10))
    soup = BeautifulSoup(page.text, 'lxml')
    
    #get the table
    table = soup.find('header', {'class':'table-header flex'})
    headers = []
    
    #get the headers of the table and delete the "white space"
    for i in table.find_all({'class':'ellipsis'}):
        title = i.text.strip()
        headers.append(title)
    
    #set the headers to columns in a new dataframe 
    df = pd.DataFrame(columns=headers)
    
    rows = soup.find('div', {'class':'table-body'})
    #get the rows of the table but omit the first row (which are headers)
    for row in rows.find_all('table-row-wrapper')[1:]:
        data = row.find_all('table-row-column ellipsis')
        row_data = [td.text.strip() for td in data]  
        length = len(df)
        df.loc[length] = row_data 
    
    #set the data of the Txn Count column to float
    Txn = df['SCORE'].values
    
    
    #combine all the data rows in one single dataframe
    a = a.append(pd.DataFrame(df))  
    
    def tester(mejora):
        mejora = mejora[(mejora['SCORE']>2250.0)] 
        return mejora.to_csv('new_test_Score_Count.csv') 
    
    tester(a)

Do you have any ideas or suggestions for fixing this?


1 answer
User
Reply #1 · Posted 2024-04-23 07:58:24

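The error means that your table variable is None: soup.find() returns None when no element matches, and None has no find_all method. I'm guessing here, but you can't get the table from the page with bs4 because the table is loaded by JavaScript after the initial HTML, so it isn't in the response that requests receives. I suggest using Selenium instead.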
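A minimal sketch of that approach, under stated assumptions, follows. Selenium 4 drives a local Chrome so the JavaScript can build the table, and BeautifulSoup then parses the rendered HTML. The class names table-body, table-row-wrapper and table-row-column come from the question and are unverified guesses about HackerRank's current markup. fetch_rendered_html and parse_rows are hypothetical helper names. Selenium is imported inside the fetch function so that the parsing helper can be tried on saved HTML without a browser.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=15):
    """Load url in headless Chrome and return the HTML after the table appears.
    Requires Selenium 4 and a local Chrome installation."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Explicit wait for the (assumed) table body to be present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, "div.table-body"))
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_rows(html):
    """Return each row's cell texts, using the class names from the question."""
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", class_="table-body")
    if body is None:
        return []
    rows = []
    # class_= searches by CSS class; a bare string such as 'table-row-wrapper'
    # would instead be treated as a tag name and match nothing here.
    for row in body.find_all(class_="table-row-wrapper"):
        cells = row.find_all(class_="table-row-column")
        rows.append([cell.get_text(strip=True) for cell in cells])
    return rows
```

For one page, this would be called as parse_rows(fetch_rendered_html("https://www.hackerrank.com/leaderboard?page=1")). Keeping fetching and parsing separate means the parser can be tested against saved HTML. It also means that only the fetch step needs to change if the site's loading behavior differs from what is assumed here.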
