Parsing web-scraped data into Excel

2 votes
2 answers
35 views
Asked 2025-04-14 17:24

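I'm new to this, so apologies if any of this sounds silly. :D I've done some research and can now pull data from the page I want to scrape. What I can't do is get the data into the shape I want.

First, the URL (it differs for each event, but this is a sample event): https://results.advancedeventsystems.com/event/PTAwMDAwMjkwMjQ90/divisions/131313/standings

So far, my code extracts the table that holds my data (without the headers, though I'm not too worried about the headers right now):

I'm hoping you can give me some advice.

Phoenix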

import chromedriver_autoinstaller
from selenium import webdriver
from bs4 import BeautifulSoup

chromedriver_autoinstaller.install()

driver = webdriver.Chrome()

driver.get('https://results.advancedeventsystems.com/event/PTAwMDAwMjkwMjQ90/divisions/131313/standings')

html = driver.page_source

soup = BeautifulSoup(html, 'html.parser')

teams = soup.find_all('tbody', 'k-table-tbody')

print(teams)

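This code gets me the content of the whole page. What I want now is for the data to be laid out the way the rendered HTML shows it (for example, as shown here), and I haven't managed to do that.

Here is an example of the output I want:

[Image: example of the desired output]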

2 Answers

0

The data is loaded via JavaScript from a different URL. Here is an example of how to load it into a pandas DataFrame:

import pandas as pd
import requests

api_url = "https://results.advancedeventsystems.com/odata/PTAwMDAwMjkwMjQ90/standings(dId=131313,cId=null,tIds=[])"

params = {"$orderby": "OverallRank,FinishRank,TeamName,TeamCode"}

data = requests.get(api_url, params=params).json()
# print(data)

df = pd.DataFrame(data["value"])

df = pd.concat([df, df.pop("Club").apply(pd.Series).add_prefix("Club_")], axis=1)
df = pd.concat(
    [df, df.pop("Division").apply(pd.Series).add_prefix("Division_")], axis=1
)
df = pd.concat(
    [df, df.pop("BidIdentification").apply(pd.Series).add_prefix("BidIdentification_")],
    axis=1,
)

print(df)

Output:

    TeamId                   TeamName      TeamCode                        TeamText  MatchesWon  MatchesLost  MatchPercent  SetsWon  SetsLost  SetPercent  PointRatio  FinishRank  OverallRank FinishRankText         SearchableTeamName  Club_ClubId                        Club_Name  Division_DivisionId Division_Name  Division_TeamCount Division_CodeAlias Division_ColorHex BidIdentification_BidStatus BidIdentification_DivisionAlias  BidIdentification_DivisionId
0   171661     SA LADY GRIZZLIES 12-1   g12salgr1ls     SA LADY GRIZZLIES 12-1 (LS)           6            0      1.000000       12         0    1.000000    2.097902           1            1            1st     sa lady grizzlies 12-1        27673       SAN ANTONIO LADY GRIZZLIES               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
1   165364      CTX Juniors 12 Mizuno   g12ctxjr1ls      CTX Juniors 12 Mizuno (LS)           5            1      0.833333       10         5    0.666667    1.183521           2            2            2nd      ctx juniors 12 mizuno        28511                      CTX Juniors               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
2      425              AJV 12 adidas   g12ajvba1ls              AJV 12 adidas (LS)           4            1      0.800000        9         2    0.818182    1.690789           3            3            3rd              ajv 12 adidas          207         Austin Junior Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
3    17524               IMPACT - 121   g12impac1ls               IMPACT - 121 (LS)           4            1      0.800000        8         3    0.727273    1.191489           3            3            3rd               impact - 121          344           Impact Volleyball Club               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
4    28820               AP 11 adidas   g11aperf1ls               AP 11 adidas (LS)           3            2      0.600000        7         4    0.636364    1.295337           5            5            5th               ap 11 adidas          469    Austin Performance Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
5    26234         WACO VBC 12 UA Red   g12wacov2ls         WACO VBC 12 UA Red (LS)           3            2      0.600000        6         6    0.500000    0.913934           5            5            5th         waco vbc 12 ua red          248             Waco Volleyball Club               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
6   167561         Premier 12 Crimson   g12premr2nt         Premier 12 Crimson (NT)           2            3      0.400000        6         6    0.500000    1.110092           7            7            7th         premier 12 crimson           96                   Dallas Premier               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
7      806            Roots 121 Green   g12roots1ls            Roots 121 Green (LS)           2            3      0.400000        4         6    0.400000    0.871287           7            7            7th            roots 121 green           99                 Roots Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
8   152120  AJV 12FutureWilcoAlliance  g12ajvba10ls  AJV 12FutureWilcoAlliance (LS)           3            2      0.600000        6         4    0.600000    0.952153           9            9            9th  ajv 12futurewilcoalliance          207         Austin Junior Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
9   167217           Angelo United 12   g12angun1ls           Angelo United 12 (LS)           2            3      0.400000        4         6    0.400000    0.963134          10           10           10th           angelo united 12        28568                    Angelo United               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
10   26005             Roots 12 Maple   g12roots3ls             Roots 12 Maple (LS)           2            3      0.400000        4         7    0.363636    0.776860          11           11           11th             roots 12 maple           99                 Roots Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
11     426                AJV 12 Navy   g12ajvba5ls                AJV 12 Navy (LS)           1            4      0.200000        4         9    0.307692    0.771739          12           12           12th                ajv 12 navy          207         Austin Junior Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
12  126932  Austin Velocity 12s Green   g12avvbc4ls  Austin Velocity 12s Green (LS)           2            3      0.400000        5         6    0.454545    0.848101          13           13           13th  austin velocity 12s green         6974  Austin Velocity Volleyball Club               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
13     428                 AJV 12 Red   g12ajvba7ls                 AJV 12 Red (LS)           1            4      0.200000        2         8    0.200000    0.752252          14           14           14th                 ajv 12 red          207         Austin Junior Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
14   17215          AJV 12 Cedar Park   g12ajvba4ls          AJV 12 Cedar Park (LS)           1            4      0.200000        3         8    0.272727    0.872340          15           15           15th          ajv 12 cedar park          207         Austin Junior Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
15  124578                AJV 12 Toro   g12ajvba6ls                AJV 12 Toro (LS)           0            5      0.000000        0        10    0.000000    0.500000          15           15           15th                ajv 12 toro          207         Austin Junior Volleyball               131313      12 Girls                  16           12 Girls           #5FBFFF                        None                            None                             0
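The three pd.concat calls above flatten nested objects (Club, Division, BidIdentification) into prefixed columns. As a rough sketch of the same idea using only the standard library, the snippet below flattens nested dicts and produces CSV text that Excel can open. The helper names flatten_record and rows_to_csv are hypothetical, and the sample records are a trimmed guess at the shape of data["value"], inferred from the column names in the output above rather than taken from the actual API response.

```python
import csv
import io


def flatten_record(record, sep="_"):
    """Flatten nested dicts: {"Club": {"Name": "X"}} -> {"Club_Name": "X"}."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Recurse so deeper nesting is prefixed as well.
            for sub_key, sub_value in flatten_record(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat


def rows_to_csv(rows):
    """Return CSV text; the header is the union of all keys in first-seen order."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Assumed sample, shaped like two trimmed entries of data["value"].
sample = [
    {"TeamName": "SA LADY GRIZZLIES 12-1", "MatchesWon": 6,
     "Club": {"ClubId": 27673, "Name": "SAN ANTONIO LADY GRIZZLIES"}},
    {"TeamName": "CTX Juniors 12 Mizuno", "MatchesWon": 5,
     "Club": {"ClubId": 28511, "Name": "CTX Juniors"}},
]

flat_rows = [flatten_record(record) for record in sample]
csv_text = rows_to_csv(flat_rows)
print(csv_text)
```

If a real workbook is needed rather than CSV, and pandas plus openpyxl are installed, calling df.to_excel("standings.xlsx", index=False) on the DataFrame built above should write an .xlsx file directly; the file name here is only an example.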
0

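First, soup.find_all('tbody', 'k-table-tbody') only finds the body of the table. You can right-click the page and choose Inspect to look at the page source. From a quick look, <div role="grid" class="k-grid-aria-root" id="k-8e3ece95-0943-4c84-bba8-4e6a808da4bf" aria-label="Data table" aria-rowcount="18" aria-colcount="10"> is the top-level element of this table.

Second, try print(soup.prettify()) to make the output tidier and easier to read. Note that find_all returns a list of elements, so to pretty-print just the table body you would call .prettify() on an individual element such as teams[0].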

If you want to extract the data into some kind of data structure, you will need to loop over these elements.
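As a minimal sketch of that loop, the snippet below walks every row and cell of a table and collects the cell text into a list of lists. It deliberately uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. The class name TableRowParser and the sample HTML are made up for illustration; the real page's markup will have more columns and attributes.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample that only mimics the structure of the standings table.
sample_html = """
<table>
  <tbody class="k-table-tbody">
    <tr><td>1st</td><td>SA LADY GRIZZLIES 12-1</td><td>6</td><td>0</td></tr>
    <tr><td>2nd</td><td>CTX Juniors 12 Mizuno</td><td>5</td><td>1</td></tr>
  </tbody>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
parser.close()
for row in parser.rows:
    print(row)
```

With BeautifulSoup, the equivalent loop would iterate over tbody.find_all('tr') and, for each row, build [td.get_text(strip=True) for td in tr.find_all('td')]. Either way, this only works if the HTML you parse already contains the rendered rows, which with Selenium means reading page_source after the table has finished loading.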

Here is a good introductory tutorial: https://realpython.com/beautiful-soup-web-scraper-python/
