I am using Beautiful Soup to parse specific content from a web page. How can I do this? Code:
import re
import pytz
import requests
import datetime
from flask import url_for
from bs4 import BeautifulSoup
from urllib.parse import urljoin
link = "http://www.espncricinfo.com/series/_/id/8038/season/2018/icc-world-cup-qualifiers/"
r = requests.get(link)
bigbash_article_html = r.text
soup = BeautifulSoup(bigbash_article_html, "html.parser")
details = soup.find("div",{"class":"module-list performers"})
bigbash_article_dict = {}
for div in details:
    image_div = div.find("div", {"class": "img-container player"})
I don't know how to proceed from here. I expect output like the following.
Expected output:
Top run scorers:
[{'playerimage':'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true','playername':'TP Ura','player-details':'PNG, Right-hand bat','runs':'188','innings':'2','Average':'94.00'},..............................................................................................}]
The same for the other column. Top wicket takers:
[{'playerimage':'http://a.espncdn.com/combiner/i?img=/i/headshots/cricket/players/default-player-logo-500.png&h=55&w=40&scale=crop&transparent=true','playername':'Ehsan Khan','player-details':'HKG, Right-arm offbreak','wickets':'9','innings':'3','Average':'12.55'},..............................................................................................}]
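First, you are looking for the wrong tag. The content you need is inside <ul class="module-list performers">, not inside a div tag with the same class name. The Top Run Scorers table is inside the <div id="r-0"> tag. Each player is in an li tag, and you can get all of a player's details from that li tag. I will show you how to get the image, name, and player details for the top run scorers.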
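The steps above can be sketched roughly as follows. This is a minimal sketch, not the original answer's code: only the id r-0 and the class names "module-list performers" and "img-container player" come from this thread. The sample markup, the "name" and "details" class names, and the function name top_run_scorers are hypothetical, so adjust the lookups to match the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: only the id "r-0" and the class names
# "module-list performers" and "img-container player" come from the thread.
# The <a class="name"> and <span class="details"> elements are assumptions.
SAMPLE_HTML = """
<div id="r-0">
  <ul class="module-list performers">
    <li>
      <div class="img-container player"><img src="http://example.com/ura.png"></div>
      <a class="name">TP Ura</a>
      <span class="details">PNG, Right-hand bat</span>
    </li>
  </ul>
</div>
"""


def top_run_scorers(html):
    """Return one dict per <li> found inside <div id="r-0">."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", {"id": "r-0"})
    if table is None:
        return []  # the container is missing, so there is nothing to parse

    players = []
    for li in table.find_all("li"):
        image_div = li.find("div", {"class": "img-container player"})
        img = image_div.find("img") if image_div else None
        name = li.find("a", {"class": "name"})          # assumed class name
        details = li.find("span", {"class": "details"})  # assumed class name
        players.append({
            "playerimage": img.get("src") if img else None,
            "playername": name.get_text(strip=True) if name else None,
            "player-details": details.get_text(strip=True) if details else None,
        })
    return players


if __name__ == "__main__":
    print(top_run_scorers(SAMPLE_HTML))
```

To run this against the live page, pass r.text from the requests call in the question instead of SAMPLE_HTML. Each lookup falls back to None when an element is missing, so one incomplete list item does not stop the loop.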
Select all list items inside the elements with the class names sub-module and performers, then parse the player details from each list item.
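One way to read this suggestion is as a CSS selector query. The sketch below assumes that the performers list sits inside a container with the class sub-module. That nesting, the sample markup, and the function name performer_items are hypothetical, not confirmed by the page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class names "sub-module" and "performers" come from
# the thread; how they nest and what each <li> contains are assumptions.
SAMPLE_HTML = """
<div class="sub-module">
  <ul class="module-list performers">
    <li><a class="name">TP Ura</a><span class="details">PNG, Right-hand bat</span></li>
    <li><a class="name">Ehsan Khan</a><span class="details">HKG, Right-arm offbreak</span></li>
  </ul>
</div>
"""


def performer_items(html):
    """Return the text fragments of every <li> under .sub-module .performers."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    # Descendant selector: any <li> inside an element with class "performers"
    # that is itself inside an element with class "sub-module".
    for li in soup.select(".sub-module .performers li"):
        # stripped_strings yields each non-empty text node, whitespace trimmed
        rows.append(list(li.stripped_strings))
    return rows


if __name__ == "__main__":
    for row in performer_items(SAMPLE_HTML):
        print(row)
```

If the two classes turn out to be on the same element rather than nested, the selector would be ".sub-module.performers li" instead.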