Python href scraping

Published 2024-05-16 10:40:48


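I am trying to loop over the a href elements and get their URLs. I have managed to pull the href out of each link, but I need the full URL. This is my current code: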

 import requests
 from bs4 import BeautifulSoup

 webpage_response = requests.get('http://www.harness.org.au/racing/results/?activeTab=tab')
 webpage_response.content
 webpage_response = requests.get

 soup = BeautifulSoup(webpage, "html.parser")

 #only finding one track
 #soup.table to find all links for days racing
 harness_table = soup.table
 #scraps a href that is an incomplete URL that im trying to get to
 for link in soup.select(".meetingText > a"):
     link.insert(0, "http://www.harness.org.au")

     webpage = requests.get(link)
     new_soup = BeautifulSoup(webpage.content, "html.parser")

     #work through table to get links to tracks
     print(new_soup)

2 answers

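You can store the site's base URL in a variable. Once you have read the href from a link, concatenate the two to build the URL of the next page. Note that the href is read with link["href"]; calling link.insert() only adds a child node to the tag and does not change the URL.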

import requests
from bs4 import BeautifulSoup

base_url = "http://www.harness.org.au"

webpage_response = requests.get('http://www.harness.org.au/racing/results/?activeTab=tab')

soup = BeautifulSoup(webpage_response.content, "html.parser")

# only finding one track
# soup.table to find all links for days racing
harness_table = soup.table
# each href is a relative URL, so prepend base_url to get the full URL
for link in soup.select(".meetingText > a"):
    webpage = requests.get(base_url + link["href"])

    new_soup = BeautifulSoup(webpage.content, "html.parser")

    # work through table to get links to tracks
    print(new_soup)
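
Plain concatenation works as long as every href starts with "/". As a more defensive variant, here is a minimal sketch using urljoin from the standard library, which also copes with hrefs that are already absolute. The helper name to_absolute and the sample hrefs are made up for illustration; I am assuming the page serves relative links of this shape.

```python
from urllib.parse import urljoin

# URL of the page the links were scraped from
page_url = "http://www.harness.org.au/racing/results/?activeTab=tab"


def to_absolute(page_url, href):
    """Resolve href against the page it was found on (hypothetical helper)."""
    return urljoin(page_url, href)


# Sample hrefs, assumed for illustration: one root-relative, one already absolute
sample_hrefs = [
    "/racing/fields/race-fields/?mc=BA040320",
    "http://www.harness.org.au/racing/fields/race-fields/?mc=BH040320",
]

absolute_urls = [to_absolute(page_url, href) for href in sample_hrefs]
for url in absolute_urls:
    print(url)
```

In the loop above, that would mean calling requests.get(urljoin(base_url, link["href"])) instead of requests.get(base_url + link["href"]).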

Try this solution. You might like this library:

from simplified_scrapy import SimplifiedDoc,req
url = 'http://www.harness.org.au/racing/results/?activeTab=tab'

html = req.get(url)
doc = SimplifiedDoc(html)
links = [doc.absoluteUrl(url,ele.a['href']) for ele in doc.selects('td.meetingText')]
print(links)

Result:

['http://www.harness.org.au/racing/fields/race-fields/?mc=BA040320', 'http://www.harness.org.au/racing/fields/race-fields/?mc=BH040320', 'http://www.harness.org.au/racing/fields/race-fields/?mc=RE040320']
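
If installing another library is not an option, roughly the same list can be built with the standard library alone. This is only a sketch: the MeetingLinkParser class and the SAMPLE_HTML snippet are my own stand-ins (I am assuming the links sit inside td cells with class meetingText, as the selectors above suggest), and it does not handle nested tables.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MeetingLinkParser(HTMLParser):
    """Collect absolute URLs of <a> tags found inside <td class="meetingText">."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.in_meeting_cell = False
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td":
            # Track whether the current cell carries the meetingText class
            classes = (attrs.get("class") or "").split()
            self.in_meeting_cell = "meetingText" in classes
        elif tag == "a" and self.in_meeting_cell and attrs.get("href"):
            # Resolve the (possibly relative) href against the page URL
            self.links.append(urljoin(self.page_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_meeting_cell = False


# Made-up markup standing in for the real page
SAMPLE_HTML = """
<table>
  <tr><td class="meetingText"><a href="/racing/fields/race-fields/?mc=BA040320">Track A</a></td></tr>
  <tr><td class="other"><a href="/ignored/">Ignored</a></td></tr>
  <tr><td class="meetingText"><a href="/racing/fields/race-fields/?mc=BH040320">Track B</a></td></tr>
</table>
"""

parser = MeetingLinkParser("http://www.harness.org.au/racing/results/?activeTab=tab")
parser.feed(SAMPLE_HTML)
print(parser.links)
```

For the real page, feed it requests.get(url).text instead of SAMPLE_HTML.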
