Python href scraping

import requests
from bs4 import BeautifulSoup

webpage_response = requests.get('http://www.harness.org.au/racing/results/?activeTab=tab')
webpage_response.content
webpage_response = requests.get
soup = BeautifulSoup(webpage, "html.parser")

# only finding one track
# soup.table to find all links for days racing
harness_table = soup.table

# scraps a href that is an incomplete URL that im trying to get to
for link in soup.select(".meetingText > a"):
    link.insert(0, "http://www.harness.org.au")
    webpage = requests.get(link)
    new_soup = BeautifulSoup(webpage.content, "html.parser")
    # work through table to get links to tracks
    print(new_soup)

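2 answers

User

Answer #1 · edited 2024-05-16 10:40:48

You can store the site's base URL in a variable. Then, once you have the href from each link, concatenate the two strings to build the next URL to request.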

import requests
from bs4 import BeautifulSoup

base_url = "http://www.harness.org.au"

webpage_response = requests.get('http://www.harness.org.au/racing/results/?activeTab=tab')

soup = BeautifulSoup(webpage_response.content, "html.parser")

# only finding one track
# soup.table to find all links for days racing
harness_table = soup.table
# each href scraped here is a relative URL, so prepend base_url before requesting it
for link in soup.select(".meetingText > a"):
    webpage = requests.get(base_url + link["href"])

    new_soup = BeautifulSoup(webpage.content, "html.parser")

    # work through table to get links to tracks
    print(new_soup)
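
Plain concatenation works here because each href begins with "/". As a hedged alternative, the standard library's urljoin resolves an href against the URL of the page it came from, which also covers hrefs that are relative to the current directory or already absolute. The sketch below is only an illustration: the helper name to_absolute and the second sample href are made up for the example and are not taken from the site.

```python
from urllib.parse import urljoin

# URL of the results page from the question.
PAGE_URL = "http://www.harness.org.au/racing/results/?activeTab=tab"


def to_absolute(page_url, hrefs):
    """Resolve each href against the URL of the page it was found on."""
    return [urljoin(page_url, href) for href in hrefs]


# Sample hrefs for illustration only (the second one is hypothetical).
sample_hrefs = [
    "/racing/fields/race-fields/?mc=BA040320",  # root-relative
    "race-fields/?mc=BH040320",                 # relative to the page's directory
    "http://www.harness.org.au/racing/fields/race-fields/?mc=RE040320",  # already absolute
]

absolute = to_absolute(PAGE_URL, sample_hrefs)
for url in absolute:
    print(url)
```

Inside the loop above, the same idea would read requests.get(urljoin(webpage_response.url, link["href"])), assuming the response was not redirected to a different site.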

User

Answer #2 · edited 2024-05-16 10:40:48

Try this solution. You may like this library.

from simplified_scrapy import SimplifiedDoc,req
url = 'http://www.harness.org.au/racing/results/?activeTab=tab'

html = req.get(url)
doc = SimplifiedDoc(html)
links = [doc.absoluteUrl(url,ele.a['href']) for ele in doc.selects('td.meetingText')]
print(links)

Result:

['http://www.harness.org.au/racing/fields/race-fields/?mc=BA040320', 'http://www.harness.org.au/racing/fields/race-fields/?mc=BH040320', 'http://www.harness.org.au/racing/fields/race-fields/?mc=RE040320']
