Python: crawling a multi-level website whose links have no [href]

import urllib.request
from bs4 import BeautifulSoup

resp = urllib.request.urlopen("https://www.uniplaces.com/accommodation/lisbon")
soup = BeautifulSoup(resp, "html.parser", from_encoding=resp.info().get_param('charset'))
for link in soup.find_all('a', href=True):
    print(link['href'])

1 answer

Anonymous user

Answer #1 · posted 2024-05-29 03:07:50

If you look at the Network tab of your browser's developer tools, you will see that the page fetches its listings from an API, for example: https://www.uniplaces.com/api/search/offers?city=PT-lisbon&limit=24&locale=en_GB&ne=38.79507211908374%2C-9.046124472314432&page=1&sw=38.68769060641113%2C-9.327992453271463
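To make the captured URL less opaque, here is a small sketch that rebuilds the same query with the standard library's `urllib.parse.urlencode`. The helper name `build_search_url` is hypothetical, and treating `ne`/`sw` as (latitude, longitude) pairs is an assumption read off the values. It also shows that `%2C` in the captured URL is simply an encoded comma.

```python
from urllib.parse import urlencode

# Endpoint copied from the captured request.
API_BASE = "https://www.uniplaces.com/api/search/offers"


def build_search_url(city, ne, sw, page=1, limit=24, locale="en_GB"):
    """Hypothetical helper: rebuild the search URL seen in the Network tab.

    ne and sw are assumed to be (latitude, longitude) pairs; urlencode
    percent-encodes the comma between them as %2C.
    """
    params = {
        "city": city,
        "limit": limit,
        "locale": locale,
        "ne": f"{ne[0]},{ne[1]}",
        "page": page,
        "sw": f"{sw[0]},{sw[1]}",
    }
    return f"{API_BASE}?{urlencode(params)}"


url = build_search_url(
    "PT-lisbon",
    ne=(38.79507211908374, -9.046124472314432),
    sw=(38.68769060641113, -9.327992453271463),
)
print(url)
```

Changing `page` or the two corners in this helper is an easy way to experiment with other result pages or map areas.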

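The call specifies the city (PT-lisbon) and the north-east (ne) and south-west (sw) corners of the map area. From its JSON response you can get the id of each offer and append it to the current URL to reach that offer's page, and from there you can also get all the details shown on the site (price, description, etc.).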
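To illustrate what the ne/sw pair describes, here is a minimal sketch assuming each value is a "latitude,longitude" corner of a rectangular map area. The numbers fit Lisbon (roughly 38.7° N, 9.1° W), so this reading seems plausible, but it is not confirmed by any API documentation. `BoundingBox` is a hypothetical helper, not part of the Uniplaces API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical model of the map area sent as ne/sw (assumed lat, lng order)."""
    ne_lat: float  # north-east corner latitude
    ne_lng: float  # north-east corner longitude
    sw_lat: float  # south-west corner latitude
    sw_lng: float  # south-west corner longitude

    def contains(self, lat, lng):
        # Inside if the point lies between the two corners on both axes.
        # Does not handle boxes that cross the 180th meridian.
        return self.sw_lat <= lat <= self.ne_lat and self.sw_lng <= lng <= self.ne_lng


lisbon = BoundingBox(38.79507211908374, -9.046124472314432,
                     38.68769060641113, -9.327992453271463)
center = ((lisbon.ne_lat + lisbon.sw_lat) / 2, (lisbon.ne_lng + lisbon.sw_lng) / 2)
print(lisbon.contains(*center))     # True
print(lisbon.contains(40.0, -9.1))  # False: too far north
```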

For example:

import requests

# ne/sw are the corners of the map area as "lat,lng".
# Pass a plain comma: requests URL-encodes it to %2C itself
# (passing '%2C' here would be double-encoded to '%252C').
resp = requests.get(
    url='https://www.uniplaces.com/api/search/offers',
    params={
        "city": 'PT-lisbon',
        "limit": '24',
        "locale": 'en_GB',
        "ne": '38.79507211908374,-9.046124472314432',
        "page": '1',
        "sw": '38.68769060641113,-9.327992453271463',
    },
    timeout=10,
)
resp.raise_for_status()  # stop on HTTP errors instead of parsing an error page
body = resp.json()

base_url = 'https://www.uniplaces.com/accommodation/lisbon'

data = [
    (
        t['id'],                    # offer id
        f"{base_url}/{t['id']}",    # the offer's page (works even if id is a number)
        t['attributes']['accommodation_offer']['title'],
        t['attributes']['accommodation_offer']['price']['amount'],
        t['attributes']['accommodation_offer']['available_from'],
    )
    for t in body['data']
]

print(data)
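The request above fetches only page 1 (24 offers). Below is a sketch of walking the remaining pages. It assumes the API returns up to `limit` items per page in `data` and a shorter or empty list after the last page; that stop condition is a guess, not documented behavior. `iter_offers` and the fake fetcher are hypothetical. With the real API, `fetch_page` could wrap the `requests.get` call above with `"page": str(page)` and return `resp.json()`.

```python
from typing import Callable, Iterator


def iter_offers(fetch_page: Callable[[int], dict],
                limit: int = 24,
                max_pages: int = 50) -> Iterator[dict]:
    """Yield offers page by page (hypothetical helper).

    fetch_page(page) must return the decoded JSON body for that page.
    Stops at the first page with fewer than `limit` items (assumed to be
    the last one), or after max_pages as a safety net.
    """
    for page in range(1, max_pages + 1):
        items = fetch_page(page).get("data", [])
        yield from items
        if len(items) < limit:
            break


# Offline demo: a fake fetcher stands in for the real API call.
fake_pages = {1: {"data": [{"id": "a"}, {"id": "b"}]},
              2: {"data": [{"id": "c"}]}}
offers = list(iter_offers(lambda p: fake_pages.get(p, {"data": []}), limit=2))
print([o["id"] for o in offers])  # ['a', 'b', 'c']
```

The `max_pages` cap keeps the loop finite even if the API ignores the `page` parameter and keeps returning full pages.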
