How to get all results from an HTTP request in Python

Posted 2024-05-16 02:12:51


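I am trying to get all results from https://www.ncl.com/. I found that the request has to be a GET sent to this link: https://www.ncl.com/search_vacations. So far I get the first 12 results and have no trouble parsing them. The problem is that I cannot find a way to "change" the results page. I get 12 out of 499, and I need all of them. I tried https://www.ncl.com/search_vacations?current_page=1, incrementing the number each time, but I get the same (first) results every time. I also tried adding a JSON body to the request, json = {"current_page": '1'}, but that did not work either. This is my code so far: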

import math
import requests

session = requests.session()
proxies = {'https': 'https://97.77.104.22:3128'}
headers = {
    "authority": "www.ncl.com",
    "method": "GET",
    "path": "/search_vacations",
    "scheme": "https",
    "accept": "application/json, text/plain, */*",
    "connection": "keep-alive",
    "referer": "https://www.ncl.com",
    "cookie": "AkaUTrackingID=5D33489F106C004C18DFF0A6C79B44FD; AkaSTrackingID=F942E1903C8B5868628CF829225B6C0F; UrCapture=1d20f804-718a-e8ee-b1d8-d4f01150843f; BIGipServerpreprod2_www2.ncl.com_http=61515968.20480.0000; _gat_tealium_0=1; BIGipServerpreprod2_www.ncl.com_r4=1957341376.10275.0000; MP_COUNTRY=us; MP_LANG=en; mp__utma=35125182.281213660.1481488771.1481488771.1481488771.1; mp__utmc=35125182; mp__utmz=35125182.1481488771.1.1.utmccn=(direct)|utmcsr=(direct)|utmcmd=(none); utag_main=_st:1481490575797$ses_id:1481489633989%3Bexp-session; s_pers=%20s_fid%3D37513E254394AD66-1292924EC7FC34CB%7C1544560775848%3B%20s_nr%3D1481488775855-New%7C1484080775855%3B; s_sess=%20s_cc%3Dtrue%3B%20c%3DundefinedDirect%2520LoadDirect%2520Load%3B%20s_sq%3D%3B; _ga=GA1.2.969979116.1481488770; mp__utmb=35125182; NCL_LOCALE=en-US; SESS93afff5e686ba2a15ce72484c3a65b42=5ecffd6d110c231744267ee50e4eeb79; ak_location=US,NY,NEWYORK,501; Ncl_region=NY; optimizelyEndUserId=oeu1481488768465r0.23231006365903206",
    "Proxy-Authorization": "Basic QFRLLTVmZjIwN2YzLTlmOGUtNDk0MS05MjY2LTkxMjdiMTZlZTI5ZDpAVEstNWZmMjA3ZjMtOWY4ZS00OTQxLTkyNjYtOTEyN2IxNmVlMjlk"
}


def get_count():
    response = requests.get(
        "https://www.ncl.com/search_vacations?cruise=1&cruiseTour=0&cruiseHotel=0&cruiseHotelAir=0&flyCruise=0&numberOfGuests=4294953449&state=undefined&pageSize=10&currentPage=",
        proxies=proxies)
    tmpcruise_results = response.json()
    tmpline = tmpcruise_results['meta']
    total_record_count = tmpline['aggregate_record_count']
    return total_record_count


total_cruise_count = get_count()
total_page_count = math.ceil(int(total_cruise_count) / 10)
session.headers.update(headers)
cruises = []
page_counter = 1
while page_counter <= total_page_count:
    url = "https://www.ncl.com/search_vacations?current_page=" + str(page_counter) + ""
    page = requests.get(url, headers=headers, proxies=proxies)
    cruise_results = page.json()
    for line in cruise_results['results']:
        cruises.append(line)
        print(line)
    page_counter += 1
    print(cruise_results['pagination']["current_page"])
    print("----------")
print(len(cruises))

I am using requests with a proxy. Is there any way to do this?


1 answer
User
#1 · Posted 2024-05-16 02:12:51

The site claims to have 12264 search results (for a blank search), shown in pages of 12.

The search URL accepts a parameter, Nao, which appears to define the search result offset at which the results page starts.

So fetching https://www.ncl.com/uk/en/search_vacations?Nao=45

should return a "page" of 12 search results, starting with result 46.

Sure enough:

"pagination": {
    "starting_record": "46",
    "ending_record": "57",
    "current_page": "4",
    "start_page": "1",
    ...

So, to page through all the results, start with Nao=0 and add 12 for each fetch.
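
The offset-based paging described here could be sketched roughly as follows. This is only an illustration under several assumptions: that the endpoint returns JSON with a 'results' list (as the question's code expects), that the page size stays at 12, and that an empty page marks the end of the results. The names fetch_all, get and max_pages are hypothetical and not part of the site's API.

```python
SEARCH_URL = "https://www.ncl.com/uk/en/search_vacations"
PAGE_SIZE = 12  # page size observed in the answer; an assumption, not a documented value


def fetch_all(get, page_size=PAGE_SIZE, max_pages=5000):
    """Collect results page by page by advancing the Nao offset.

    `get` is any callable with the same signature as requests.get,
    for example the `get` method of a requests.Session.
    """
    cruises = []
    for page_index in range(max_pages):  # hard cap so a misbehaving server cannot loop forever
        offset = page_index * page_size  # Nao=0, 12, 24, ...
        response = get(
            SEARCH_URL,
            params={"Nao": offset},
            headers={"accept": "application/json"},
            timeout=30,
        )
        response.raise_for_status()
        # Assumed response shape: {"results": [...], "pagination": {...}}
        results = response.json().get("results", [])
        if not results:  # an empty page is taken to mean we ran past the last result
            break
        cruises.extend(results)
    return cruises
```

With requests installed, this might be called as fetch_all(session.get), where session is a requests.Session whose proxies attribute has been set to the proxy dictionary from the question. Passing the getter in as an argument keeps the paging logic separate from the proxy and header setup.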
