Python, Scrapy: unable to paginate dynamically generated links on a website using the link shown by Firebug

GET directory?p=2&category=1&map[disable]=0&map[height]=500&map[list_height]=500&map[span]=5&map[style]=&map[list_show]=0&map[listing_default_zoom]=15&map[options][scrollwheel]=0&map[options][marker_clusters]=1&map[options][force_fit_bounds]=0&distance=0&is_mile=0&zoom=15&perpage=16&scroll_list=0&feature=1&featured_only=0&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&template=&grid_columns=4&sort=title

http://intheloop.com.sg/sabai/directory?p=2&category=1&map[disable]=0&map[height]=500&map[list_height]=500&map[span]=5&map[style]=&map[list_show]=0&map[listing_default_zoom]=15&map[options][scrollwheel]=0&map[options][marker_clusters]=1&map[options][force_fit_bounds]=0&distance=0&is_mile=0&zoom=15&perpage=16&scroll_list=0&feature=1&featured_only=0&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&template=&grid_columns=4&sort=title

1 answer

User

#1 · Posted 2024-04-16 10:27:55

To make this work, you need to modify the AJAX URL slightly.

Here is a demo in the scrapy shell:

In [1]: import re

In [2]: next_page_data = response.xpath('//div[@class="sabai-pull-right sabai-pagination"]/ul/li/a[contains(text(), "Next")]/@onclick').extract()

In [3]: next_page_data
Out[3]: [u"SABAI.ajax({scrollTo:'#sabai-directory-listings',trigger:jQuery(this), target:'#sabai-directory-listings', url:'http://intheloop.com.sg/sabai/directory?p=2&category=1&map%5Bdisable%5D=0&map%5Bheight%5D=500&map%5Blist_height%5D=500&map%5Bspan%5D=5&map%5Bstyle%5D=&map%5Blist_show%5D=0&map%5Blisting_default_zoom%5D=15&map%5Boptions%5D%5Bscrollwheel%5D=0&map%5Boptions%5D%5Bmarker_clusters%5D=1&map%5Boptions%5D%5Bforce_fit_bounds%5D=0&distance=0&is_mile=0&zoom=15&perpage=16&scroll_list=0&feature=1&featured_only=0&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&template=&addons=&grid_columns=4&sort=title'}); return false;"]

In [4]: url = re.findall(r'url\:\'(.*)\'\}', next_page_data[0])

In [5]: url 
Out[5]: [u'http://intheloop.com.sg/sabai/directory?p=2&category=1&map%5Bdisable%5D=0&map%5Bheight%5D=500&map%5Blist_height%5D=500&map%5Bspan%5D=5&map%5Bstyle%5D=&map%5Blist_show%5D=0&map%5Blisting_default_zoom%5D=15&map%5Boptions%5D%5Bscrollwheel%5D=0&map%5Boptions%5D%5Bmarker_clusters%5D=1&map%5Boptions%5D%5Bforce_fit_bounds%5D=0&distance=0&is_mile=0&zoom=15&perpage=16&scroll_list=0&feature=1&featured_only=0&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&template=&addons=&grid_columns=4&sort=title']

In [6]: next_page_url = url[0] + '&__ajax=%23sabai-directory-listings&_='

In [7]: next_page_url
Out[7]: u'http://intheloop.com.sg/sabai/directory?p=2&category=1&map%5Bdisable%5D=0&map%5Bheight%5D=500&map%5Blist_height%5D=500&map%5Bspan%5D=5&map%5Bstyle%5D=&map%5Blist_show%5D=0&map%5Blisting_default_zoom%5D=15&map%5Boptions%5D%5Bscrollwheel%5D=0&map%5Boptions%5D%5Bmarker_clusters%5D=1&map%5Boptions%5D%5Bforce_fit_bounds%5D=0&distance=0&is_mile=0&zoom=15&perpage=16&scroll_list=0&feature=1&featured_only=0&hide_searchbox=0&hide_nav=0&hide_nav_views=0&hide_pager=0&template=&addons=&grid_columns=4&sort=title&__ajax=%23sabai-directory-listings&_='

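If you inspect the AJAX request, you will find two extra parameters, __ajax and _, at the end of the URL. Appending these two parameters to the URL extracted from the onclick attribute gives you the correct URL for the next page.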
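The shell steps above could be wrapped in a small helper, roughly as sketched below. The function name build_ajax_next_url and the constant names are hypothetical, chosen only for illustration. The sketch assumes the onclick attribute always carries a single-quoted url:'...' argument, as in the output shown above, and it leaves the _ parameter empty exactly as the shell demo does.

```python
import re

# Matches the url:'...' argument inside the SABAI.ajax({...}) onclick handler.
# Assumption: the URL is single-quoted and contains no single quotes itself.
ONCLICK_URL_RE = re.compile(r"url:'([^']*)'")

# The two extra parameters seen at the end of the AJAX request.
# %23 is the URL-encoded '#' of the target element id.
AJAX_SUFFIX = "&__ajax=%23sabai-directory-listings&_="


def build_ajax_next_url(onclick):
    """Return the AJAX URL for the next page, or None if no URL is found."""
    match = ONCLICK_URL_RE.search(onclick)
    if match is None:
        return None
    return match.group(1) + AJAX_SUFFIX


if __name__ == "__main__":
    # Shortened version of the onclick value from the shell session.
    sample = (
        "SABAI.ajax({scrollTo:'#sabai-directory-listings',"
        "trigger:jQuery(this), target:'#sabai-directory-listings', "
        "url:'http://intheloop.com.sg/sabai/directory?p=2&sort=title'}); "
        "return false;"
    )
    print(build_ajax_next_url(sample))
```

Inside a spider callback, the returned value could be passed to scrapy.Request(next_url, callback=self.parse) whenever it is not None, so the crawl stops on the last page, where there is no "Next" link. Whether the AJAX response has the same markup as the full page is not confirmed here, so the existing XPath expressions may need checking against it.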
