Scraping pagination links whose items are not the usual h

import requests
from lxml import html

url = "http://www.findanarchitect.com.au/index.php"

def Endpoint(Address):
    # Form fields sent with the search request
    payload = {
        'action': 'show_search_result',
        'action_spam': 'dDfgEr',
        'txtSearchType': 5,
        'txtPracName': '',
        'optSstate': 3,
        'optRegions': 23,
        'txtPcode': '',
        'txtShowBuildingType': 0,
        'optBuildingType': 1,
        'optHomeType': 1,
        'optBudget': '',
    }
    response = requests.post(Address, data=payload)
    tree = html.fromstring(response.text)
    # Print every href found inside the pagination block
    titles = tree.xpath('//div[@id="pagination"]')
    for title in titles:
        Links = title.xpath('.//li[@class]/a/@href')
        for Link in Links:
            print(Link)

Endpoint(url)

1 answer

User

#1 · Posted 2024-04-24 12:01:16

The page's HTML defines a js_goto_page function:

/*
* Go to Page
*/
function js_goto_page(page_no)
{
    $('#idCurPageNo').val(page_no);
    action = "action=ajax_goto_page";
    furl = '/index.php?'+action+'&page_no='+page_no+'&search_type='+$('#idSubSearchType').val();
    $.ajax({
            type: "GET",
            url:furl,
            cache :false,
            async:false,
            dataType:'json',
            success: function(data)
                    {
                        $('#archWrapper').html(data.html);
                        $('#pagination_bottom').html(data.pagination_tab); 
                        //$("html").animate({ scrollTop: 0 });
                        $("html").scrollTop(0);

                    }
        });
}

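You need to rebuild the contents of the furl variable in your scraper code. This is less difficult than it may look: action is static, page_no is the number of the page you want to fetch, and $('#idSubSearchType').val() can be read with an HTML parser. The request returns JSON, with the result markup in its html field and the pagination markup in its pagination_tab field.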
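The steps above could be sketched roughly as follows. This is an untested sketch, not a confirmed solution for this site. It assumes that idSubSearchType is an element carrying a value attribute (for example a hidden input), that the server remembers the search between requests through session cookies, and that the JSON fields are the ones js_goto_page reads. The helper names read_search_type, build_page_url, fetch_page and crawl are made up for illustration. The standard-library HTMLParser is used to read the value; the lxml expression `tree.xpath('//*[@id="idSubSearchType"]/@value')` should work just as well.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

BASE_URL = "http://www.findanarchitect.com.au/index.php"


class _ValueById(HTMLParser):
    """Records the value attribute of the first element with a given id."""

    def __init__(self, element_id):
        super().__init__()
        self.element_id = element_id
        self.value = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.value is None and attrs.get("id") == self.element_id:
            self.value = attrs.get("value", "")


def read_search_type(page_html):
    """Python stand-in for $('#idSubSearchType').val(); None if not found."""
    finder = _ValueById("idSubSearchType")
    finder.feed(page_html)
    return finder.value


def build_page_url(page_no, search_type, base_url=BASE_URL):
    """Rebuild the furl string that js_goto_page assembles."""
    query = urlencode({
        "action": "ajax_goto_page",   # static in the JavaScript
        "page_no": page_no,
        "search_type": search_type,
    })
    return base_url + "?" + query


def fetch_page(session, page_no, search_type):
    """GET one result page; returns (results_html, pagination_html)."""
    response = session.get(build_page_url(page_no, search_type))
    response.raise_for_status()
    data = response.json()
    return data["html"], data["pagination_tab"]


def crawl(payload, last_page):
    """Run the search once, then walk pages 2..last_page (assumed range)."""
    import requests  # imported here so the helpers above work without it

    with requests.Session() as session:  # assumption: search state lives in cookies
        first = session.post(BASE_URL, data=payload)
        search_type = read_search_type(first.text)
        for page_no in range(2, last_page + 1):
            results_html, _ = fetch_page(session, page_no, search_type)
            yield page_no, results_html
```

Calling `crawl(payload, last_page)` with the payload from the question would yield the result markup of each page, which can then be passed to `html.fromstring` as before. The last page number has to come from somewhere, for instance the pagination_tab markup.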
