Python cannot get the correct page source

Published 2024-03-28 14:24:30


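I am having trouble getting the correct source code of a web page. For example, for this link: http://www.ebay.com/sch/Kitchen-Tools-Gadgets/20635/i.html?_from=R40&LH_ItemCondition=3&LH_BIN=1&LH_FS=1&LH_RPA=1&_mPrRngCbx=1&_udlo=&_udhi=50%22&_nkw=slicer&LH_PrefLoc=3&_pgn=2&_skc=200&rt=nc the source I see in the browser differs from the source I get in Python. I have already tried urllib2: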

import urllib2  # Python 2 only
from bs4 import BeautifulSoup
usock = urllib2.urlopen(url).read()  # url is the search link above
page = BeautifulSoup(usock, "html.parser")

I also tried PhantomJS, with code like this:

from selenium import webdriver
driver = webdriver.PhantomJS()
driver.get(url)
content = driver.page_source

My program gives me this code:

<li _sp="p2045573.m1686.l13" class="sresult lvresult clearfix li shic" id="item41a7f34546" listingid="281990612294" r="1">
<div class="lvpic pic img left" iid="281990612294">
<div class="lvpicinner full-width picW">
<div class="triangle"></div>
<div class="urgency"></div>
<a class="img imgWr2" href="http://www.ebay.com/itm/Philips-Viva-Collection-HR2505-90-Black-OnionChef-2-way-Slicer-GENUINE-NEW-/281990612294?hash=item41a7f34546:g:YJ4AAOSwgApXANo~">
<img alt="Philips Viva Collection HR2505/90 Black OnionChef  2-way Slicer GENUINE NEW" class="img" src="http://thumbs.ebaystatic.com/images/g/YJ4AAOSwgApXANo~/s-l225.jpg"/>
</a>

And this is what I see when I click "View page source" on the site:

<li id="item3ab2772306" _sp="p2045573.m1686.l74" listingId="252102255366" class="sresult lvresult clearfix li shic"
    r="1" >

    <div class="lvpic pic img left" iid="252102255366" >
            <div class="lvpicinner full-width picW">

    <a href="http://www.ebay.com/itm/12-PC-Super-Slicer-Plus-Vegetable-Fruit-Peeler-Dicer-Cutter-Chopper-Nicer-Grater-/252102255366?hash=item3ab2772306:g:B7kAAOSw9r1WA89h" class="img imgWr2">
                     <img  
                        src="http://thumbs.ebaystatic.com/images/g/B7kAAOSw9r1WA89h/s-l225.jpg" class="img" alt='12 PC Super Slicer Plus Vegetable Fruit Peeler Dicer Cutter Chopper Nicer Grater' />
                </a>
            </div></div>
    <h3 class="lvtitle"><a href="http://www.ebay.com/itm/12-PC-Super-Slicer-Plus-Vegetable-Fruit-Peeler-Dicer-Cutter-Chopper-Nicer-Grater-/252102255366?hash=item3ab2772306:g:B7kAAOSw9r1WA89h"  class="vip visited" title="Click this link to access 12 PC Super Slicer Plus Vegetable Fruit Peeler Dicer Cutter Chopper Nicer Grater">12 PC Super Slicer Plus Vegetable Fruit Peeler Dicer Cutter Chopper Nicer Grater</a>

Of course, this may depend on the search results, but in my tests I never got the correct result.
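To check whether the two sources really list different items, rather than the same items with different markup, the listing IDs in each can be compared. The following is a minimal sketch, assuming each result is an `<li>` element carrying a `listingid` attribute, as in the two snippets above. The names `ListingIdParser`, `listing_ids` and `compare_sources` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser


class ListingIdParser(HTMLParser):
    """Collect the listingid attribute of every <li> that carries one."""

    def __init__(self):
        super().__init__()
        self.listing_ids = []

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        # html.parser lowercases attribute names, so "listingId" and
        # "listingid" are treated the same.
        listing_id = dict(attrs).get("listingid")
        if listing_id:
            self.listing_ids.append(listing_id)


def listing_ids(html):
    """Return the listing IDs found in html, in document order."""
    parser = ListingIdParser()
    parser.feed(html)
    return parser.listing_ids


def compare_sources(script_html, browser_html):
    """Return (only_in_script, only_in_browser) as sorted lists of IDs."""
    script_ids = set(listing_ids(script_html))
    browser_ids = set(listing_ids(browser_html))
    return sorted(script_ids - browser_ids), sorted(browser_ids - script_ids)
```

Passing the HTML fetched by the script and the HTML saved from the browser to `compare_sources` shows which listings appear on only one side. Two empty lists would suggest that the pages differ only in formatting.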


1 answer
User
#1 · Posted 2024-03-28 14:24:30

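The problem is that part of the page is loaded dynamically with JavaScript. The scraper you built does not render JavaScript, so it never sees that content. Use PhantomJS with Selenium to fix this.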
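If the content is filled in by JavaScript after the initial load, reading `page_source` immediately after `driver.get(url)` may still return the page before it has been populated. The following is a minimal sketch of waiting for the rendered content, assuming some substring (here called `marker`) appears in the source only once the results have loaded. The function `fetch_rendered_source` is a hypothetical helper. It uses a plain polling loop rather than Selenium's own wait utilities, so it relies only on the driver's `get` method and `page_source` attribute.

```python
import time


def fetch_rendered_source(driver, url, marker, timeout=10.0, poll=0.5):
    """Load url and return driver.page_source once marker appears in it.

    driver is any object with get(url) and a page_source attribute, such as
    a Selenium webdriver instance. marker is an assumed substring that is
    present only after the dynamic content has been rendered.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        source = driver.page_source
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "marker %r not found within %s seconds" % (marker, timeout)
            )
        # Give the page's scripts time to run before checking again.
        time.sleep(poll)
```

With the driver from the question, this might be called as `fetch_rendered_source(driver, url, "sresult")`, where `"sresult"` is the class name seen on the result items. Whether that string is a reliable marker for this page is an assumption.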
