Getting the src attribute with XPath

Posted 2024-04-25 21:56:35


I am using Python with the requests and lxml modules to build a parsed HTML object. My task is to find every link containing the string "googleadservices" on the following page:

http://www.euronews.com/2015/03/20/uber-taxis-overtake-new-york-yellow-cabs/

My XPath query is:

//script[contains(@src,'google')]/@src

I expected it to return the value of the src attribute of every matching script node, but the result is incomplete.

Note that

http://partner.googleadservices.com/gpt/pubads_impl_58.js

is missing!

I think I am missing a subtle point of syntax, and I would be glad to be enlightened.


1 answer
User
#1 · Posted 2024-04-25 21:56:35

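The response that requests receives contains no script with src="http://partner.googleadservices.com/gpt/pubads_impl_58.js". That script is loaded asynchronously by JavaScript after the page loads, so it never appears in the HTML that lxml parses.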
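To make the point concrete, here is a minimal sketch of why a static parse cannot see such a script. The HTML below is made up for illustration (it is not the real euronews markup), and the standard-library html.parser stands in for lxml so the snippet runs without extra packages; `ScriptSrcCollector` is a hypothetical helper name.

```python
from html.parser import HTMLParser

# Made-up page: the first script tag is in the markup, the second one would
# only exist after a browser executes the inline JavaScript.
STATIC_HTML = """
<html><head>
<script src="http://www.googletagservices.com/tag/js/gpt.js"></script>
<script>
  var s = document.createElement('script');
  s.src = 'http://partner.googleadservices.com/gpt/pubads_impl_58.js';
  document.head.appendChild(s);
</script>
</head><body></body></html>
"""


class ScriptSrcCollector(HTMLParser):
    """Collect src values of <script> tags whose src contains a substring."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag != "script":
            return
        src = dict(attrs).get("src")
        if src and self.needle in src:
            self.srcs.append(src)


collector = ScriptSrcCollector("google")
collector.feed(STATIC_HTML)
# Only the script present in the markup is found; the injected one is not.
print(collector.srcs)
```

The parser treats the inline JavaScript as plain text, so the googleadservices URL is never reported as a script element. The same holds for any XPath query run against the raw response body.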

As a workaround, you can automate a real browser with the help of the selenium package.

Example (using the PhantomJS headless browser):

>>> from selenium import webdriver
>>> 
>>> driver = webdriver.PhantomJS()
>>> url = "http://www.euronews.com/2015/03/20/uber-taxis-overtake-new-york-yellow-cabs/"
>>> driver.get(url)
>>> for script in driver.find_elements_by_xpath("//script[contains(@src, 'google')]"):
...     print(script.get_attribute('src'))
... 
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=gapi_iframes_style_bubble/exm=auth,plusone,ytsubscribe/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_3
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=auth/exm=plusone,ytsubscribe/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_2
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=ytsubscribe/exm=plusone/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_1
https://apis.google.com/_/scs/apps-static/_/js/k=oz.gapi.en_US.t-LxkuL3EUg.O/m=plusone/rt=j/sv=1/d=1/ed=1/am=IQ/rs=AGLTcCNAFql0FUItRCrv44X1do5tNb0b8Q/t=zcms/cb=gapi.loaded_0
http://www.googletagservices.com/tag/js/gpt.js
http://www.euronews.com/js/google.js
https://apis.google.com/js/plusone.js
http://partner.googleadservices.com/gpt/pubads_impl_58.js
http://pagead2.googlesyndication.com/pagead/osd.js
http://pagead2.googlesyndication.com/pagead/show_ads.js
http://pagead2.googlesyndication.com/pagead/js/r20150331/r20150224/show_ads_impl.js
http://www.googletagservices.com/tag/js/check_359604.js
http://googleads.g.doubleclick.net/pagead/ads?client=ca-pub-3977141546397241&output=js&adk=2828788313&image_size=607x90&lmt=1428369754&num_ads=4&skip=0&ad_type=text&ea=0&oe=utf8&flash=0&hl=en&url=http%3A%2F%2Fwww.euronews.com%2F2015%2F03%2F20%2Fuber-taxis-overtake-new-york-yellow-cabs%2F&dt=1428355354776&shv=r20150331&cbv=r20150224&saldr=sb&correlator=6304440702977&frm=20&ga_vid=21319259.1428355355&ga_sid=1428355355&ga_hid=935959392&ga_fc=0&u_tz=-240&u_his=1&u_java=0&u_h=900&u_w=1440&u_ah=873&u_aw=1440&u_cd=32&u_nplug=0&u_nmime=0&dff=arial&dfs=12&biw=400&bih=300&eid=317150304&oid=3&rx=0&eae=2&fc=24&brdim=0%2C0%2C0%2C0%2C1440%2C23%2C0%2C0%2C400%2C300&vis=0&rsz=0%7C0%7C%7C&abl=CS&ppjl=u&fu=1024&bc=1&ifi=1&dtd=155
>>> 
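Newer Selenium releases no longer ship the PhantomJS driver or the find_elements_by_* helpers used in the session above, so that session may not run as written on a current install. Below is a minimal sketch of the same idea against the Selenium 4 API, assuming Selenium 4 and a local Chrome are installed. The helper names `make_headless_chrome` and `google_script_srcs` are made up for illustration.

```python
XPATH = "//script[contains(@src, 'google')]"


def make_headless_chrome():
    """Start headless Chrome (assumes Chrome and Selenium 4 are installed)."""
    # Imported here so the helper below can be read and exercised on its own.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)


def google_script_srcs(driver, url):
    """Load url in the browser and return the src of every matching script."""
    driver.get(url)
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
    scripts = driver.find_elements("xpath", XPATH)
    return [script.get_attribute("src") for script in scripts]
```

Usage would be `driver = make_headless_chrome()`, then `google_script_srcs(driver, url)`, then `driver.quit()`. Scripts that are injected late may need a short wait before the query is run.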
