New to lxml: XPath selection returns too many results

import lxml.html

def readHTML(arg):
    ret = ""
    ret = lxml.html.parse(arg)
    return ret

soup = (readHTML("http://www.myScrapingSite.com/"))
subGroup = soup.xpath("//div[@class='colmask']")[0]

# I want this to be only the cities in subGroup, but it's
# giving me the cities on the entire page. What am I doing wrong?
cities = subGroup.xpath('//li/a')

urls = {}
# So basically I am building a dictionary that is a superset of the desired set
for city in cities:
    print(city.attrib['href'])
    urls[city.attrib['href']] = 1

for url in urls:
    subGroup2 = readHTML(url)

1 answer

User

#1 · Posted 2024-04-24 09:30:05

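The problem is that // always means "relative to the document root", even when you call xpath() on a sub-element such as subGroup. What you really want is probably .//, which searches relative to the current node: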

cities = subGroup.xpath('.//li/a')

Here is an example:

>>> import lxml.etree
>>> xmlString = '<root><taga name="a"><tagb name="first"/></taga><taga name="b"><tagb name="second"/></taga></root>'
>>> xml = lxml.etree.fromstring(xmlString)
>>> taga = xml.xpath('//taga[@name="a"]')[0]
>>> taga.xpath('//tagb')
[<Element tagb at 7fddaa625310>, <Element tagb at 7fddaa6252b8>]
>>> taga.xpath('.//tagb')
[<Element tagb at 7fddaa625310>]

You can see that // returns both tagb elements, while .// returns only the one inside the current node.
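Applied to the kind of page in the question, the same difference might look like the sketch below. The HTML snippet, the "footer" div, the city URLs and the city_links helper are all made up for illustration; only the colmask class and the two XPath expressions come from the thread.

```python
import lxml.html

# Hypothetical page: one list inside the target div, one list outside it.
PAGE = """
<html><body>
  <div class="colmask">
    <ul>
      <li><a href="/cities/austin">Austin</a></li>
      <li><a href="/cities/boston">Boston</a></li>
    </ul>
  </div>
  <div class="footer">
    <ul><li><a href="/about">About</a></li></ul>
  </div>
</body></html>
""".strip()


def city_links(root, relative=True):
    """Return the href of each <li><a> selected from the colmask div."""
    sub_group = root.xpath("//div[@class='colmask']")[0]
    # ".//" starts at sub_group; "//" starts at the document root.
    query = ".//li/a" if relative else "//li/a"
    return [a.get("href") for a in sub_group.xpath(query)]


root = lxml.html.fromstring(PAGE)
whole_page = city_links(root, relative=False)  # also picks up the footer link
only_sub = city_links(root, relative=True)     # only links inside colmask
print(whole_page)
print(only_sub)
```

Returning a list of href values (or wrapping it in set() to drop duplicates) would also stand in for the urls dictionary used in the question.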
