How do I get a specific URL from a page?

<div class="views-row views-row-2 views-row-even">
  <span class="views-field views-field-title">
    <span class="field-content">
      <a href="http://simile.mit.edu/wiki/Babel" target="_blank">babel</a>
    </span>
  </span>
  <span class="views-field views-field-nothing">
    <span class="field-content"><a href="http://openinnovation.cn/node/9506">详细信息</a></span>
  </span>
</div>

import urllib
import lxml.html

def main():
    for j in range(58, 64):
        listURL = 'http://www.openinnovation.cn/opentools/function/' + str(j)
        listPage = urllib.urlopen(listURL)
        listhtml = listPage.read()
        page_html = lxml.html.fromstring(listhtml)
        # get the information page url from the list page:
        #infoURL = page_html.cssselect("a.ttrib['href']")
        infoURL = page_html.cssselect(".views-field views-field-nothing, .field-content, a.attrib['href']")
        for e in infoURL:
            print e

1 Answer

User

#1 · Posted 2024-04-19 23:42:40

Depending on how specific you want to be when selecting the nodes, you can use

.views-row > span:nth-of-type(2) a

to select the link inside the second span, or

a[href*='//openinnovation.cn/node/']

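to select all links whose href attribute contains a specific string. This uses the attribute*='string' attribute selector, which matches any element whose attribute value contains the given substring. CSS is less powerful than XPath: a CSS selector can only match elements, so it cannot select the href attribute itself. You have to read the attribute from each matched element (e in your loop) explicitly through the lxml API: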

infoURLs = page_html.cssselect("a[href*='//openinnovation.cn/node/']")
for urlNode in infoURLs:
    print urlNode.get("href")
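
For comparison with the XPath remark above, here is a minimal sketch of the same lookup written as an XPath expression, which can return the href values directly. It is written for Python 3 and assumes lxml is installed. The function name extract_node_urls and the SAMPLE_HTML constant are made up for this illustration; the markup is the snippet from the question.

```python
import lxml.html

# Sample markup copied from the question; a real run would parse the fetched page instead.
SAMPLE_HTML = """
<div class="views-row views-row-2 views-row-even">
  <span class="views-field views-field-title">
    <span class="field-content">
      <a href="http://simile.mit.edu/wiki/Babel" target="_blank">babel</a>
    </span>
  </span>
  <span class="views-field views-field-nothing">
    <span class="field-content"><a href="http://openinnovation.cn/node/9506">详细信息</a></span>
  </span>
</div>
"""


def extract_node_urls(html_text):
    """Return href values of links whose href contains '//openinnovation.cn/node/'."""
    page_html = lxml.html.fromstring(html_text)
    # Ending the expression with /@href makes XPath return the attribute
    # values themselves instead of the <a> elements.
    hrefs = page_html.xpath(
        "//a[contains(@href, '//openinnovation.cn/node/')]/@href"
    )
    # Convert lxml's string-like results to plain str objects.
    return [str(href) for href in hrefs]


if __name__ == "__main__":
    for url in extract_node_urls(SAMPLE_HTML):
        print(url)
```

The CSS version shown earlier is just as valid. The XPath form only saves the extra .get("href") call on each element.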
