Extracting links from XML with Scrapy

2 answers

User

#1 · edited 2024-04-20 03:33:08

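The key issue here is that the input is not regular HTML but an XML feed, and the links are stored as element text rather than in attributes. I think all you need here is XMLFeedSpider: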

import scrapy
from scrapy.spiders import XMLFeedSpider

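class MySpider(XMLFeedSpider):
    name = 'myspider'
    start_urls = ['url_here']

    # Call parse_node once for every <item> element in the feed.
    itertag = "item"

    def parse_node(self, response, node):
        # The URLs are the text content of <link>, so select text() rather than an attribute.
        for link in node.xpath(".//link/text()").extract():
            # Strip the whitespace around the URL before requesting it.
            yield scrapy.Request(link.strip(), callback=self.parse_link)

    def parse_link(self, response):
        # Placeholder callback: just show which page was fetched.
        print(response.url)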
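
To make the text-versus-attribute distinction concrete, here is a minimal sketch that uses only the standard library instead of Scrapy. The two sample snippets and the helper names `links_from_attributes` and `links_from_text` are made up for illustration and are not part of the question's feed.

```python
import xml.etree.ElementTree as ET

# Hypothetical samples, not taken from the question's feed.
html_like = '<p><a href="http://www.example.com/page">a page</a></p>'
feed_like = '<item><link>\n  http://www.example.com/index.xml\n</link></item>'


def links_from_attributes(markup):
    """Collect href attribute values, the usual HTML case."""
    root = ET.fromstring(markup)
    return [a.get("href") for a in root.iter("a") if a.get("href")]


def links_from_text(markup):
    """Collect the stripped text of <link> elements, the feed case."""
    root = ET.fromstring(markup)
    return [
        el.text.strip()
        for el in root.iter("link")
        if el.text and el.text.strip()
    ]


print(links_from_attributes(html_like))
print(links_from_text(feed_like))
```

An extractor that only looks at href attributes finds nothing in the feed snippet, which is why the spider above selects text() nodes.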

User

#2 · edited 2024-04-20 03:33:08

You should use the xml.etree library.

import xml.etree.ElementTree as ET

res = '''
<item>
  <pubDate>Sat, 12 Dec 2015 16:35:00 GMT</pubDate>
  <title>
   some text
  </title>
  <link>
     http://www.example.com/index.xml
  </link>
</item>
'''

root = ET.fromstring(res)
results = root.findall('.//link')
for link in results:
    # The URL is surrounded by whitespace in the feed, so strip it.
    print(link.text.strip())

Output:

http://www.example.com/index.xml
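
If the feed contains several items, the same idea can be wrapped in a small helper. This is only a sketch: the function name `extract_links` and the sample feed are assumptions for illustration, and it assumes a feed without XML namespaces (as in plain RSS 2.0).

```python
import xml.etree.ElementTree as ET


def extract_links(feed_xml):
    """Return the stripped <link> text of every <item> in the feed."""
    root = ET.fromstring(feed_xml)
    links = []
    for item in root.iter("item"):
        # findtext returns the default when the item has no <link> child.
        text = (item.findtext("link", default="") or "").strip()
        if text:
            links.append(text)
    return links


# Hypothetical feed: a channel-level link plus three items, one without a link.
sample = '''<rss><channel>
<link>http://www.example.com/</link>
<item><title>first</title><link>
  http://www.example.com/a.xml
</link></item>
<item><title>second</title><link>http://www.example.com/b.xml</link></item>
<item><title>no link</title></item>
</channel></rss>'''

for url in extract_links(sample):
    print(url)
```

Because only the link inside each item is read, the channel-level link is skipped, and items without a link are ignored instead of raising an error.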
