I'm trying to rewrite this code to use the `ItemLoader` class:
```python
import scrapy
from ..items import Book

class BasicSpider(scrapy.Spider):
    ...
    def parse(self, response):
        item = Book()
        # notice I only grab the first book among the many on the page
        item['title'] = response.xpath('//*[@class="link linkWithHash detailsLink"]/@title')[0].extract()
        return item
```
The approach above works fine. Now the same thing with `ItemLoader`:
```python
from scrapy.loader import ItemLoader

class BasicSpider(scrapy.Spider):
    ...
    def parse(self, response):
        l = ItemLoader(item=Book(), response=response)
        l.add_xpath('title', '//*[@class="link linkWithHash detailsLink"]/@title'[0])  # this does not work - returns an empty dict
        # l.add_xpath('title', '//*[@class="link linkWithHash detailsLink"]/@title')  # this of course works, but returns every book title on the page, not just the first one, which is what I need
        return l.load_item()
```
So I only want to grab the title of the first book. How can I do that?
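One problem in your code is that XPath uses one-based indexing, so the first match is `[1]`, not `[0]`. Another problem is that the index brackets belong inside the string passed to the `add_xpath` method: as written, `'...'[0]` indexes the Python string literal itself and evaluates to `'/'`. Also wrap the path in parentheses, as in `(//...)[1]`, so that the position applies to the whole result set rather than to each location step.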
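A quick check in plain Python (no Scrapy needed) shows what the two placements of the index actually produce. The variable names are illustrative only.

```python
# The XPath from the question, as a plain Python string.
xpath = '//*[@class="link linkWithHash detailsLink"]/@title'

# Indexing outside the quotes picks the first *character* of the string,
# not the first XPath match.
broken = xpath[0]

# Putting the position inside the string asks XPath itself for the first
# matching title attribute. XPath positions start at 1, and the parentheses
# make [1] apply to the whole node-set of matches.
fixed = "(" + xpath + ")[1]"

print(repr(broken))
print(fixed)
```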
The corrected code looks like this: