Relative XPath with a leading dot does not work inside a loop

Posted 2024-04-19 17:24:45


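I'm new to Python and Scrapy. I wrote a spider and I'm having trouble with relative paths. If I leave out the leading dot in the XPath expressions inside the loop, every iteration prints the same result. If I add the dot, the log shows that items were scraped, but the text fields are empty.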

import scrapy
from demo_proj.items import JokeItem
from scrapy.loader import ItemLoader
from scrapy import Selector


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains=['kitco.com']
    start_urls = [
        'https://www.kitco.com/'
    ]


    def parse(self, response):
        for joke in response.xpath("//div[@class='top15']"):
            l=ItemLoader(item=JokeItem(),selector=joke)
            l.add_xpath('news',".//div[@class='top15']/a/h3")
            l.add_xpath('time',".//div[@class='top15']/span[@class='post-date']")
            l.add_xpath('source',".//div[@class='top15']/span[@class='source']")
            yield l.load_item()

1 answer
User
#1 · Posted 2024-04-19 17:24:45

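The //div[@class='top15'] step inside the for loop is redundant. Before entering the loop you have already narrowed the scope to those div elements, so inside the loop each XPath should be relative to the current div. Without the dot, //div[@class='top15']/... is evaluated from the document root on every iteration, which is why you get the same result each time. With the dot, .//div[@class='top15']/... looks for another top15 div nested inside the current one, apparently finds none, and returns nothing. Appending /text() extracts only the text rather than the whole element. The spider becomes: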

import scrapy
from scrapy.loader import ItemLoader

from demo_proj.items import JokeItem


class JokesSpider(scrapy.Spider):
    name = 'jokes'
    allowed_domains=['kitco.com']
    start_urls = [
        'https://www.kitco.com/'
    ]

    def parse(self, response):
        for joke in response.xpath("//div[@class='top15']"):
            l = ItemLoader(item=JokeItem(), selector=joke)
            l.add_xpath('news', "./a/h3/text()")
            l.add_xpath('time', "./span[@class='post-date']/text()")
            l.add_xpath('source', "./span[@class='source']/text()")
            yield l.load_item()

items.py would be:

import scrapy


class JokeItem(scrapy.Item):
    news = scrapy.Field()
    time = scrapy.Field()
    source = scrapy.Field()

Here are a few lines from my log:

{'news': ['The real gold price rally hasn’t even started yet, says analyst who '
          '...'],
 'source': ['Kitco Video News'],
 'time': ['Dec  9']}
2019-12-10 10:08:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.kitco.com/>
{'news': ['Who will win the 2020 presidential election? Doug Casey weighs in '
          'on ...'],
 'source': ['Kitco News'],
 'time': ['Dec  9']}
2019-12-10 10:08:20 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.kitco.com/>
{'news': ['What kind of a gold investor are you?'],
 'source': ['Kitco News'],
 'time': ['Dec  9']}
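
To see the scoping difference in isolation, here is a minimal sketch that needs neither Scrapy nor a network connection. It uses the standard library's xml.etree.ElementTree as a stand-in for Scrapy selectors, and the markup is made up: only the class names and nesting follow the XPaths above, not the real kitco.com page. ElementTree does not accept an absolute //... path on an element, so the "no dot" case is imitated by restarting the query from the document root on each iteration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for illustration only; not the real page.
HTML = """
<html><body>
  <div class="top15">
    <a><h3>First headline</h3></a>
    <span class="post-date">Dec 9</span>
    <span class="source">Source A</span>
  </div>
  <div class="top15">
    <a><h3>Second headline</h3></a>
    <span class="post-date">Dec 9</span>
    <span class="source">Source B</span>
  </div>
</body></html>
""".strip()

root = ET.fromstring(HTML)

from_root = []  # query restarted from the document root (imitates "no dot")
nested = []     # ".//div[@class='top15']/a/h3" evaluated on each div
relative = []   # "./a/h3" evaluated on each div

for div in root.findall(".//div[@class='top15']"):
    # Ignores the loop variable entirely, so every iteration sees all headlines.
    from_root.append(
        [h3.text for h3 in root.findall(".//div[@class='top15']/a/h3")]
    )
    # Looks for a top15 div nested inside the current top15 div: there is none.
    nested.append(
        [h3.text for h3 in div.findall(".//div[@class='top15']/a/h3")]
    )
    # Starts at the current div and walks down to its own headline.
    relative.append([h3.text for h3 in div.findall("./a/h3")])

print(from_root)  # same full list on every iteration
print(nested)     # empty list on every iteration
print(relative)   # one headline per iteration
```

The three lists correspond to the three situations in the question: identical results every time, empty results, and the intended one item per div.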
