If statement not working in Scrapy

2 votes

3 answers

1665 views

Asked 2025-04-18 02:39

I built a crawler with Scrapy that reads a sitemap and extracts the content I need from every link in it.

from scrapy.selector import Selector
from scrapy.spiders import SitemapSpider

# Assumed module path; adjust to wherever MyItem is defined in your project
from myproject.items import MyItem


class MySpider(SitemapSpider):
    name = "functie"
    allowed_domains = ["xyz.nl"]
    sitemap_urls = ["http://www.xyz.nl/sitemap.xml"]

    def parse(self, response):
        item = MyItem()
        sel = Selector(response)

        item['url'] = response.url
        item['h1'] = sel.xpath("//h1[@class='no-bd']/text()").extract()
        item['jobtype'] = sel.xpath('//input[@name=".Keyword"]/@value').extract()
        item['count'] = sel.xpath('//input[@name="Count"]/@value').extract()
        item['location'] = sel.xpath('//input[@name="Location"]/@value').extract()
        yield item

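Sometimes item['location'] comes back empty. In that case I want to scrape a different element and store it in item['location'] instead. This is the code I tried:

item['location'] = sel.xpath('//input[@name="Location"]/@value').extract()
if not item['location']:
    item['location'] = sel.xpath('//a[@class="location"]/text()').extract()

But the if condition never seems to be checked: when the Location input field is empty, I still get an empty value back. Any help would be much appreciated.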

Tags: scrapy, conditional statements, data extraction, empty value handling, web crawling, sitemap

3 Answers

I think the best way to do what you want is a custom item pipeline.

1) First, open pipelines.py and check the condition you care about in a pipeline class:

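class LocPipeline(object):
    def process_item(self, item, spider):
        # A pipeline only receives the item, not the response, so no selector
        # is available here. The spider has to extract the fallback value
        # itself. "location_fallback" is a hypothetical field assumed to hold
        # the result of //a[@class="location"]/text().
        if not item.get("location"):
            # "location" is missing or empty: use the fallback value
            item['location'] = item.get("location_fallback", [])

        return item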

2) Next, add your custom LocPipeline to settings.py:

ITEM_PIPELINES = {'myproject.pipelines.LocPipeline': 300}

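Once the custom pipeline is in the settings, Scrapy automatically calls LocPipeline().process_item() on every item yielded by MySpider().parse(), and the pipeline fills in the fallback location when none was found.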
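To try the pipeline logic without running a crawl, here is a minimal sketch. It assumes items behave like dicts (Scrapy items support `.get()`), that the spider has stored the alternative XPath result under a hypothetical `location_fallback` field, and that a value consisting only of blank strings should count as empty. None of these names come from the question apart from `LocPipeline` and `location`.

```python
class LocPipeline:
    """Fill item['location'] from a fallback field when it has no usable text."""

    @staticmethod
    def _non_blank(values):
        # Keep only strings that contain something other than whitespace
        return [v for v in values if v.strip()]

    def process_item(self, item, spider):
        location = self._non_blank(item.get("location", []))
        if not location:
            # 'location_fallback' is a hypothetical field set by the spider
            location = self._non_blank(item.get("location_fallback", []))
        item["location"] = location
        return item


# Plain dicts stand in for Scrapy items; the spider argument is unused here
pipeline = LocPipeline()
result = pipeline.process_item(
    {"location": [""], "location_fallback": ["Utrecht"]}, None
)
print(result["location"])
```

Keeping the choice between the two values in the pipeline leaves the spider's parse() as a plain list of extractions, at the cost of one extra field on the item.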

Answered 2025-04-18 by Python大师


You may want to check the length of item['location'].

item['location'] = sel.xpath('//input[@name="Location"]/@value').extract()
if len(item['location']) < 1:
    item['location'] = sel.xpath('//a[@class="location"]/text()').extract()

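Either way, have you thought about combining the two XPath expressions with | ? Note that the union returns matches from both expressions when both are present.

item['location'] = sel.xpath('//input[@name="Location"]/@value | //a[@class="location"]/text()').extract()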
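One possible reason the emptiness check never fires: if the input exists but has value="", .extract() most likely returns [''], a list holding one empty string. That list is truthy and has length 1, so both `if not ...` and `len(...) < 1` are skipped. The sketch below illustrates this with plain lists standing in for the .extract() results; the helper names `clean` and `first_non_empty` and the sample values are made up for illustration.

```python
def clean(values):
    """Strip each string and drop those that are empty or whitespace-only."""
    return [v.strip() for v in values if v and v.strip()]


def first_non_empty(*candidates):
    """Return the first candidate list that still has text after cleaning."""
    for values in candidates:
        cleaned = clean(values)
        if cleaned:
            return cleaned
    return []


# Stand-ins for the two .extract() results (assumed values, not real page data)
from_input = [""]            # e.g. <input name="Location" value="">
from_link = [" Amsterdam "]  # e.g. <a class="location"> Amsterdam </a>

print(bool(from_input), len(from_input))       # True 1
print(first_non_empty(from_input, from_link))  # ['Amsterdam']
```

In the spider this would be called as first_non_empty(sel.xpath(...).extract(), sel.xpath(...).extract()), with the preferred expression first.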

Answered 2025-04-18 by Python大师


Try this:

# extract() returns a list, so compare against a list holding one empty string
if item['location'] == ['']:
    item['location'] = sel.xpath('//a[@class="location"]/text()').extract()

Answered 2025-04-18 by Python大师

