Scrapy spider fails with "'str' object has no attribute 'iter'"
I am getting this error message:
AttributeError: 'str' object has no attribute 'iter'
2024-03-15 14:01:19 [scrapy.core.engine] INFO: Closing spider (finished)
It happens when I run this code:
from scrapy.linkextractors import LinkExtractor
from scrapy.spiders import CrawlSpider, Rule


class AuctionSpider(CrawlSpider):
    name = "auction"
    allowed_domains = ["auct.co.th"]
    start_urls = ["https://www.auct.co.th/products"]
    rules = (Rule(LinkExtractor(restrict_xpaths="//div[@class='p-2 card']/text()"), callback="parse_item", follow=True),)

    def parse_item(self, response):
        yield {
            'auction_date': response.xpath("//b[@id ='product_auction_date']/text()").get(),
            'price_start': response.xpath("//b[@id ='product_price_start']/text()").get(),
            'order': response.xpath("//b[@id ='product_order']/b/text()").get(),
            'product_title': response.xpath("//div[@class ='col-md-12']/b/text()").get(),
            'product_regis_id': response.xpath("//div[@class ='col-sm-12 col-md-12 col-xl-12']/b/text()").get(),
            'total_drive': response.xpath("//b[@id='product_total_drive']/text()").get(),
            'product_gear': response.xpath("//b[@id='product_gear']/text()").get(),
            'product_color': response.xpath("//b[@id='product_color']/text()").get(),
            'cc': response.xpath("//b[@id='product_engin_cc']/text()").get(),
            'regis_year': response.xpath("//b[@id='product_regis_year']/text()").get(),
            'build_year': response.xpath("//b[@id='product_build_year']/text()").get(),
            'gas_type': response.xpath("//b[@id='product_gas_type']/text()").get(),
            'vin_no': response.xpath("//b[@id='product_body_number']/text()").get(),
            'engine_no': response.xpath("//b[@id='product_engin_number']/text()").get(),
            'endtax': response.xpath("//b[@id='product_endtax']/text()").get(),
            'stock': response.xpath("//b[@id='product_oderstock']/text()").get(),
            'price': response.xpath("//b[@id='product_price_other']/text()").get(),
            'gadget': response.xpath("//b[@id='product_gadget']/text()").get(),
            'remark': response.xpath("//b[@id='product_remark']/text()").get(),
        }
The error says that a 'str' object has no attribute 'iter'.
What causes this, and how can I fix it?
1 Answer
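This happens because your XPath ends in /text(), so it selects text nodes (plain strings) instead of elements. LinkExtractor needs elements that it can search for links, and a string has no iter method: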
restrict_xpaths="//div[@class='p-2 card']/text()"
Replace it with an XPath that selects the tags that actually contain the links, like this:
rules = (Rule(LinkExtractor(restrict_xpaths="//div[@class='p-2 card']//a"), callback="parse_item", follow=True),)
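To see why an element works where a text node fails, here is a minimal sketch that uses only the standard library. It uses xml.etree.ElementTree as a stand-in for the lxml objects Scrapy handles internally, and the markup and the /product/1 link are made up for illustration, so treat it as an analogy rather than Scrapy's actual code path.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for one product card on the listing page.
card = ET.fromstring(
    '<div class="p-2 card">Toyota <a href="/product/1">details</a></div>'
)

# An element can be walked for descendant <a> tags.
hrefs = [a.get("href") for a in card.iter("a")]

# A text node is a plain str, so there is nothing to walk.
text_node = card.text
try:
    text_node.iter("a")
except AttributeError as exc:
    error_message = str(exc)

print(hrefs)
print(error_message)
```

The second print shows the same message as the traceback in the question, which is what you get whenever something that expects an element is handed the result of a /text() expression.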
For some reason I did not get any product data, so this is the output I get:
{'auction_date': None, 'price_start': None, 'order': None, 'product_title': '-', 'product_regis_id': '-', 'total_drive': None, 'product_gear': None, 'product_color': None, 'cc': None, 'regis_year': None, 'build_year': None, 'gas_type': None, 'vin_no': None, 'engine_no': None, 'endtax': None, 'stock': None, 'price': None, 'gadget': None, 'remark': None}
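In the output above, every field is either None or '-'. If you want to spot such items before they are exported, a small check along these lines may help. This is only a sketch: is_empty_item and PLACEHOLDERS are hypothetical names, and treating '-' as "no value" is an assumption based on the output shown here.

```python
# Values treated as "no data". Counting '-' as a placeholder is an assumption.
PLACEHOLDERS = {None, "", "-"}


def is_empty_item(item):
    """Return True when every field is missing or holds a placeholder."""
    return all(
        (value.strip() if isinstance(value, str) else value) in PLACEHOLDERS
        for value in item.values()
    )


# Shortened samples modeled on the fields scraped in parse_item.
empty = {"auction_date": None, "product_title": "-", "price": None}
filled = {"auction_date": "2024-03-15", "product_title": "-", "price": None}

print(is_empty_item(empty))
print(is_empty_item(filled))
```

Inside parse_item you could build the dict first and log a warning or skip the yield when the check returns True, which makes it easier to tell a broken selector from a page that really has no product.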