Python scrapy script AttributeError: 'dict' object has no attribute 'urljoin'

Posted 2024-04-19 18:15:38


Below is a Scrapy spider that populates DynamoDB with the scraped URLs. I get the following error:

AttributeError: 'dict' object has no attribute 'urljoin'

However, I'm not clear on why.

##############################################
#  Script:  Prep storage for chemtrail       #
#  Author: James                             #
#  Purpose:                                  #
#  Version:                                  #
#                                            #
##############################################
import boto3
import json
import scrapy

class ChemPrepSpider(scrapy.Spider):
    name = "xxxxxx"

    def start_requests(self):
        urls = [
            'https://www.xxxxxxx.com.au'
        ]

        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self,response):
        dynamodb = boto3.resource('dynamodb', region_name='ap-southeast-2')
        table = dynamodb.Table('chemTrailStorage')
        category_links = response.css('li').xpath('a/@href').getall()
        category_links_filtered = [x for x in category_links if 'shop-online' in x] # remove non category links
        category_links_filtered = list(dict.fromkeys(category_links_filtered)) # remove duplicates 

        for category_link in category_links_filtered:
            print('raw category -> ' + category_link)
            next_category = response.urljoin(category_link) + '?size=99999'
            print('DynamoDb insert for category: ' + next_category)
            response = table.put_item(
                Item={
                    'CategoryPath': next_category,
                    'ItemCount':"99999",
                    'JobStat':"NOT_STARTED",
                    'PickupDateTime':"NA",
                    'CompletionDateTime':"NA"
                }
            )
            print('Response from put....')
            print(response)

1 answer
Anonymous user
Answer #1 · Posted 2024-04-19 18:15:38

It seems that boto3 returns a dict from the table.put_item call. See the AWS boto3 documentation.

This means you are overwriting the Scrapy "response" object with a dict, which has no urljoin attribute.
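A minimal reproduction of that mechanism, runnable without Scrapy or AWS. StubResponse and StubTable are hypothetical stand-ins (not Scrapy or boto3 classes) that model only the two calls involved, and the URL is a placeholder. Because the name is rebound at the end of the loop body, the first category is stored successfully and the AttributeError only appears on the second iteration.

```python
from urllib.parse import urljoin


class StubResponse:
    """Hypothetical stand-in for a Scrapy Response; only urljoin() is modelled."""

    def __init__(self, url):
        self.url = url

    def urljoin(self, link):
        # Resolve a (possibly relative) link against the page URL.
        return urljoin(self.url, link)


class StubTable:
    """Hypothetical stand-in for a boto3 DynamoDB Table."""

    def __init__(self):
        self.items = []  # every Item passed to put_item, in order

    def put_item(self, Item):
        self.items.append(Item)
        # boto3's put_item also returns a plain dict; these keys are illustrative only.
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}


def buggy_loop(response, table, links):
    """Mirrors the question's loop, including the rebinding of `response`."""
    for link in links:
        next_category = response.urljoin(link) + "?size=99999"
        # Bug: after this line `response` is a dict, not the page response.
        response = table.put_item(Item={"CategoryPath": next_category})


table = StubTable()
error = None
try:
    buggy_loop(StubResponse("https://www.example.com.au/"), table,
               ["/shop-online/a", "/shop-online/b"])
except AttributeError as exc:
    error = str(exc)

print(error)             # 'dict' object has no attribute 'urljoin'
print(len(table.items))  # 1 (only the first link was stored)
```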

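You should change "response = table.put_item(" to "dynamo_response = table.put_item(", or to some other name of your choice, and update the print(response) line after it to print the new name.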
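A sketch of the loop with that rename applied, again using hypothetical stubs so it runs standalone (in the real spider, response is the Scrapy response passed to parse and table is the boto3 Table). The put_item result is bound to dynamo_response, so response keeps pointing at the Scrapy response on every iteration.

```python
from urllib.parse import urljoin


class StubResponse:
    """Hypothetical stand-in for a Scrapy Response; only urljoin() is modelled."""

    def __init__(self, url):
        self.url = url

    def urljoin(self, link):
        return urljoin(self.url, link)


class StubTable:
    """Hypothetical stand-in for a boto3 DynamoDB Table."""

    def __init__(self):
        self.items = []  # every Item passed to put_item, in order

    def put_item(self, Item):
        self.items.append(Item)
        return {"ResponseMetadata": {"HTTPStatusCode": 200}}  # illustrative keys


def insert_categories(response, table, links):
    """The question's loop with the put_item result given its own name."""
    inserted = []
    for link in links:
        next_category = response.urljoin(link) + "?size=99999"
        # `response` is never reassigned, so it stays usable on every iteration.
        dynamo_response = table.put_item(
            Item={
                "CategoryPath": next_category,
                "ItemCount": "99999",
                "JobStat": "NOT_STARTED",
                "PickupDateTime": "NA",
                "CompletionDateTime": "NA",
            }
        )
        print("Response from put....")
        print(dynamo_response)
        inserted.append(next_category)
    return inserted


table = StubTable()
paths = insert_categories(StubResponse("https://www.example.com.au/"), table,
                          ["/shop-online/a", "/shop-online/b"])
print(paths)
```

In the spider itself, only the put_item assignment and the print below it need to change; the rest of parse can stay as written.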
