Below is a Scrapy spider that populates DynamoDB with scraped URLs. It fails with this error:
AttributeError: 'dict' object has no attribute 'urljoin'
but I can't see why.
##############################################
# Script: Prep storage for chemtrail #
# Author: James #
# Purpose: #
# Version: #
# #
##############################################
import boto3
import json
import scrapy

class ChemPrepSpider(scrapy.Spider):
    name = "xxxxxx"

    def start_requests(self):
        urls = [
            'https://www.xxxxxxx.com.au'
        ]
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        dynamodb = boto3.resource('dynamodb', region_name='ap-southeast-2')
        table = dynamodb.Table('chemTrailStorage')
        category_links = response.css('li').xpath('a/@href').getall()
        category_links_filtered = [x for x in category_links if 'shop-online' in x]  # remove non-category links
        category_links_filtered = list(dict.fromkeys(category_links_filtered))  # remove duplicates
        for category_link in category_links_filtered:
            print('raw category -> ' + category_link)
            next_category = response.urljoin(category_link) + '?size=99999'
            print('DynamoDb insert for category: ' + next_category)
            response = table.put_item(
                Item={
                    'CategoryPath': next_category,
                    'ItemCount': "99999",
                    'JobStat': "NOT_STARTED",
                    'PickupDateTime': "NA",
                    'CompletionDateTime': "NA"
                }
            )
            print('Response from put....')
            print(response)
boto3 returns a dict from the table.put_item call (see the AWS boto3 documentation). Inside the loop you assign that dict to the name response, overwriting Scrapy's response object, so on the next iteration response.urljoin no longer exists and the AttributeError is raised.

Rename the assignment, e.g. replace "response = table.put_item(...)" with "dynamo_response = table.put_item(...)", or any other name of your choosing, so the Scrapy response object is not shadowed.
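The failure mode can be reproduced without Scrapy or boto3 at all; it is ordinary name shadowing. In this sketch, FakeResponse stands in for Scrapy's Response (it only has a urljoin method) and store() stands in for table.put_item, returning a plain dict. Both names, and buggy/fixed, are illustrative, not part of either library:

```python
from urllib.parse import urljoin

class FakeResponse:
    """Stand-in for scrapy's Response: exposes a urljoin method."""
    def __init__(self, base):
        self.base = base

    def urljoin(self, path):
        return urljoin(self.base, path)

def store(item):
    """Stand-in for boto3's table.put_item: returns a plain dict."""
    return {"ResponseMetadata": {"HTTPStatusCode": 200}}

def buggy(response, links):
    for link in links:
        url = response.urljoin(link)
        # BUG: this rebinds the name 'response' to a dict, so the
        # second iteration's response.urljoin raises AttributeError.
        response = store({"CategoryPath": url})

def fixed(response, links):
    urls = []
    for link in links:
        url = response.urljoin(link)
        # Distinct name: the Scrapy-style response is never shadowed.
        dynamo_response = store({"CategoryPath": url})
        urls.append(url)
    return urls

resp = FakeResponse("https://example.com/")
links = ["/shop-online/a", "/shop-online/b"]

try:
    buggy(resp, links)
except AttributeError as e:
    print("buggy:", e)  # 'dict' object has no attribute 'urljoin'

print("fixed:", fixed(resp, links))
```

The same one-line rename in your parse method is the whole fix; the put_item call itself is fine.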