Error: Could not bind: 24: Too many open files

    import csv
    from urllib.parse import urlparse

    from scrapy.linkextractors import LinkExtractor
    from scrapy.spiders import CrawlSpider, Rule

    # allowedDomains, startUrls, fieldnames and getHotelUrlsForDomain
    # are defined elsewhere in the asker's code.

    class JakeSpider(CrawlSpider):
        name = 'jake'
        allowed_domains = allowedDomains
        start_urls = startUrls
        rules = (
            Rule(LinkExtractor(), callback='parse_page', follow=True),
        )

        def parse_page(self, response):
            page = response.url
            domain = urlparse(page).netloc
            domain = domain.replace('www.', '')
            #print(domain, 'is domain and page is', page)
            linksToGet = getHotelUrlsForDomain(domain)
            #if(len(linksToGet) == 0):
            #    print('\n ... links to get was zero \n')
            #print('linksToGet = ', linksToGet)
            links = response.xpath('//a/@href').extract()
            for link in links:
                if link in linksToGet:
                    print('\n\n\n found one! ', link, 'is on', domain, ' and the page is', page, '\n\n\n')
                    # The CSV file is reopened in append mode for every match.
                    with open('hotelBacklinks.csv', 'a') as csvfile:
                        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                        writer.writerow({'hotelURL': link, 'targetDomain': domain})

1 answer

User

Answer #1 · Posted 2024-04-25 14:06:02

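You should use an item pipeline to save all of the scraped data.
This error occurs because parse_page is called many times, and each call tries to open and write to the same file. Writing to a file is a blocking operation. Here is the Scrapy documentation: https://doc.scrapy.org/en/latest/topics/item-pipeline.html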
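
A minimal sketch of what such a pipeline might look like, assuming the spider yields plain dicts with the keys 'hotelURL' and 'targetDomain' (the two columns from the question). The class name HotelBacklinksCsvPipeline and the path argument are made up for illustration; the method names open_spider, close_spider and process_item follow the item pipeline interface described in the linked documentation. The file is opened once when the spider starts and closed once when it finishes, rather than once per match.

```python
import csv


class HotelBacklinksCsvPipeline:
    """Append every scraped item to one CSV file that is opened only once.

    Hypothetical name; 'path' is an illustrative parameter, not a Scrapy one.
    """

    fieldnames = ['hotelURL', 'targetDomain']

    def __init__(self, path='hotelBacklinks.csv'):
        self.path = path
        self.csvfile = None
        self.writer = None

    def open_spider(self, spider):
        # Called once when the spider starts: open the file a single time.
        self.csvfile = open(self.path, 'a', newline='')
        self.writer = csv.DictWriter(self.csvfile, fieldnames=self.fieldnames)

    def close_spider(self, spider):
        # Called once when the spider finishes: release the file handle.
        if self.csvfile is not None:
            self.csvfile.close()
            self.csvfile = None

    def process_item(self, item, spider):
        # Called for every item the spider yields.
        self.writer.writerow({
            'hotelURL': item['hotelURL'],
            'targetDomain': item['targetDomain'],
        })
        return item
```

With something like this in place, parse_page would yield {'hotelURL': link, 'targetDomain': domain} instead of opening the file itself, and the pipeline would be enabled through the ITEM_PIPELINES setting, for example {'myproject.pipelines.HotelBacklinksCsvPipeline': 300}, where myproject stands in for whatever the actual project module is called.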
