Scraping paginated content with Scrapy and Selenium

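from scrapy.http import TextResponse
from scrapy.linkextractors import LinkExtractor
from scrapy.loader import ItemLoader
from scrapy.spiders import CrawlSpider, Rule
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# HotelItem comes from the project's own items module (its import is not shown in the question).


class HotelsSpider(CrawlSpider):
    name = 'hotels'
    allowed_domains = ['lastsecond.ir']
    start_urls = ['http://lastsecond.ir/hotels']

    # Follow the pagination links and hand each page to parse_item.
    rules = (
        Rule(LinkExtractor(allow=r'/hotels\?page=[0-9]/'), callback='parse_item', follow=True),
    )

    def __init__(self, *args, **kwargs):
        super(HotelsSpider, self).__init__(*args, **kwargs)
        self.driver = webdriver.Chrome(executable_path='chromedriver.exe')

    def parse_item(self, response):
        # Render the page in the browser and wait until the hotel panel exists.
        self.driver.get("http://lastsecond.ir/hotels?page=1")
        WebDriverWait(self.driver, 30).until(
            EC.presence_of_element_located((By.ID, "panel1"))
        )
        # Wrap the rendered HTML so Scrapy selectors can be used on it.
        response = TextResponse(url=response.url, body=self.driver.page_source, encoding='utf-8')
        hotel = ItemLoader(item=HotelItem(), response=response)
        hotel.add_css('hotel_name', '#panel1 h2.semimedium-font-size a::text')
        return hotel.load_item()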

1 answer

User

Answer 1 · posted 2024-05-28 18:48:20

The token is in the JavaScript code of http://lastsecond.ir/hotels, as

var csrftoken = 'P7E5Txa5GGmMdJaEf6Y99RsD24vlzD74zEqKg83f';

so you can get it with standard string functions.
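As a minimal sketch of that idea, the function below pulls the token out of the page source with `str.find` and slicing. The function name `extract_csrf_token` is hypothetical, and it assumes the assignment appears in the HTML exactly in the form shown above (single quotes, single spaces around `=`).

```python
def extract_csrf_token(html):
    """Return the value assigned to `csrftoken` in the page source, or None."""
    marker = "var csrftoken = '"
    start = html.find(marker)
    if start == -1:
        return None  # the assignment is not present in this page
    start += len(marker)
    end = html.find("'", start)  # closing quote of the token
    if end == -1:
        return None
    return html[start:end]


# Example using the line quoted above, wrapped in a script tag.
page_source = "<script>var csrftoken = 'P7E5Txa5GGmMdJaEf6Y99RsD24vlzD74zEqKg83f';</script>"
print(extract_csrf_token(page_source))
```

If the page formats the assignment differently (double quotes, extra whitespace), a regular expression would be the more forgiving choice.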

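Once you have the token, you can use FormRequest() to send the POST requests yourself, so you don't need Selenium.

Use dont_filter=True in FormRequest(), because the spider will request the same URL many times and Scrapy normally skips duplicate URLs.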


Part of the results is printed to the screen; the complete data is saved to an output file.

page: 1

keys: dict_keys(['hotels', 'pagination', 'grades', 'locations', 'scores'])
keys[hotels]: dict_keys(['id', 'title_fa', 'title_en', 'link', 'logo_link', 'decorated_grade', 'location', 'rank', 'is_recommended_percent', 'decorated_score', 'reviews_count'])

title_en: Heliya Kish hotel
title_en: Amara Prestige Elite
title_en: All Seasons Hotel
title_en: Hotel Grand Unal
title_en: Marmaray hotel
title_en: Nova Plaza Taksim Square
title_en: Flora Grand Hotel
title_en: Boulevard Autograph Collection hotel
title_en: Alfa Istanbul hotel
title_en: Ramada Hotel & Suites Istanbul Merter
title_en: Sabena hotel
title_en: Taksim Gonen
title_en: Fame Residence Lara & SPA
title_en: Palazzo Donizetti Hotel
title_en: Twin Towers hotel
title_en: Grand Hotel de Pera hotel
title_en: Grand Hotel Halic
title_en: Grand Pamir hotel
title_en: St George hotel
title_en: The Royal Paradise hotel

page: 2

keys: dict_keys(['hotels', 'pagination', 'grades', 'locations', 'scores'])
keys[hotels]: dict_keys(['id', 'title_fa', 'title_en', 'link', 'logo_link', 'decorated_grade', 'location', 'rank', 'is_recommended_percent', 'decorated_score', 'reviews_count'])

title_en: Radisson Royal moscow hotel
title_en: Avenue hotel
title_en: jamshid esfahan hotel
title_en: Aquatek hotel
title_en: Adalya Elite Lara
title_en: Federal Kuala Lumpur hotel
title_en: Feronya Hotel
title_en: Dolabauri Tbilisi hotel
title_en: Limak Limra hotel
title_en: Urban Boutique Hotel
title_en: Doubletree Hilton Piyalepasa hotel
title_en: Ferman Hilal hotel
title_en: Grand Oztanik Hotel
title_en: Lara Family Club hotel
title_en: Swissotel The Bosphorus
title_en: Berjaya Times Square hotel
title_en: Gardenia hotel
title_en: Rixos Sungate
title_en: Jumeirah Emirates Towers hotel
title_en: Kervansaray Lara Hotel
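Output in this shape could be produced along the following lines. This is only a sketch: the helper name `summarize_page` is hypothetical, it assumes `hotels` is a list of dictionaries, and the sample payload is a hand-made miniature that reuses two titles from the output above rather than a real server response.

```python
import json


def summarize_page(page, payload_text):
    """Return printable lines describing one page of the JSON response."""
    data = json.loads(payload_text)
    hotels = data.get("hotels", [])
    lines = [f"page: {page}", f"keys: {list(data)}"]
    if hotels:
        # Show which fields each hotel entry carries.
        lines.append(f"keys[hotels]: {list(hotels[0])}")
    lines.extend(f"title_en: {hotel['title_en']}" for hotel in hotels)
    return lines


# Hand-made miniature payload (assumption: real entries carry many more fields).
sample = json.dumps({
    "hotels": [
        {"title_en": "Heliya Kish hotel"},
        {"title_en": "Amara Prestige Elite"},
    ],
    "pagination": {},
})

for line in summarize_page(1, sample):
    print(line)
```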
