Scrapy-Splash: dynamic content selectors work in the shell but not in the spider

Posted 2024-04-29 08:03:40


I have just started using scrapy-splash to scrape opentable.com. In the shell, the following works fine:

$ scrapy shell 'http://localhost:8050/render.html?url=https://www.opentable.com/new-york-restaurant-listings&timeout=10&wait=0.5'    
...

In [1]: response.css('div.booking::text').extract()
Out[1]: 
['Booked 59 times today',
 'Booked 20 times today',
 'Booked 17 times today',
 'Booked 29 times today',
 'Booked 29 times today',
  ... 
]

However, this simple spider returns an empty list:

[The spider code did not survive in this mirror of the post.]
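
Since the original spider is missing here, the following is a minimal sketch of the kind of spider the question describes, assuming it uses scrapy_splash.SplashRequest; the wait/timeout arguments mirror the shell URL above, and everything else is an assumption rather than the poster's actual code:

# hypothetical reconstruction, not the poster's original spider
import scrapy
from scrapy_splash import SplashRequest


class OpentableSpider(scrapy.Spider):
    name = 'opentable'  # matches the `scrapy crawl opentable` call below
    start_urls = ['https://www.opentable.com/new-york-restaurant-listings']

    def start_requests(self):
        for url in self.start_urls:
            # render the page through Splash so JS-generated booking counts exist
            yield SplashRequest(url, self.parse, args={'wait': 0.5, 'timeout': 10})

    def parse(self, response):
        yield {'bookings': response.css('div.booking::text').extract()}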

Invoked with:

$ scrapy crawl opentable
...
DEBUG: Scraped from <200 https://www.opentable.com/new-york-restaurant-listings>
{'bookings': []}

I have already tried the following, without success:

docker run -it -p 8050:8050 scrapinghub/splash --disable-private-mode

I also tried increasing the wait time.


2 Answers

First you need to set up scrapy-splash in your project's settings.py:

# settings.py

# uncomment `DOWNLOADER_MIDDLEWARES` and add these settings to it
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# url of splash server
SPLASH_URL = 'http://localhost:8050'

# and some splash variables
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
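
One setting the answer does not show, but which the scrapy-splash README also recommends, is the argument-deduplication spider middleware. It is not strictly required for the fix above, so treat this as an optional addition taken from the scrapy-splash documentation rather than from the original answer:

# settings.py (optional, from the scrapy-splash README)
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}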

Now run docker:

docker run -p 8050:8050 scrapinghub/splash

After following all of these steps, I get back:

scrapy crawl opentable

...

2018-06-23 11:23:54 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.opentable.com/new-york-restaurant-listings via http://localhost:8050/render.html> (referer: None)
2018-06-23 11:23:54 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.opentable.com/new-york-restaurant-listings>
{'bookings': [
    'Booked 44 times today',
    'Booked 24 times today',
    '... and many other Booked values'
]}

This does not work because the content on this page is rendered with JavaScript.

You can go with either of the following solutions:

1) Use Selenium.

2) Use the page's own API: calling the URL <GET https://www.opentable.com/injector/stats/v1/restaurants/<restaurant_id>/reservations> returns the current number of bookings for that particular restaurant (identified by its restaurant ID); see the sketch below.
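
A minimal sketch of option 2 using the requests library; the restaurant ID below is a made-up placeholder, and the shape of the response body is an assumption based on the answer's description:

# hypothetical example of option 2 -- restaurant_id is a placeholder
import requests

restaurant_id = 12345  # replace with a real OpenTable restaurant ID
url = (
    'https://www.opentable.com/injector/stats/v1/'
    f'restaurants/{restaurant_id}/reservations'
)

response = requests.get(url, timeout=10)
response.raise_for_status()
# the answer says this returns the current booking count for the restaurant;
# the exact payload format is not documented, so just inspect it
print(response.json())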
