<p>This happens because Scrapy does not execute JavaScript or wait for it to load, so you need to use <a href="https://github.com/scrapy-plugins/scrapy-splash" rel="nofollow noreferrer">scrapy-splash</a>. <a href="https://stackoverflow.com/questions/50884181/scrapy-splash-active-content-selector-works-in-shell-but-not-with-spider">Here is my answer on how to set up</a> your Scrapy project with <code>scrapy-splash</code>.</p>
<p>When I use <code>scrapy-splash</code>, I get the expected result:</p>
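<p>For orientation, a minimal sketch of the <code>settings.py</code> entries that setup usually involves. The values follow the scrapy-splash README as I recall it, and <code>SPLASH_URL</code> assumes a Splash instance listening on <code>localhost:8050</code>; check both against the linked answer and the README for your installed version.</p>

```python
# settings.py (sketch; assumes Splash is running locally on port 8050)
SPLASH_URL = 'http://localhost:8050'

# Middlewares that send requests through Splash and handle its responses
DOWNLOADER_MIDDLEWARES = {
    'scrapy_splash.SplashCookiesMiddleware': 723,
    'scrapy_splash.SplashMiddleware': 725,
    'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}

# Avoids storing duplicate Splash arguments in the request queue
SPIDER_MIDDLEWARES = {
    'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}

# Make request fingerprinting and the HTTP cache aware of Splash arguments
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter'
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'
```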
<pre><code>2018-06-30 20:50:21 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://itunes.apple.com/us/album/no-tears-left-to-cry/1374085537?i=1374087460&v0=WWW-NAUS-ITSTOP100-SONGS&l=en&ign-mpt=uo%3D4%27 via http://localhost:8050/render.html> (referer: None)
2018-06-30 20:50:21 [scrapy.core.scraper] DEBUG: Scraped from <200 https://itunes.apple.com/us/album/no-tears-left-to-cry/1374085537?i=1374087460&v0=WWW-NAUS-ITSTOP100-SONGS&l=en&ign-mpt=uo%3D4%27>
{'title': 'no tears left to cry - Single'}
</code></pre>
<p>Here is my simple spider:</p>
<pre><code>import scrapy
from scrapy_splash import SplashRequest


class TestSpider(scrapy.Spider):
    name = "test"
    start_urls = ['https://itunes.apple.com/us/album/no-tears-left-to-cry/1374085537?i=1374087460&v0=WWW-NAUS-ITSTOP100-SONGS&l=en&ign-mpt=uo%3D4%27']

    def start_requests(self):
        # Send every start URL through Splash so the JavaScript runs before parsing
        for url in self.start_urls:
            yield SplashRequest(url=url,
                                callback=self.parse,
                                endpoint='render.html',
                                )

    def parse(self, response):
        # response holds the HTML rendered by Splash, so the title element exists
        yield {'title': response.xpath('//*[@id="ember653"]/section[1]/div/div[2]/div[1]/div[2]/header/h1//text()').extract_first()}
</code></pre>
<p>You can also check this with <code>scrapy shell</code>:</p>
<pre><code>scrapy shell 'http://localhost:8050/render.html?url=https://itunes.apple.com/us/album/no-tears-left-to-cry/1374085537?i=1374087460&v0=WWW-NAUS-ITSTOP100-SONGS&l=en&ign-mpt=uo%3D4'
In [2]: response.xpath('//*[@id="ember653"]/section[1]/div/div[2]/div[1]/div[2]/header/h1//text()').extract_first()
Out[2]: 'no tears left to cry - Single'
</code></pre>
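<p>The shell command above passes the target URL unencoded, so the <code>&amp;</code>-separated parameters after its <code>?</code> may be read as Splash parameters rather than as part of the page URL. A minimal sketch of building the <code>render.html</code> URL with the target percent-encoded, using only the standard library. The helper name <code>splash_render_url</code> and the <code>wait</code> value are illustrative assumptions, not part of the original answer.</p>

```python
from urllib.parse import urlencode

# Splash endpoint as it appears in the log output above
SPLASH_RENDER = "http://localhost:8050/render.html"


def splash_render_url(target_url, wait=0.5):
    """Return a render.html URL with target_url encoded as a single parameter.

    `wait` (seconds Splash waits after page load) is an assumed value; tune it.
    """
    query = urlencode({"url": target_url, "wait": wait})
    return f"{SPLASH_RENDER}?{query}"


target = ("https://itunes.apple.com/us/album/no-tears-left-to-cry/1374085537"
          "?i=1374087460&v0=WWW-NAUS-ITSTOP100-SONGS&l=en&ign-mpt=uo%3D4")

# Paste the printed URL (in quotes) after `scrapy shell`
print(splash_render_url(target))
```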