<p>To get URLs from the body of an HTML page, it is best to use a <a href="https://doc.scrapy.org/en/latest/topics/link-extractors.html" rel="nofollow noreferrer"><code>LinkExtractor</code></a>:</p>
<pre><code>from scrapy.linkextractors import LinkExtractor
...
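# `allow` is searched against the absolute URL, so the pattern includes scheme and host.
le = LinkExtractor(allow=r'^https?://[^/]+/(?:[^/]+/){2}[^/]+/$')  # three path segments plus a trailing slash
all_links = le.extract_links(response)  # all links matching the `allow` regex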
...
</code></pre>
<p>You can also keep adding rules to the <code>LinkExtractor</code> so that it matches the links you want more precisely.</p>
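<p>To check a candidate <code>allow</code> pattern before putting it in a spider, a minimal standard-library sketch like the one below may help. It is not Scrapy's implementation: it only approximates the filtering step by resolving each href against the page URL and applying <code>re.search</code> to the absolute result. The page URL, the hrefs, and the <code>filter_links</code> helper are hypothetical and exist only for illustration.</p>

```python
import re
from urllib.parse import urljoin

# Hypothetical page URL and hrefs, for illustration only.
BASE_URL = "https://example.com/x/"
HREFS = ["/a/b/c/", "/a/b/", "/a/b/c/d/", "b/c/", "/a/b/c"]

# Scheme and host, then exactly three path segments and a trailing slash.
ALLOW = re.compile(r"^https?://[^/]+/(?:[^/]+/){2}[^/]+/$")


def filter_links(base_url, hrefs, pattern):
    """Resolve each href against base_url and keep the URLs the pattern matches."""
    absolute = (urljoin(base_url, href) for href in hrefs)
    return [url for url in absolute if pattern.search(url)]


matched = filter_links(BASE_URL, HREFS, ALLOW)
print(matched)
```

<p>With these sample hrefs, only the two URLs whose path has exactly three segments and ends in a slash are kept. A pattern anchored at a leading <code>/</code> would keep none of them, because every resolved URL starts with the scheme.</p>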