<p>First, you need to fetch the content of the page whose links you want to search. I strongly recommend <a href="https://2.python-requests.org//en/master/" rel="nofollow noreferrer">requests</a>, a simple HTTP library for Python:</p>
<pre class="lang-py prettyprint-override"><code>import requests
response = requests.get('https://www.stubhub.com/new-york-rangers-tickets/performer/2764/')
</code></pre>
<p>For some reason, this particular URL requires a User-Agent header, so you should send one along with the request:</p>
<pre class="lang-py prettyprint-override"><code>url = 'https://www.stubhub.com/new-york-rangers-tickets/performer/2764/'
user_agent = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0'
response = requests.get(url, headers={'User-Agent':user_agent})
</code></pre>
<p>You can then start parsing the page content with <a href="https://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow noreferrer">beautifulsoup4</a>. Its <code>find_all</code> method accepts a compiled regular expression as the <code>text</code> argument, which lets you find all <code>a</code> tags whose inner text contains a particular string:</p>
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(response.content, "html.parser")
rangers_anchor_tags = soup.find_all("a", text=re.compile(r".*\bNew York Rangers at\b.*"))
urls = [anchor["href"] for anchor in rangers_anchor_tags]
</code></pre>
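<p>Note that when you pass a compiled pattern, BeautifulSoup matches it anywhere in the tag's text (it uses <code>search</code>, not <code>match</code>), so the leading and trailing <code>.*</code> are optional. You can verify the pattern in isolation before scraping; the two anchor texts below are hypothetical examples:</p>
<pre class="lang-py prettyprint-override"><code>import re

pattern = re.compile(r".*\bNew York Rangers at\b.*")

# Matches: the phrase appears with intact word boundaries
print(bool(pattern.search("New York Rangers at Boston Bruins")))   # True
# No match: a different team name
print(bool(pattern.search("New York Islanders at Buffalo Sabres")))  # False
</code></pre>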
<p><code>urls</code> is then a list of the URLs from all anchor tags whose inner text contains the string in question.</p>
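<p>One caveat: the scraped <code>href</code> values may be relative paths rather than absolute URLs. If so, you can resolve them against the page URL with <code>urllib.parse.urljoin</code>; the hrefs below are hypothetical examples of what the anchors might contain:</p>
<pre class="lang-py prettyprint-override"><code>from urllib.parse import urljoin

base_url = "https://www.stubhub.com/new-york-rangers-tickets/performer/2764/"

# Hypothetical hrefs as they might appear in the scraped anchors
hrefs = ["/new-york-rangers-tickets/event/104567890/",
         "https://www.stubhub.com/other-event/"]

# urljoin resolves relative paths and leaves absolute URLs untouched
absolute_urls = [urljoin(base_url, href) for href in hrefs]
print(absolute_urls[0])  # https://www.stubhub.com/new-york-rangers-tickets/event/104567890/
</code></pre>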