<p>Your error occurs because you are using the wrong index on the result of your split. You want <code>-1</code>. Observe:</p>
<pre><code>last_page = soup.find('ul', class_='pagination').find('li', class_='last').a['href']
print(last_page)
print(last_page.split('=')[1])
print(last_page.split('=')[-1])
</code></pre>
<p>This gives:</p>
<pre><code>/search/Contributions?endDate=2019-07-11&searchTerm=%22climate+change%22&startDate=1800-01-01&page=966
</code></pre>
<p>Splitting and taking index <code>1</code> gives:</p>
<pre><code>2019-07-11&searchTerm
</code></pre>
<p>Taking index <code>-1</code> gives:</p>
<pre><code>966
</code></pre>
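<p>If the position of <code>page</code> within the query string might change, parsing the query string is a sturdier option than splitting on <code>=</code>. The following is a minimal sketch using only the standard library; the helper name <code>last_page_number</code> is illustrative and not part of the original code.</p>
<pre><code>from urllib.parse import urlsplit, parse_qs

def last_page_number(href):
    """Return the value of the 'page' query parameter as an int."""
    query = urlsplit(href).query
    # parse_qs maps each parameter name to a list of its values
    return int(parse_qs(query)["page"][0])

href = "/search/Contributions?endDate=2019-07-11&searchTerm=%22climate+change%22&startDate=1800-01-01&page=966"
print(last_page_number(href))
</code></pre>
<p>This returns the same number wherever <code>page</code> appears in the href.</p>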
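<p>To extract the information you want from each page, I would use CSS selectors and <code>zip</code>, as the other answer does. Below are some alternative looping constructs; they use a session for efficiency, given the number of requests.</p>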
<hr/>
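<p>You can make an initial request, extract the number of pages, and then loop over the remaining pages. A <code>Session</code> object makes this more efficient by reusing the connection.</p>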
<pre><code>import requests
from bs4 import BeautifulSoup as bs

def make_soup(s, page):
    # Fetch one page of search results and parse it
    page_url = "https://hansard.parliament.uk/search/Contributions?endDate=2019-07-11&page={}&searchTerm=%22climate+change%22&startDate=1800-01-01&partial=True"
    r = s.get(page_url.format(page))
    soup = bs(r.content, 'lxml')
    return soup

with requests.Session() as s:
    soup = make_soup(s, 1)
    # The href of the "last" link ends with the final page number
    pages = int(soup.select_one('.last a')['href'].split('page=')[1])
    # do something with the first page's soup here
    for page in range(2, pages + 1):
        soup = make_soup(s, page)
        # do something with soup
</code></pre>
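<p>The query string in <code>page_url</code> can also be built from a dictionary instead of a format string, which keeps the parameters readable. This is only a sketch using the standard library; <code>BASE_URL</code> and <code>build_page_url</code> are illustrative names, and the parameter values are copied from the URL above.</p>
<pre><code>from urllib.parse import urlencode

BASE_URL = "https://hansard.parliament.uk/search/Contributions"

def build_page_url(page):
    """Build the search URL for one results page."""
    params = {
        "endDate": "2019-07-11",
        "page": page,
        "searchTerm": '"climate change"',  # urlencode turns this into %22climate+change%22
        "startDate": "1800-01-01",
        "partial": "True",
    }
    return BASE_URL + "?" + urlencode(params)

print(build_page_url(2))
</code></pre>
<p>With <code>requests</code> you could equally pass such a dictionary through the <code>params</code> argument of <code>s.get</code> and let the library do the encoding.</p>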
<hr/>
<p>Alternatively, you can loop until the class <code>last</code> stops appearing:</p>
<pre><code>import requests
from bs4 import BeautifulSoup as bs

present = True
page = 1
#results = {}

def make_soup(s, page):
    # Fetch one page of search results and parse it
    page_url = "https://hansard.parliament.uk/search/Contributions?endDate=2019-07-11&page={}&searchTerm=%22climate+change%22&startDate=1800-01-01&partial=True"
    r = s.get(page_url.format(page))
    soup = bs(r.content, 'lxml')
    return soup

with requests.Session() as s:
    while present:
        soup = make_soup(s, page)
        # Stop after the first page on which the class "last" no longer appears
        present = len(soup.select('.last')) > 0
        #results[page] = soup.select_one('.pagination-total').text
        # extract info
        page += 1
</code></pre>
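<p>The same stopping rule can be wrapped in a generator, which separates paging from extraction and adds an upper bound in case the marker never disappears. This is a sketch under assumptions: <code>iter_pages</code>, <code>fetch</code>, <code>has_next</code> and <code>max_pages</code> are hypothetical names, and the dictionary below is an offline stand-in for the site so the example runs without network access.</p>
<pre><code>def iter_pages(fetch, has_next, start=1, max_pages=2000):
    """Yield (page_number, document) pairs, stopping after the first
    document for which has_next(document) is false."""
    for page in range(start, start + max_pages):
        doc = fetch(page)
        yield page, doc
        if not has_next(doc):
            break

# Offline stand-in: pages 1 and 2 contain the marker, page 3 does not
fake_site = {1: "results, last link", 2: "results, last link", 3: "results"}
visited = [page for page, _ in iter_pages(fake_site.__getitem__, lambda doc: "last link" in doc)]
print(visited)
</code></pre>
<p>Against the real site you would pass something like <code>lambda p: make_soup(s, p)</code> as <code>fetch</code> and <code>lambda soup: bool(soup.select('.last'))</code> as <code>has_next</code>.</p>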