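<p>This page uses <code>JavaScript</code> to detect bots/scripts, and it seems to work, because it blocks your code. You may need something more powerful than <code>requests-html</code>.</p>
<p>If you check the repo <a href="https://github.com/psf/requests-html" rel="nofollow noreferrer">requests-html</a>, you will see that it has not been updated for about a year.</p>
<p>I can get the data with Selenium:</p>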
<pre><code>from selenium import webdriver
from selenium.webdriver.common.by import By

url = "https://advanced.name/freeproxy"

#driver = webdriver.Firefox()
driver = webdriver.Chrome()
driver.get(url)

# each proxy row has one cell with a data-ip attribute and one with data-port
all_ips = driver.find_elements(By.XPATH, '//td[@data-ip]')
all_ports = driver.find_elements(By.XPATH, '//td[@data-port]')

for ip, port in zip(all_ips, all_ports):
    print(ip.text, port.text)
</code></pre>
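<p>The loop above only prints the values. If you want to keep them for later use, a minimal sketch like the following could turn the scraped text into <code>ip:port</code> strings and drop rows that do not look valid. The function name <code>pair_proxies</code> is hypothetical (it is not part of Selenium), and it assumes the cells contain plain IPv4 addresses and numeric ports.</p>

```python
import ipaddress


def pair_proxies(ip_texts, port_texts):
    """Combine scraped IP and port strings into 'ip:port' entries.

    Hypothetical helper: rows with an invalid IPv4 address or port are skipped.
    """
    proxies = []
    for ip_text, port_text in zip(ip_texts, port_texts):
        ip_text = ip_text.strip()
        port_text = port_text.strip()
        try:
            ipaddress.IPv4Address(ip_text)  # raises ValueError if not a valid IPv4 address
            port = int(port_text)           # raises ValueError if not a number
        except ValueError:
            continue
        if 0 < port < 65536:
            proxies.append(f"{ip_text}:{port}")
    return proxies
```

<p>With the Selenium code above it could be called as <code>pair_proxies([e.text for e in all_ips], [e.text for e in all_ports])</code>.</p>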
<hr/>
<p><strong>EDIT:</strong></p>
<p>To read the following pages, you can:</p>
<ul>
<li><p>use a <code>for</code>-loop and a <code>url</code> with the page number, but this requires knowing how many pages there are</p>
<pre><code>from selenium import webdriver
from selenium.webdriver.common.by import By

#driver = webdriver.Firefox()
driver = webdriver.Chrome()

url = "https://advanced.name/freeproxy?ddexp4attempt=1&page="

# the number of pages is hard-coded here
for page in range(15):
    print(' - page', page, ' -')
    driver.get(url + str(page))
    all_ips = driver.find_elements(By.XPATH, '//td[@data-ip]')
    all_ports = driver.find_elements(By.XPATH, '//td[@data-port]')
    for ip, port in zip(all_ips, all_ports):
        print(ip.text, port.text)
</code></pre>
</li>
<li><p>use a <code>while</code> loop and click the link to the next page; then you don't have to know how many pages there are</p>
<pre><code>from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

#driver = webdriver.Firefox()
driver = webdriver.Chrome()

url = "https://advanced.name/freeproxy"
driver.get(url)

while True:
    print(' - page -')
    all_ips = driver.find_elements(By.XPATH, '//td[@data-ip]')
    all_ports = driver.find_elements(By.XPATH, '//td[@data-port]')
    for ip, port in zip(all_ips, all_ports):
        print(ip.text, port.text)
    try:
        # go to next page
        link_to_next_page = driver.find_element(By.LINK_TEXT, '»')
        link_to_next_page.click()
    except WebDriverException:
        # exit loop if there are no more pages (link missing or not clickable)
        break
</code></pre>
</li>
</ul>
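<p>The <code>while True</code> variant relies on the site eventually not offering a "next" link. As a rough sketch, the same pattern can be written with a safety cap so it cannot loop forever. The names <code>collect_pages</code>, <code>read_page</code>, <code>go_to_next</code> and <code>max_pages</code> are hypothetical; with Selenium, <code>read_page</code> would wrap the <code>find_elements</code> calls and <code>go_to_next</code> would wrap the click and return <code>False</code> when it fails.</p>

```python
def collect_pages(read_page, go_to_next, max_pages=100):
    """Gather rows page by page until there is no next page or the cap is reached.

    read_page:  callable returning a list of rows for the current page
    go_to_next: callable returning True if it moved to another page, False otherwise
    max_pages:  upper bound on the number of pages read (safety cap)
    """
    rows = []
    for _ in range(max_pages):
        rows.extend(read_page())
        if not go_to_next():
            break
    return rows
```

<p>Passing the two steps in as functions keeps the loop itself independent of the browser, so it can be tried out with fake data before pointing it at the real page.</p>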