Python 从谷歌搜索结果页面获取所有链接
我想写一个脚本,能够返回在一个页面上找到的所有网址,比如谷歌的搜索结果。所以我写了这个脚本:(使用BeautifulSoup库)
import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.read())
links = soup.findAll("a")
for link in links:
print link["href"]
但是它返回了一个403禁止访问的错误:
Traceback (most recent call last):
File "C:\Python27\sql\sql.py", line 3, in <module>
page = urllib2.urlopen("https://www.google.dz/search?q=see")
File "C:\Python27\lib\urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "C:\Python27\lib\urllib2.py", line 400, in open
response = meth(req, response)
File "C:\Python27\lib\urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Python27\lib\urllib2.py", line 438, in error
return self._call_chain(*args)
File "C:\Python27\lib\urllib2.py", line 372, in _call_chain
result = func(*args)
File "C:\Python27\lib\urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden
有没有什么办法可以避免这个错误,或者有其他方法可以获取搜索结果中的网址?
3 个回答
0
在编程中,有时候我们会遇到一些问题,特别是在使用某些工具或库的时候。这些问题可能会让我们感到困惑,尤其是当我们刚开始学习编程的时候。比如,有人可能会在使用某个特定的功能时,发现它没有按照预期工作,这时候就需要去查找解决办法。
在这个过程中,很多人会选择在网上寻求帮助,比如在StackOverflow这样的论坛上提问。在提问时,清楚地描述问题和提供相关的代码片段是非常重要的,这样其他人才能更好地理解你的问题,并给出有效的建议。
总之,遇到问题时不要气馁,积极寻求帮助,并且尽量把问题描述清楚,这样才能更快找到解决方案。
import urllib.request
from BeautifulSoup import BeautifulSoup
page = urllib.request.urlopen("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.read())
links = soup.findAll("a")
for link in links:
print link["href"]
8
最好的办法是使用谷歌的API(你可以通过命令 pip install google
来安装)。GeeksforGeeks在这里有相关的介绍
from googlesearch import search
# to search
query = "see"
links = []
for j in search(query, tld="co.in", num=10, stop=10, pause=2):
links.append(j)
16
使用requests没有问题
import requests
from BeautifulSoup import BeautifulSoup
page = requests.get("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.content)
links = soup.findAll("a")
有些链接的格式像这样:search%:http://
,其中一个链接的结尾和另一个链接连接在一起,所以我们需要用正则表达式来把它们分开
import requests
from bs4 import BeautifulSoup
page = requests.get("https://www.google.dz/search?q=see")
soup = BeautifulSoup(page.content)
import re
links = soup.findAll("a")
for link in soup.find_all("a",href=re.compile("(?<=/url\?q=)(htt.*://.*)")):
print re.split(":(?=http)",link["href"].replace("/url?q=",""))
['https://www.see.asso.fr/&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBIQFjAA&usg=AFQjCNF2_I8jB98JwR3jcKniLZekSrRO7Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:f7M8NX1XmDsJ', 'https://www.see.asso.fr/%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBUQIDAA&usg=AFQjCNF8WJButjMNXQXvXBbtyXnF1SgiOg']
['https://www.see.asso.fr/3ei&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBgQ0gIoADAA&usg=AFQjCNGnPL1RiX5TekI_yMUc-w_f2oVXtw']
['https://www.see.asso.fr/node/9587&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBkQ0gIoATAA&usg=AFQjCNHX-6AzBgLQUF0s8TxFcZjIhxz_Hw']
['https://www.see.asso.fr/ree&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBoQ0gIoAjAA&usg=AFQjCNGkkd8e1JjiNrhSM4HQYE-M6g6j-w']
['https://www.see.asso.fr/node/130&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CBsQ0gIoAzAA&usg=AFQjCNEkVdpcbXDz5-cV9u2NNYoV6aM8VA']
['http://www.wordreference.com/enfr/see&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CB0QFjAB&usg=AFQjCNHQGwcsGpro26dhxFP6q-fQvwbB0Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:ooK-I_HuCkwJ', 'http://www.wordreference.com/enfr/see%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCAQIDAB&usg=AFQjCNFRlV5Zv_n48Wivr4LeOkTQsA0D1Q']
['http://fr.wikipedia.org/wiki/S%25C3%25A9e&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCMQFjAC&usg=AFQjCNGmtqmcXPqYZ_nwa0RWL0uYf5PMJw']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:GjcgkyzsUigJ', 'http://fr.wikipedia.org/wiki/S%2525C3%2525A9e%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCYQIDAC&usg=AFQjCNHesOIBU3OXBspARcONbK_k_8-gnw']
['http://fr.wikipedia.org/wiki/Camille_S%25C3%25A9e&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCkQFjAD&usg=AFQjCNGO-WIDl4TrBeo88WY9QsopWmsMyQ']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:izhQjC85nOoJ', 'http://fr.wikipedia.org/wiki/Camille_S%2525C3%2525A9e%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CCwQIDAD&usg=AFQjCNEfcIKsKbf026xgWT7NkrAueZvL0A']
['http://de.wikipedia.org/wiki/Zugersee&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDEQ9QEwBA&usg=AFQjCNHpfJW5-XdsgpFUSP-jEmHjXQUWHQ']
['http://commons.wikimedia.org/wiki/File:Champex_See.jpg&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDMQ9QEwBQ&usg=AFQjCNEordFWr2QIaob45WlR5Yi-ZvZSiA']
['http://www.all-free-photos.com/show/showphotop.php%3Fidtop%3D4%26lang%3Dfr&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDUQ9QEwBg&usg=AFQjCNEC24FOIE5cvF4zmEDgq5-5xubM3w']
['http://www.allbestwallpapers.com/travel-zell_am_see,_kaprun,_austria_wallpapers.html&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDcQ9QEwBw&usg=AFQjCNFkzMZDuthZHvnF-JvyksNUqjt1dQ']
['http://www.see-swe.org/&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDkQFjAI&usg=AFQjCNF1zbcLfjanxgCXtHoOQXOdMgh_AQ']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:lzh6JxvKUTIJ', 'http://www.see-swe.org/%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CDwQIDAI&usg=AFQjCNFYN6tzzVaHsAc5aOvYNql3Zy4m3A']
['http://fr.wiktionary.org/wiki/see&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CD8QFjAJ&usg=AFQjCNFWYIGc1gj0prytowzqI-0LDFRvZA']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:G9v8lXWRCyQJ', 'http://fr.wiktionary.org/wiki/see%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEIQIDAJ&usg=AFQjCNENzi4E1n-9qHYsNahY6lQzaW5Xvg']
['http://en.wiktionary.org/wiki/see&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEUQFjAK&usg=AFQjCNECGZjw-rBUALO43WaTh2yB9BUhDg']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:ywc4URuPdIQJ', 'http://en.wiktionary.org/wiki/see%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEgQIDAK&usg=AFQjCNE0pykIqXXRl08E-uTtoj03QEpnbg']
['http://see-concept.com/&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CEsQFjAL&usg=AFQjCNGFWjhiH7dEBhITJt01ob_JENlz1Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:jHTkOVEoRsAJ', 'http://see-concept.com/%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CE4QIDAL&usg=AFQjCNECPgxt9ZSFmZzK_ker9Hw_FoCi_A']
['http://www.theconjugator.com/la/conjugaison/du/verbe/see.html&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFEQFjAM&usg=AFQjCNETCTQ0vPDIdV_2Q57qq11dyN0d8Q']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:xD7_Qo7roS8J', 'http://www.theconjugator.com/la/conjugaison/du/verbe/see.html%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFQQIDAM&usg=AFQjCNF_hBCyDZncivYGnL7je5kYme9hEg']
['http://www.zellamsee-kaprun.com/fr&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFcQFjAN&usg=AFQjCNFVDeBWrZMDSjK9jKYF4AQlIXa9lA']
['http://webcache.googleusercontent.com/search%3Fq%3Dcache:BFBEUp05w7YJ', 'http://www.zellamsee-kaprun.com/fr%252Bsee%26hl%3Dfr%26%26ct%3Dclnk&sa=U&ei=ryv6U6PvEKzA7AaB4ICwCA&ved=0CFoQIDAN&usg=AFQjCNHtrOeEpYWqvT3f0M1p-gxUkYT1IA']