Bing search with mechanize returns a blank page

0 votes
2 answers
1395 views
Asked 2025-04-16 07:19

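I am using mechanize to run a Bing search and then processing the results with Beautiful Soup. The same approach worked for Google and Yahoo searches, but a Bing search gives me a blank page.

I can't work out why this happens, and I would be grateful if someone could explain it. Here is the code I am using: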

from BeautifulSoup import BeautifulSoup
import mechanize
br = mechanize.Browser()
br.set_handle_robots(False)
br.open("http://www.bing.com/search?count=100&q=cheese")
content = br.response()
content = content.read()
soup = BeautifulSoup(content, convertEntities=BeautifulSoup.ALL_ENTITIES)
print soup

This prints only a single blank line.

2 Answers

0

Another way to do this is with requests and BeautifulSoup.

Here is a code example, which can also be viewed in an online IDE:

from bs4 import BeautifulSoup
import requests, lxml, json  # lxml is imported only so a missing parser fails early

# Send a browser-like User-agent with the request.
headers = {
    'User-agent':
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}


def get_organic_results():
  html = requests.get('https://www.bing.com/search?q=nfs', headers=headers)
  soup = BeautifulSoup(html.text, 'lxml')

  bing_data = []

  # Each organic result is an <li class="b_algo"> element.
  for result in soup.find_all('li', class_='b_algo'):
    title = result.h2.text
    try:
      link = result.h2.a['href']
    except (AttributeError, TypeError, KeyError):
      link = None
    attribution = result.find('div', class_='b_attribution')
    displayed_link = attribution.text if attribution else None
    try:
      snippet = result.find('div', class_='b_caption').p.text
    except AttributeError:
      snippet = None

    # Collect only the inline links that belong to this result.
    inline = []
    for row in result.find_all('div', class_='b_factrow'):
      anchor = row.a
      inline.append({
        'title': anchor.text if anchor else None,
        'link': anchor.get('href') if anchor else None,
      })

    # Append once per result, outside the inner loop.
    bing_data.append({
      'title': title,
      'link': link,
      'displayed_link': displayed_link,
      'snippet': snippet,
      'inline': inline
    })

  print(json.dumps(bing_data, indent=2))


get_organic_results()

# part of the created json output:
'''
[
  {
    "title": "Need for Speed Video Games - Official EA Site",
    "link": "https://www.ea.com/games/need-for-speed",
    "displayed_link": "https://www.ea.com/games/need-for-speed",
    "snippet": "Need for Speed Forums Buy Now All Games Forums Buy Now Learn More Buy Now Hit the gas and tear up the roads in this legendary action-driving series. Push your supercar to its limits and leave the competition in your rearview or shake off a full-scale police pursuit \u2013 it\u2019s all just a key-turn away.",
    "inline": [
      {
        "title": null,
        "link": null
      }
    ]
  }
]
'''

Alternatively, you can use the Bing Organic Results API from SerpApi to do the same thing. It is a paid API, but it offers a free trial of 5,000 searches.

Code to integrate:

from serpapi import GoogleSearch
import os

def get_organic_results():
  params = {
    "api_key": os.getenv('API_KEY'),
    "engine": "bing",
    "q": "nfs most wanted"
  }

  search = GoogleSearch(params)
  results = search.get_dict()

  for result in results['organic_results']:
    title = result['title']
    link = result['link']
    displayed_link = result['displayed_link']
    # snippet and sitelinks are optional fields, so fall back to None.
    snippet = result.get('snippet')
    inline = result.get('sitelinks', {}).get('inline')
    print(f'{title}\n{link}\n{displayed_link}\n{snippet}\n{inline}\n')


get_organic_results()

# part of the output:
'''
Need for Speed: Most Wanted - Car Racing Game - Official ...
https://www.ea.com/games/need-for-speed/need-for-speed-most-wanted
https://www.ea.com/games/need-for-speed/need-for-speed-most-wanted
Jun 01, 2017 · To be Most Wanted, you’ll need to outrun the cops, outdrive your friends, and outsmart your rivals. With a relentless police force gunning to take you down, you’ll need to make split-second decisions. Use the open world to …
[{'title': 'Need for Speed No Limits', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-no-limits'}, {'title': 'Buy Now', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-heat/buy'}, {'title': 'Need for Speed Undercover', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-undercover'}, {'title': 'Need for Speed The Run', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-the-run'}, {'title': 'News', 'link': 'https://www.ea.com/games/need-for-speed/need-for-speed-payback/news'}]
'''

Disclaimer: I work for SerpApi.

0

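You probably got a response saying that the answer is already in your browser cache. Try changing your query string slightly, for example by lowering the count to 50.

You can also add some debugging code to see the headers the server returns: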

br.open("http://www.bing.com/search?count=50&q=cheese")
response = br.response()
headers = response.info()
print headers
content = response.read()
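
For readers on Python 3, where `print headers` is a syntax error, roughly the same check can be sketched with the standard library instead of mechanize. The helper names `format_response_summary` and `fetch_and_summarize` are hypothetical, and the default User-Agent value is only a placeholder assumption; nothing here is confirmed to change how Bing responds.

```python
import urllib.request


def format_response_summary(status, headers, body):
    """Return a readable summary of a response: status, headers, body size."""
    lines = [f"status: {status}"]
    # headers may be a plain dict or the HTTPMessage returned by urlopen();
    # both expose .items().
    for name, value in headers.items():
        lines.append(f"{name}: {value}")
    lines.append(f"body bytes: {len(body)}")
    return "\n".join(lines)


def fetch_and_summarize(url, user_agent="Mozilla/5.0"):
    """Fetch url (needs network access) and summarize what came back.

    The user_agent default is an assumed placeholder, not a value Bing
    is known to require.
    """
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=10) as response:
        body = response.read()
        return format_response_summary(response.status, response.headers, body)
```

Calling `print(fetch_and_summarize("http://www.bing.com/search?count=50&q=cheese"))` would show the status, any cache-related headers, and whether the body really is empty.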

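Edit:

I tried this query in Firefox and Opera with count=100, and Bing does not seem to like such a "large" count. When I lowered the count, it worked. So this is not a problem with mechanize or any other Python library; the query itself is what Bing objects to. It also seems that a browser can query Bing with count=100, but only after first querying with a smaller count. Strange!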
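
One way to check this observation from a script is to request the same query with several count values and see which responses come back blank. This is only a sketch: `build_bing_url`, `is_blank` and `probe_counts` are hypothetical helper names, the https URL and the User-Agent default are assumptions, and Bing's behavior may well differ from what is described above.

```python
import urllib.parse
import urllib.request


def build_bing_url(query, count):
    """Build a Bing search URL for the given query and result count."""
    params = urllib.parse.urlencode({"q": query, "count": count})
    return f"https://www.bing.com/search?{params}"


def is_blank(body):
    """Return True when the response body is empty or whitespace only."""
    return not body.strip()


def probe_counts(query, counts, user_agent="Mozilla/5.0"):
    """Fetch the query once per count and return {count: body_is_blank}.

    Needs network access. The user_agent default is an assumed placeholder.
    """
    results = {}
    for count in counts:
        request = urllib.request.Request(
            build_bing_url(query, count), headers={"User-Agent": user_agent}
        )
        with urllib.request.urlopen(request, timeout=10) as response:
            results[count] = is_blank(response.read())
    return results
```

For example, `probe_counts("cheese", [10, 50, 100])` requests the smaller counts first, which mirrors the order that reportedly made count=100 work in a browser.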
