DuckDuckGo API returns no results

3 answers

Anonymous user

Answer 1 · edited 2024-04-24 05:55:49

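Although I already got an answer to my question, which I accepted and awarded the bounty to, I found a different solution that I would like to add here for completeness. Many thanks to everyone who helped me reach it. Even though this is not the solution I originally asked for, it may help someone in the future.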

I found it after a long, hard conversation on this site and a few support emails: https://duck.co/topic/strange-problem-when-searching-intel-with-my-script

Here is the solution code (taken from an answer in the thread above):

>>> import duckduckgo
>>> print duckduckgo.query('! Example').redirect.url
http://www.iana.org/domains/example
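The `duckduckgo` wrapper above talks to DuckDuckGo's Instant Answer API. As a rough sketch of the same idea without the wrapper, assuming the public endpoint `https://api.duckduckgo.com/` and its `q`, `format` and `no_redirect` parameters behave as documented, a "!" query can be sent with `no_redirect=1` so the target URL comes back in the JSON `Redirect` field instead of as an HTTP redirect. The helper names are hypothetical, and the sample body is hand-written, not real API output.

```python
import json
from urllib.parse import urlencode

# Assumed public Instant Answer API endpoint.
API_ENDPOINT = "https://api.duckduckgo.com/"


def build_api_url(query):
    """Build an Instant Answer API URL (hypothetical helper).

    no_redirect=1 asks the API to report a "!" query's target in the
    'Redirect' field instead of answering with an HTTP redirect.
    """
    params = {"q": query, "format": "json", "no_redirect": "1"}
    return API_ENDPOINT + "?" + urlencode(params)


def extract_redirect(raw_json):
    """Return the 'Redirect' URL from a JSON response body, or None if it is empty."""
    payload = json.loads(raw_json)
    return payload.get("Redirect") or None


if __name__ == "__main__":
    print(build_api_url("! Example"))
    # Offline check with a hand-written sample body.
    sample = '{"Redirect": "http://www.iana.org/domains/example"}'
    print(extract_redirect(sample))
```

Fetching the URL (for example with `urllib.request.urlopen`) is left out so the parsing step can be tried without network access.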

Anonymous user

Answer 2 · edited 2024-04-24 05:55:49

Try:

import duckduckgo
r = duckduckgo.query('example')  # response object returned by the wrapper
for result in r.results:
    print(result.text)
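The loop relies on `r` being the object returned by `duckduckgo.query(...)`. If you work with the raw JSON instead of the wrapper, the same loop can be written against the `Results` list. This sketch assumes each entry carries `Text` and `FirstURL` keys, as in the Instant Answer API's JSON output. `iter_results` is a hypothetical helper, and the sample payload is made up.

```python
import json


def iter_results(raw_json):
    """Yield (text, url) pairs from the 'Results' list of an API response.

    Assumes each entry has 'Text' and 'FirstURL' keys; missing keys
    fall back to empty strings, and a missing list yields nothing.
    """
    payload = json.loads(raw_json)
    for entry in payload.get("Results", []):
        yield entry.get("Text", ""), entry.get("FirstURL", "")


if __name__ == "__main__":
    # Hand-written sample; deep queries usually come back with "Results": [].
    sample = json.dumps({
        "Results": [
            {"Text": "Official site", "FirstURL": "https://example.org/"}
        ]
    })
    for text, url in iter_results(sample):
        print(text, url)
```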

Anonymous user

Answer 3 · edited 2024-04-24 05:55:49

If you go to the DuckDuckGo API page, you will find some notes about using the API. The first note says clearly:

As this is a Zero-click Info API, most deep queries (non topic names) will be blank.

And here is the list of those fields:

Abstract: ""
AbstractText: ""
AbstractSource: ""
AbstractURL: ""
Image: ""
Heading: ""
Answer: ""
Redirect: ""
AnswerType: ""
Definition: ""
DefinitionSource: ""
DefinitionURL: ""
RelatedTopics: [ ]
Results: [ ]
Type: ""
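Because a blank response is still a valid response, it can help to detect it explicitly before assuming the query worked. A minimal sketch, assuming the response has already been decoded into a dict with the fields listed above. `is_blank_response` is a hypothetical helper, not part of any library.

```python
# Field names taken from the list above.
STRING_FIELDS = [
    "Abstract", "AbstractText", "AbstractSource", "AbstractURL", "Image",
    "Heading", "Answer", "Redirect", "AnswerType", "Definition",
    "DefinitionSource", "DefinitionURL", "Type",
]
LIST_FIELDS = ["RelatedTopics", "Results"]


def is_blank_response(payload):
    """Return True if every listed field is missing or empty.

    This is the case the API notes describe for most deep
    (non-topic) queries.
    """
    strings_empty = all(not payload.get(name) for name in STRING_FIELDS)
    lists_empty = all(not payload.get(name) for name in LIST_FIELDS)
    return strings_empty and lists_empty


if __name__ == "__main__":
    blank = {name: "" for name in STRING_FIELDS}
    blank.update({name: [] for name in LIST_FIELDS})
    print(is_blank_response(blank))                           # True
    print(is_blank_response(dict(blank, Heading="Example")))  # False
```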

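This is unfortunate, but the API simply omits a lot of results instead of returning them to you, possibly so that it responds faster, and there seems to be no way around it other than querying DuckDuckGo.com itself.

So, clearly, the API is not the way to solve this problem.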

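As far as I can see, there is only one way out: retrieve the raw HTML from duckduckgo.com and parse it with html5lib or a similar library (fortunately, their HTML is well structured).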

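Keep in mind, though, that parsing HTML pages is not the most reliable way to scrape data: the HTML structure can change at any time, whereas an API usually stays stable until changes are publicly announced.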

Here is an example of how this kind of parsing can be done with BeautifulSoup:

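from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 (Python 2)
import urllib
import re

# Fetch the regular (JavaScript) results page
site = urllib.urlopen('http://duckduckgo.com/?q=example')
data = site.read()

parsed = BeautifulSoup(data)
# The zero-click topics sit inside <div id="zero_click_topics">
topics = parsed.findAll('div', {'id': 'zero_click_topics'})[0]
# Collect the <div>s inside it whose class matches the regex 'results_*'
results = topics.findAll('div', {'class': re.compile('results_*')})

print results[0].text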

This script prints:

u'Eixample, an inner suburb of Barcelona with distinctive architecture'

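The problem with querying the main page directly is that it uses JavaScript to generate the results you actually want (as opposed to related topics), so to get those results you have to use the HTML-only version, which lives at a different URL: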

http://duckduckgo.com/?q=example       # JavaScript version
http://duckduckgo.com/html/?q=example  # HTML-only version
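The two URL forms differ only in the `/html/` path segment. A small sketch that builds either one with proper query escaping. `search_url` is a hypothetical helper, and the paths are the ones listed above, which may have changed since this answer was written.

```python
from urllib.parse import urlencode


def search_url(query, html_only=True):
    """Build a DuckDuckGo search URL for the HTML-only or the JavaScript version."""
    base = "http://duckduckgo.com/html/" if html_only else "http://duckduckgo.com/"
    return base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    print(search_url("example"))                   # http://duckduckgo.com/html/?q=example
    print(search_url("example", html_only=False))  # http://duckduckgo.com/?q=example
    print(search_url("intel cpu"))                 # http://duckduckgo.com/html/?q=intel+cpu
```

Using `urlencode` instead of string concatenation keeps multi-word queries and special characters such as "!" correctly escaped.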

Let's see what we get:

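# Same approach, but against the HTML-only version
site = urllib.urlopen('http://duckduckgo.com/html/?q=example')
data = site.read()
parsed = BeautifulSoup(data)

# href of the first <a> inside the first result block (class matching 'links_main*')
first_link = parsed.findAll('div', {'class': re.compile('links_main*')})[0].a['href']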

The value stored in first_link is the link of the first result returned by the search engine (not a related search):

http://www.iana.org/domains/example

To get all the links, iterate over the matching tags (data other than links can be extracted in the same way):

for i in parsed.findAll('div', {'class': re.compile('links_main*')}):
    print i.a['href']

http://www.iana.org/domains/example
https://twitter.com/example
https://www.facebook.com/leadingbyexample
http://www.trythisforexample.com/
http://www.myspace.com/leadingbyexample?_escaped_fragment_=
https://www.youtube.com/watch?v=CLXt3yh2g0s
https://en.wikipedia.org/wiki/Example_(musician)
http://www.merriam-webster.com/dictionary/example
...
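Without BeautifulSoup, or on Python 3, roughly the same extraction can be done with the standard library's `html.parser`. This is only a sketch. It assumes result blocks are `<div>` elements whose `class` attribute contains `links_main` (mirroring the regex above), and, like `i.a['href']`, it keeps only the first link in each block, together with that link's text. The names `LinkCollector` and `extract_links` and the sample markup are made up for illustration, and DuckDuckGo's real markup may differ.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) for the first <a> inside each result <div>.

    A result <div> is one whose class attribute contains 'links_main'.
    A depth counter tracks nested <div>s so a block ends at its matching </div>.
    """

    def __init__(self):
        super().__init__()
        self.results = []      # list of (href, text) tuples
        self._depth = 0        # > 0 while inside a result <div>
        self._in_link = False  # True while inside the block's first <a>
        self._taken = False    # first <a> of the current block already used

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self._depth:
                self._depth += 1
            elif "links_main" in (attrs.get("class") or ""):
                self._depth = 1
                self._taken = False
        elif tag == "a" and self._depth and not self._taken and "href" in attrs:
            self.results.append((attrs["href"], ""))
            self._in_link = True
            self._taken = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            href, text = self.results[-1]
            # Collapse whitespace in the collected link text.
            self.results[-1] = (href, " ".join(text.split()))
            self._in_link = False
        elif tag == "div" and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_link:
            href, text = self.results[-1]
            self.results[-1] = (href, text + data)


def extract_links(html):
    """Return the (href, text) pairs found in an HTML results page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.results


SAMPLE = """
<div class="results_links links_main">
  <a href="http://www.iana.org/domains/example">Example <b>Domains</b></a>
  <div class="snippet"><a href="http://ignored.example/">second link</a></div>
</div>
<div class="links_main"><a href="https://en.wikipedia.org/wiki/Example_(musician)">Example (musician)</a></div>
<div class="sidebar"><a href="http://not-a-result.example/">ad</a></div>
"""

if __name__ == "__main__":
    for href, text in extract_links(SAMPLE):
        print(href, "|", text)
```

In real use, `SAMPLE` would be replaced by the decoded body of the HTML-only results page.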

Note that the HTML-only version contains only the results; for related searches you have to use the JavaScript version (the URL without the html/ part).
