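A client for communicating with the Google Search Appliance.
ubuntudesign.gsa: detailed Python project description
A client library for the Google Search Appliance (GSA) that makes it easier to retrieve search results in Python.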
Installation
This module is available on PyPI as ubuntudesign.gsa. You can install it with the following command:
pip install ubuntudesign.gsa
GSAClient
This is a basic client for querying the Google Search Appliance.
Querying
You can query the GSA using the search method.
search_client = GSAClient(base_url="http://gsa.example.com/search")
first_ten_results = search_client.search("hello world")
first_thirty_results = search_client.search("hello world", num=30)
results_twenty_to_forty = search_client.search("hello world", start=20, num=20)
This sets the q, start (default: 0), num (default: 10) and lr (default: '') parameters. No other search parameters are supplied, so they all fall back to their defaults.
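To make the parameter mapping concrete, here is a minimal sketch of how those four parameters could be combined into a query URL. The helper name `build_search_url` is hypothetical and is not part of ubuntudesign.gsa; the client may well build its requests differently.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, start=0, num=10, lr=""):
    """Hypothetical helper: compose a GSA query URL from the four parameters."""
    # Only q, start, num and lr are sent; every other GSA search
    # parameter is left out and so keeps its default on the appliance.
    params = {"q": query, "start": start, "num": num, "lr": lr}
    return base_url + "?" + urlencode(params)


url = build_search_url("http://gsa.example.com/search", "hello world", start=20, num=20)
print(url)
```

Note that an empty lr is still sent as `lr=`, which matches the documented default of an empty string.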
The returned results object attempts to map each of the GSA's standard result XML tags to a more readable format:
{
    'estimated_total_results': int,  # "M": GSA's estimate, see below
    'document_filtering': bool,      # "FI": Is filtering enabled?
    'next_url': str,                 # "NU": GSA URL for querying the next set of results, if available
    'previous_url': str,             # "PU": Ditto for previous set of results
    'items': [
        {
            'index': int,            # "R[N]": The number of this result in the index of all results
            'url': str,              # "U": The URL of the resulting page
            'encoded_url': str,      # "UE": The above URL, encoded
            'title': str,            # "T": The page title
            'relevancy': int,        # "RK": How relevant is this result to the query? From 0 to 10
            'appliance_id': str,     # "ENT_SOURCE": The serial number of the GSA
            'summary': str,          # "S": Summary text for this result
            'language': str,         # "LANG": The language of the page
            'details': {},           # "FS": Name:value pairs of any extra info
            'link_supported': bool,  # "L": "link:" special query term is supported
            'cache': {               # "C": Dictionary, or "None" if cache is not available
                'size': str,         # "C[SZ]": Human readable size of cached page
                'cache_id': str,     # "C[CID]": ID of document in GSA's cache
                'encoding': str      # "C[ENC]": The text encoding of the cached page
            }
        },
        ...
    ]
}
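As an illustration of consuming this structure, the sketch below walks the items list and handles the case where cache is None. The function `summarise_results` and the sample data are made up for this example; the sample is hand-written in the documented shape and is not real GSA output.

```python
def summarise_results(results):
    """Return one line per item: index, title, URL, and cache size if present."""
    lines = []
    for item in results["items"]:
        cache = item.get("cache")  # a dict, or None when no cached copy exists
        cache_note = " [cached, {}]".format(cache["size"]) if cache else ""
        lines.append(
            "{}. {} <{}>{}".format(item["index"], item["title"], item["url"], cache_note)
        )
    return lines


# Hand-written sample in the documented shape (only the fields used here).
sample = {
    "estimated_total_results": 2,
    "items": [
        {"index": 1, "title": "Hello", "url": "http://site1.example.com/a", "cache": None},
        {"index": 2, "title": "World", "url": "http://site1.example.com/b",
         "cache": {"size": "12k", "cache_id": "abc", "encoding": "UTF-8"}},
    ],
}

for line in summarise_results(sample):
    print(line)
```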
Filtering by domain or language
You can filter search results by specifying particular domains or a specific language.
english_results = search_client.search("hello world", language="lang_en")
non_english_results = search_client.search("hello world", language="-lang_en")
domain_specific_results = search_client.search(
    "hello world",
    domains=["site1.example.com", "site2.example.com"]
)
NB: If no results are found for the specified language, the GSA will return any results found in all languages.
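Because of this fallback, a caller may want to check whether the returned items really match the requested language. The following is only a sketch: `language_fallback_used` is a hypothetical helper, and it assumes that each item's 'language' value is a bare code such as "en" while the filter looks like "lang_en". Neither format is confirmed by this README.

```python
def language_fallback_used(items, language):
    """Return True if any item's language differs from the requested filter.

    Assumption: `language` looks like "lang_en" and each item's 'language'
    value is the bare code ("en").
    """
    if not language or language.startswith("-"):
        # No filter, or an exclusion filter: there is no single language to match.
        return False
    wanted = language[len("lang_"):] if language.startswith("lang_") else language
    return any(item.get("language") != wanted for item in items)


items = [{"language": "en"}, {"language": "fr"}]
print(language_fallback_used(items, "lang_en"))
```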
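Getting an accurate total
At the time of writing, the Google Search Appliance returns an estimate of the total number of results with each query, but this estimate is often inaccurate, sometimes by more than a factor of 10! This is true even with the rc parameter enabled.
With the total_results method, the client attempts to request results 990-1000. This will usually prompt the GSA to return the last page of results it actually has, which allows us to find the real total number of results.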
total = search_client.total_results("hello world", domains=[], language='')
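The probing idea can be sketched without a real appliance. In the example below, `exact_total` and `fake_search` are both hypothetical: `fake_search` stands in for a GSA that reports a poor estimate but falls back to its last page when asked for an offset past the end, and the sketch assumes that each item's 'index' is 1-based and counts across all results. The library's own total_results implementation may differ.

```python
def exact_total(search, query, probe_start=990, probe_num=10):
    """Probe a very high offset and read the index of the last item returned."""
    results = search(query, start=probe_start, num=probe_num)
    items = results["items"]
    if not items:
        return 0
    # Assumption: 'index' is the 1-based position among all results,
    # so the last item on the last page carries the real total.
    return items[-1]["index"]


def fake_search(query, start=0, num=10, real_total=37):
    """Stand-in for a GSA: estimates 400 results but really has `real_total`."""
    if start >= real_total:
        start = max(real_total - num, 0)  # fall back to the last page
    end = min(start + num, real_total)
    items = [{"index": i + 1} for i in range(start, end)]
    return {"estimated_total_results": 400, "items": items}


print(exact_total(fake_search, "hello world"))
```

With this approach, a query that truly has 1000 or more results would still report a figure near the probed range, so the value is best read as exact only below that ceiling.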
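Django view
To simplify using the GSA client with Django, a Django view is included with this module.
Usage
At a minimum, you need to provide the SEARCH_SERVER_URL setting to tell the view where to find the GSA: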
# settings.py
SEARCH_SERVER_URL = 'http://gsa.example.com/search'  # Required: GSA location
SEARCH_DOMAINS = ['site1.example.com']  # Optional: By default, limit results to this set of domains
SEARCH_LANGUAGE = 'lang_zh-CN'  # Optional: By default, limit results to this language

# urls.py
from django.conf.urls import url  # Available in older Django releases (removed in Django 4.0)
from ubuntudesign.gsa.views import SearchView

urlpatterns += [
    url(r'^search/?$', SearchView.as_view(template_name="search.html"))
]
You can then query this view:
example.com/search?q=my+search+term
example.com/search?q=my+search+term&domain=example.com&domain=something.example.com (overrides SEARCH_DOMAINS)
example.com/search?q=my+search+term&language=-lang_zh-CN (excludes Chinese results, overriding SEARCH_LANGUAGE)
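The repeated domain parameter is the part most easily mishandled, so here is a small sketch of reading such a query string with the standard library. `read_search_params` is a hypothetical helper for illustration, and the fallback to the settings defaults is an assumption about how the view behaves rather than a copy of its code.

```python
from urllib.parse import parse_qs


def read_search_params(query_string, default_domains=(), default_language=""):
    """Hypothetical helper: extract q, domains and language from a query string."""
    params = parse_qs(query_string)  # every value is a list, so repeats are kept
    return {
        "query": params.get("q", [""])[0],
        "domains": params.get("domain", list(default_domains)),
        "language": params.get("language", [default_language])[0],
    }


parsed = read_search_params(
    "q=my+search+term&domain=example.com&domain=something.example.com",
    default_domains=["site1.example.com"],
    default_language="lang_zh-CN",
)
print(parsed)
```

Inside a Django view, `request.GET.getlist("domain")` returns the same list of repeated values.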
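After retrieving the search results, the view passes a context object to the specified template_name (in this case, search.html).
The context object is structured as follows: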
{
    'query': str,          # The value of the `q` parameter passed to the view
    'limit': int,          # The value of the `limit` parameter, or the default of 10
    'offset': int,         # The value of the `offset` parameter, or the default of 0
    'error': None | str,   # None, or a description of the error if one occurred
    'results': {
        'items': [],                    # The list of items as returned from the GSAClient (see above)
        'total': int,                   # The exact total number of results available
        'start': int,                   # The index of the first result in the set
        'end': int,                     # The index of the last result in the set
        'next_offset': int | None,      # The offset for the next page of results, if available
        'previous_offset': int | None,  # The offset for the previous page of results, if available
        'last_page_offset': int,        # The offset for the last page of results
        'last_page': int,               # The final page number (calculated from "limit" and "total")
        'current_page': int,            # The current page number (calculated from "limit" and "end")
        'penultimate_page': int         # The second-to-last page
    }
}
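The paging fields can all be derived from total, limit and offset. The sketch below shows one plausible set of formulas; `pagination` is a hypothetical helper, and the exact arithmetic and edge-case handling used by the view may differ.

```python
import math


def pagination(total, limit=10, offset=0):
    """Hypothetical helper: derive paging fields from total, limit and offset."""
    last_page = max(math.ceil(total / limit), 1)  # at least one page, even when empty
    end = min(offset + limit, total)              # index of the last result in this set
    return {
        "end": end,
        "next_offset": offset + limit if offset + limit < total else None,
        "previous_offset": max(offset - limit, 0) if offset > 0 else None,
        "last_page_offset": (last_page - 1) * limit,
        "last_page": last_page,
        "current_page": max(math.ceil(end / limit), 1),
        "penultimate_page": last_page - 1,
    }


page = pagination(total=37, limit=10, offset=20)
print(page)
```

In a template, next_offset and previous_offset being None is a convenient signal for hiding the "next" and "previous" links.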