How can I build a personalized arXiv article feed with Python?

Posted 2024-04-29 07:17:22

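arXiv has grown into a very useful library. However, reading through the new articles takes a considerable amount of time, and their number is expected to keep increasing. I would therefore like to build a personalized arXiv feed with Python. The site provides an arXiv API User's Manual.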

It has two main search parameters, search_query and id_list, and section 5 lists more prefixes. A query looks like this:

http://export.arxiv.org/api/query?search_query=all:electron&start=0&max_results=10 

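Question 1:

However, the difference between query and search_query confuses me. In section 5.1, Details of Query Construction, id_list is mixed in with the prefixes, and the whole page contains no example of how to use search_query and id_list together. For instance,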

url="http://export.arxiv.org/api/query?search_query=all:electron+And+id_list=cs/9901002v1"
url="http://export.arxiv.org/api/query?search_query=all:electron+And+query?id_list=cs/9901002v1"

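did not return an empty result; both still returned the electron articles.

Why is that? How can id_list and search_query be used together? For example, how do I restrict the results to articles in "computer science" that contain the search term "electron"? And what is the difference between query and search_query?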
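A minimal sketch of how the two parameters might be combined, based on my reading of the arXiv API User's Manual: `query` is the API method (the part before the `?`), while `search_query` and `id_list` are separate parameters of it, joined with `&` rather than with the boolean operator `AND`. When both are given, the manual says the API returns the articles in `id_list` that also match `search_query`. A subject restriction goes inside `search_query` using the `cat:` prefix. The helper name `build_query_url` and the category `cs.LG` are illustrative choices, not something from the manual.

```python
from urllib.parse import urlencode

API_URL = "http://export.arxiv.org/api/query"


def build_query_url(search_query=None, id_list=None, start=0, max_results=10):
    """Build an arXiv API URL (hypothetical helper, for illustration only)."""
    params = {}
    if search_query:
        # Boolean operators (AND, OR, ANDNOT) live inside search_query.
        params["search_query"] = search_query
    if id_list:
        # id_list is its own parameter: a comma-separated list of arXiv ids.
        params["id_list"] = ",".join(id_list)
    params["start"] = start
    params["max_results"] = max_results
    # urlencode percent-encodes ':' and '/' and turns spaces into '+'.
    return API_URL + "?" + urlencode(params)


# Articles in one computer-science category that mention "electron".
url_category = build_query_url("cat:cs.LG AND all:electron")

# search_query and id_list together, as two separate parameters.
url_both = build_query_url("all:electron", id_list=["cs/9901002v1"])

print(url_category)
print(url_both)
```

The two attempts above put `id_list` inside the `search_query` value, so it was presumably treated as part of the search text rather than as a parameter, which would explain why the electron results still came back.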

Question 2:

Using Python 3 syntax, the result also comes back looking like gibberish:

from urllib.request import urlopen

url = "http://export.arxiv.org/api/query?search_query=all:electron"
data = urlopen(url).read()  # read() returns the raw response body as bytes
print(data)

returns (lines split for readability)

b'<?xml version="1.0" encoding="UTF-8"?>\n<feed xmlns="http://www.w3.org/2005/Atom">\n  <link 
href="http://arxiv.org/api/query?search_query%3Dall%3Aelectron%26id_list
%3D%26start%3D0%26max_results%3D10" rel="self" type="application/atom+xml"/>\n  <title type="html">ArXiv Query: search_query=all:electron&amp;id_list=&amp;start=0&amp;max_results=10</title>\n  
<id>http://arxiv.org/api/WyBPOs+pRgzCTXTMWhtnbcOmk6g</id>\n  
<updated>2021-02-27T00:00:00-05:00</updated>\n  <opensearch:totalResults xmlns:opensearch="http://a9.com
/-/spec/opensearch/1.1/">168093</opensearch:totalResults>\n  <opensearch:startIndex 
xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">0</opensearch:startIndex>\n  
<opensearch:itemsPerPage xmlns:opensearch="http://a9.com/-/spec/opensearch
/1.1/">10</opensearch:itemsPerPage>\n  <entry>\n    <id>http://arxiv.org/abs/cond-mat/0102536v1</id>\n  
  <updated>2001-02-28T20:12:09Z</updated>\n    <published>2001-02-28T20:12:09Z</published>\n    
<title>Impact of Electron-Electron Cusp on Configuration Interaction Energies</title>\n    <summary> 
 The effect of the electron-electron cusp on the convergence of configuration\ninteraction (CI) wave 
functions is examined. By analogy with the\npseudopotential approach for electron-ion interactions, an 
effective\nelectron-electron interaction is developed which closely reproduces the\nscattering of the Coulomb interaction but is smooth and finite at zero\nelectron-electron separation. The exact many-electron wave function for this\nsmooth effective interaction has no cusp at zero electron-electron separation.\nWe perform CI and quantum Monte Carlo calculations for He and Be atoms, both\nwith the Coulomb electron-electron interaction and with the smooth effective\nelectron-electron interaction. We find that convergence of the CI expansion of\nthe wave function for the smooth electron-electron interaction is not\nsignificantly improved compared with that for the divergent Coulomb interaction\nfor energy differences on the order of 1 mHartree. This shows that, contrary to\npopular belief, description of the electron-electron cusp is not a limiting\nfactor, to within chemical accuracy, for CI 
calculations.\n</summary>\n 

...

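First, why is the returned data bytes rather than a string? Is it better to search the bytes? What does the data actually contain, and how do I use it? For example, what does b'... <feed xmlns="http://www.w3.org/2005/Atom">\n have to do with the arXiv API? What does the "b" stand for?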
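A small sketch of what the `b` prefix means, using a shortened stand-in for the response rather than live data. In Python 3, `read()` on the object returned by `urlopen` gives a `bytes` object, and `b'...'` is simply how Python prints bytes. The `\n` sequences are newline characters shown in escaped form. Decoding with the encoding named in the XML declaration (UTF-8 here) turns the bytes into an ordinary string.

```python
# Shortened stand-in for the bytes returned by urlopen(url).read().
data = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<feed xmlns="http://www.w3.org/2005/Atom">\n'
    b'</feed>\n'
)

# bytes -> str, using the encoding declared in the first line of the XML.
text = data.decode("utf-8")

print(type(data).__name__)
print(type(text).__name__)
print(text)  # newlines are now rendered as real line breaks
```

The `<feed xmlns="http://www.w3.org/2005/Atom">` element indicates that the response is an Atom feed, an XML format, so it is usually easier to hand the data to an XML parser than to search the raw bytes or the string by hand.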

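Question: How do I build a personalized arXiv article feed with Python? That is, a script that searches a specific subject category for articles by a particular author or containing a particular keyword, and correctly prints the "title" and "summary" from the bytes data.

I have rarely worked with HTML, and I am not yet familiar with using Python to access a website through an API. Could you fill me in on the basic concepts step by step, so that I can do more googling on my own?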
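One possible way to pull the "title" and "summary" out of the response is sketched below, using the standard library's `xml.etree.ElementTree`. It assumes the feed keeps the Atom layout shown in the output above. `SAMPLE` is a shortened copy of that output so the sketch runs offline, and `parse_feed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Atom elements live in this XML namespace, so lookups must name it.
ATOM = {"atom": "http://www.w3.org/2005/Atom"}

# Shortened copy of the response shown in the question.
SAMPLE = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<feed xmlns="http://www.w3.org/2005/Atom">\n'
    b'  <entry>\n'
    b'    <id>http://arxiv.org/abs/cond-mat/0102536v1</id>\n'
    b'    <title>Impact of Electron-Electron Cusp on Configuration '
    b'Interaction Energies</title>\n'
    b'    <summary>  The effect of the electron-electron cusp on the '
    b'convergence of configuration\ninteraction (CI) wave functions is '
    b'examined.\n</summary>\n'
    b'  </entry>\n'
    b'</feed>\n'
)


def parse_feed(data):
    """Return a list of {'title', 'summary'} dicts from Atom feed bytes."""
    root = ET.fromstring(data)  # accepts bytes and honours the XML encoding
    papers = []
    for entry in root.findall("atom:entry", ATOM):
        title = entry.findtext("atom:title", default="", namespaces=ATOM)
        summary = entry.findtext("atom:summary", default="", namespaces=ATOM)
        papers.append({
            # Collapse the hard line breaks and extra spaces in the text.
            "title": " ".join(title.split()),
            "summary": " ".join(summary.split()),
        })
    return papers


for paper in parse_feed(SAMPLE):
    print(paper["title"])
    print(paper["summary"])
    print()
```

With live data, the same function could be called as `parse_feed(urlopen(url).read())`, where `url` carries a `search_query` built from prefixes such as `cat:`, `au:` and `all:` for the category, author and keyword of interest.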

