feed-seeker: Python package on PyPI

Extract RSS, Atom, and other feeds from web pages

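Detailed description of the feed-seeker Python project

The name slant-rhymes with "heat seeker".

A library for finding Atom, RSS, RDF, and XML feeds on web pages. It was produced at the mediacloud project. It is an incremental improvement on feedfinder2, which is itself based on feedfinder, written by Mark Pilgrim and maintained by Aaron Swartz until his untimely death.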

Installation

The library is available on PyPI:

pip install feed_seeker

Quickstart

By default, the library uses requests to fetch the HTML of a page and inspects it to find the most likely feed URL:

>>> from feed_seeker import find_feed_url
>>> find_feed_url('https://github.com/mitmedialab/feed_seeker')
'https://github.com/mitmedialab/feed_seeker/commits/master.atom'
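To give a sense of what "inspecting the HTML" can involve, here is a minimal sketch of feed autodiscovery using only the standard library. It is not feed_seeker's actual implementation: the names `FeedLinkParser`, `autodiscover_feeds`, and `FEED_TYPES` are hypothetical, and the list of MIME types is an assumption. It only looks at `<link rel="alternate">` tags and does not fetch or validate the candidates.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed, non-exhaustive set of MIME types that usually advertise a feed.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class FeedLinkParser(HTMLParser):
    """Collect hrefs of <link rel="alternate"> tags whose type looks like a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feed_urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        link_type = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and link_type in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feed_urls.append(urljoin(self.base_url, href))


def autodiscover_feeds(html, base_url):
    """Return candidate feed URLs advertised in the page's <link> tags (hypothetical helper)."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feed_urls
```

A real finder would still need to request each candidate and confirm that it parses as a feed, which is the expensive step.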

For a more thorough search, use generate_feed_urls, which yields the more likely candidates first.

>>> from feed_seeker import generate_feed_urls
>>> for url in generate_feed_urls('https://xkcd.com'):
...     print(url)
...
https://xkcd.com/atom.xml
https://xkcd.com/rss.xml

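For the most thorough search, add a spider argument, which performs a depth-first search of URLs on the same hostname. Note that the call below takes nearly 4 minutes, whereas find_feed_url takes about 0.5 seconds.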

>>> for url in generate_feed_urls('https://github.com/mitmedialab/feed_seeker', spider=1):
...     print(url)
...
https://github.com/mitmedialab/feed_seeker/commits/master.atom
https://github.com/mitmedialab/feed_seeker/commits/95cf320796c487df8b70f9c42281d8f26452cc31.atom
https://github.com/mitmedialab/feed_seeker/commits/3e93490cb91f7652325c2fe41ef29a5be4558d6a.atom
https://github.com/mitmedialab/feed_seeker/commits/659311b8853c4c4a67e3b4bc67a78461d825a064.atom
https://github.com/mitmedialab/feed_seeker/commits/a8f7b86eac2cedd9209ac5d2ddcceb293d2404c9.atom
https://github.com/index.atom
https://github.com/articles.atom
https://github.com/dfm/feedfinder2/commits/master.atom
https://github.com/blog.atom
https://github.com/blog/all.atom
https://github.com/blog/broadcasts.atom
https://github.com/ColCarroll.atom

In a hurry?

If you have a long list of URLs, you may want to set a timeout with max_time:

>>> for url in ('https://httpstat.us/200?sleep=5000', 'https://github.com/mitmedialab/feed_seeker'):
...     try:
...         print('found feed:\t{}'.format(find_feed_url(url, max_time=3)))
...     except TimeoutError:
...         print('skipping {}'.format(url))
skipping https://httpstat.us/200?sleep=5000
found feed:	https://github.com/mitmedialab/feed_seeker/commits/master.atom
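For a long list, the same try/except pattern can be wrapped in a small helper that records which URLs timed out. The sketch below is illustrative only: `find_feeds` and `fake_finder` are hypothetical names, and the finder is passed in as an argument so the example runs without network access. It assumes the finder accepts a `max_time` keyword and raises `TimeoutError`, as in the example above.

```python
def find_feeds(urls, finder, max_time=3):
    """Map each URL to the feed the finder returns, or None if the lookup timed out."""
    results = {}
    for url in urls:
        try:
            results[url] = finder(url, max_time=max_time)
        except TimeoutError:
            # Skip slow sites instead of stalling the whole batch.
            results[url] = None
    return results


def fake_finder(url, max_time=None):
    """Stand-in for a real finder: pretends that URLs containing 'sleep' time out."""
    if "sleep" in url:
        raise TimeoutError(url)
    return url.rstrip("/") + "/feed.atom"


feeds = find_feeds(
    ["https://slow.example/?sleep=5000", "https://fast.example/"],
    fake_finder,
)
for url, feed in feeds.items():
    print(url, "->", feed if feed is not None else "skipped")
```

In real use, one would pass `find_feed_url` from feed_seeker as the `finder` argument in place of the stand-in.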

Differences from `feedfinder2`

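The biggest difference is that all functions are implemented as generators and are evaluated lazily. Candidate feed links are actually visited and inspected to determine whether they are feeds, which can be very time-consuming. We expose one function that finds the most likely feed link, and another that generates links in rough order from most to least prominent.

There are also a few heuristics based on our experience at mediacloud.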
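The benefit of lazy evaluation is that a caller who only wants the first few feeds never pays for checking the remaining candidates. The sketch below illustrates the idea with a stand-in generator; `candidate_feeds`, `is_feed`, and the `checked` log are hypothetical and do not reflect feed_seeker's internals.

```python
from itertools import islice


def candidate_feeds(candidates, is_feed, checked):
    """Yield candidates that pass is_feed, logging every candidate that gets checked."""
    for url in candidates:
        checked.append(url)  # stands in for an expensive network request
        if is_feed(url):
            yield url


candidates = [
    "https://example.com/atom.xml",
    "https://example.com/about",
    "https://example.com/rss.xml",
]
checked = []
feeds = candidate_feeds(candidates, lambda url: url.endswith(".xml"), checked)

# Take only the first feed; the generator stops before touching later candidates.
first = list(islice(feeds, 1))
print(first)
print(len(checked), "candidate(s) checked")
```

The same `islice` pattern should work with any generator of feed URLs, including one that crawls a site, to cap how much work is done.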

feed-seeker 1.0.0
