Python spydey package - PyPI

A simple web spider with pluggable recursion strategies

Detailed description of the spydey Python project

Spydey

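A simple web spider with several recursion strategies. The home page is at http://github.com/slinkp/spydey.

It doesn't do much except follow links and report status. I mostly use it for quick-and-dirty smoke testing and link checking.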

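The only unusual feature is the --traversal=pattern option, which performs the recursive traversal in an unusual order: it tries to recognize patterns in URLs, and follows URLs with novel patterns before URLs whose patterns it has already seen. When there are no new patterns left to follow, it follows random links to URLs of known patterns. If you use this to smoke-test a typical modern web app that maps URL patterns to views/controllers, it will very quickly hit all your views/controllers at least once... usually. It is not very interesting when pointed at a site with arbitrarily deep trees (static files, VCS repositories, and the like).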
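The ordering described above can be illustrated with a minimal sketch. This is not Spydey's actual code: the `url_pattern` heuristic (collapsing digit runs in the path) and the `PatternFirstQueue` class are hypothetical names invented for this example, and the real pattern recognition may work quite differently.

```python
import random
import re
from urllib.parse import urlsplit


def url_pattern(url):
    """Reduce a URL to a rough pattern (hypothetical heuristic, an assumption)."""
    path = urlsplit(url).path
    # Collapse every run of digits so /blog/1 and /blog/2 share one pattern.
    return re.sub(r"\d+", "N", path)


class PatternFirstQueue:
    """Hands out URLs with unseen patterns first, then random known ones."""

    def __init__(self, rng=None):
        self._pending = []            # URLs waiting to be fetched
        self._queued = set()          # every URL ever added, to avoid repeats
        self._seen_patterns = set()   # patterns of URLs already handed out
        self._rng = rng or random.Random()

    def add(self, url):
        if url not in self._queued:
            self._queued.add(url)
            self._pending.append(url)

    def pop(self):
        if not self._pending:
            raise IndexError("pop from an empty queue")
        for index, url in enumerate(self._pending):
            if url_pattern(url) not in self._seen_patterns:
                break  # first pending URL whose pattern is new
        else:
            # No new patterns left: fall back to a random pending URL.
            index = self._rng.randrange(len(self._pending))
        url = self._pending.pop(index)
        self._seen_patterns.add(url_pattern(url))
        return url

    def __len__(self):
        return len(self._pending)
```

With `/blog/1`, `/blog/2`, `/about` and `/blog/3` queued in that order, this sketch returns `/blog/1` and then `/about` (both new patterns) before picking randomly among the remaining `/blog/N` URLs.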

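Also, it's designed so that adding a new recursion strategy is trivial. Spydey was originally written to experiment with different recursive crawling strategies. Read the source.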
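One way such pluggable strategies could be structured is sketched below. The class names, the `add`/`pop` interface, and the `crawl` function are assumptions made for illustration; read the Spydey source for the real design. The crawler only asks the strategy which URL to visit next, so swapping the strategy changes the traversal order without touching the crawl loop.

```python
from collections import deque


class BreadthFirst:
    """Visit URLs in the order they were discovered (FIFO)."""

    def __init__(self):
        self._urls = deque()

    def add(self, url):
        self._urls.append(url)

    def pop(self):
        return self._urls.popleft()

    def __len__(self):
        return len(self._urls)


class DepthFirst(BreadthFirst):
    """Visit the most recently discovered URL first (LIFO)."""

    def pop(self):
        return self._urls.pop()


def crawl(start, get_links, strategy, max_requests):
    """Return the visit order; get_links(url) yields the links found at url."""
    seen = {start}
    strategy.add(start)
    visited = []
    while len(strategy) and len(visited) < max_requests:
        url = strategy.pop()
        visited.append(url)
        for link in get_links(url):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                strategy.add(link)
    return visited


# A tiny in-memory "site" standing in for real HTTP requests.
SITE = {"/": ["/a", "/b"], "/a": ["/a/1"], "/b": ["/b/1"], "/a/1": [], "/b/1": []}


def site_links(url):
    return SITE.get(url, [])
```

Calling `crawl("/", site_links, BreadthFirst(), 10)` visits `/`, `/a`, `/b`, `/a/1`, `/b/1`, while `DepthFirst()` visits `/`, `/b`, `/b/1`, `/a`, `/a/1`. A new strategy only needs `add`, `pop`, and `__len__`.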

Oh, and if you install Fabulous, console output is in color.

For lazy, zero-configuration smoke testing, I typically run it like this:

spydey -r --stop-on-error --max-requests=200 --traversal=pattern --profile --log-referrer URL

There are a number of other command-line options, many borrowed from wget. Use --help to see what they are.
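If you want to launch the smoke-test invocation shown above from a script, a minimal sketch follows. It assumes the `spydey` executable is on your PATH; the helper names `build_spydey_command` and `run_smoke_test` are hypothetical, and the meaning of the exit status is not documented here, so the sketch simply returns it to the caller.

```python
import subprocess


def build_spydey_command(url, max_requests=200):
    """Build the argument list for the smoke-test invocation."""
    return [
        "spydey",
        "-r",
        "--stop-on-error",
        f"--max-requests={max_requests}",
        "--traversal=pattern",
        "--profile",
        "--log-referrer",
        url,
    ]


def run_smoke_test(url, max_requests=200):
    """Run spydey and return its exit status (assumes spydey is installed)."""
    # check=False: leave the interpretation of the status to the caller.
    completed = subprocess.run(build_spydey_command(url, max_requests), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting problems when the URL contains characters such as `&` or `?`.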

Usage

Usage: spydey [options] URL

Options:
  -h, --help            show this help message and exit
  -r, --recursive       Recur into subdirectories
  -p, --page-requisites
                        Get all images, etc. needed to display HTML page.
  --no-parent           Don't ascend to the parent directory.
  -R REJECT, --reject=REJECT
                        Regex for filenames to reject. May be given multiple
                        times.
  -A ACCEPT, --accept=ACCEPT
                        Regex for filenames to accept. May be given multiple
                        times.
  -t TRAVERSAL, --traversal=TRAVERSAL, --traverse=TRAVERSAL
                        Recursive traversal strategy. Choices are: breadth-
                        first, depth-first, hybrid, pattern, random
  -H, --span-hosts      Go to foreign hosts when recursive.
  -w WAIT, --wait=WAIT  Wait SECONDS between retrievals.
  --random-wait=RANDOM_WAIT
                        Wait from 0...2*WAIT secs between retrievals.
  --loglevel=LOGLEVEL   Log level.
  --log-referrer, --log-referer
                        Log referrer URL for each request.
  --transient-log       Use Fabulous transient logging config.
  --max-redirect=MAX_REDIRECT
                        Maximum number of redirections to follow for a
                        resource.
  --max-requests=MAX_REQUESTS
                        Maximum number of requests to make before exiting. (-1
                        used with --traversal=pattern means exit when out of
                        new patterns)
  --stop-on-error       Stop after the first HTTP error (response code 400 or
                        greater).
  -T TIMEOUT, --timeout=TIMEOUT
                        Set the network timeout in seconds. 0 means no
                        timeout.
  -P, --profile         Print the time to download each resource, and a
                        summary of the 20 slowest at the end.
  --stats               Print a summary of traversal patterns, if
                        --traversal=pattern
  -v, --version         Print version information and exit.

Changelog

0.5

Don't print useless pattern statistics unless --stats is given
Fix to prevent spanning hosts when following redirects, unless -H is on

0.4

Added the --stop-on-error option
Added --max-requests=-1 to mean stop after all patterns have been seen (when used with --traversal=pattern)
Automatically add the usage text to the package info

0.3

Better redirect handling: honor the -A, -R, --max-redirect, and --max-requests options
Minor bug fixes and refactoring
