simple-site-crawler: Python package - PyPI

Simple website crawler that asynchronously crawls a website and all the subpages it can find, along with the static content they rely on.

simple-site-crawler project description

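Simple site crawler that asynchronously crawls a website and all the subpages it can find, along with the static content they rely on. You can use it as a library in your Python project, or try the provided CLI, which can currently display the crawled data (links, images, CSS and JavaScript files) for each page found and create a sitemap.xml file.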
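To make "static content" concrete, here is a minimal sketch of pulling a page's links, images, CSS and JavaScript URLs out of its HTML with the standard library's `html.parser`. `PageParser` and `parse_page` are hypothetical names, not part of simple-site-crawler's API. Treating only same-domain links as subpages to crawl is also an assumption of this sketch.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class PageParser(HTMLParser):
    """Hypothetical helper: collect links, images, CSS and JS URLs from one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links, self.images, self.css, self.js = set(), set(), set(), set()

    def _abs(self, url):
        # Resolve relative URLs against the page URL and drop any #fragment
        return urldefrag(urljoin(self.base_url, url))[0]

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attribute values may be None for bare attributes
        if tag == "a" and attrs.get("href"):
            self.links.add(self._abs(attrs["href"]))
        elif tag == "img" and attrs.get("src"):
            self.images.add(self._abs(attrs["src"]))
        elif tag == "script" and attrs.get("src"):
            self.js.add(self._abs(attrs["src"]))
        elif tag == "link" and attrs.get("href"):
            # Only <link rel="stylesheet" href="..."> counts as CSS
            if "stylesheet" in (attrs.get("rel") or "").lower().split():
                self.css.add(self._abs(attrs["href"]))


def parse_page(url, html):
    """Return (subpages, images, css, js) found in `html` served at `url`."""
    parser = PageParser(url)
    parser.feed(html)
    parser.close()
    # Assumption: only links on the same domain are subpages worth crawling
    domain = urlparse(url).netloc
    subpages = {link for link in parser.links if urlparse(link).netloc == domain}
    return subpages, parser.images, parser.css, parser.js


if __name__ == "__main__":
    page = '<a href="/about">About</a><img src="logo.png"><script src="app.js"></script>'
    print(parse_page("https://example.com/", page))
```

A crawler would call something like `parse_page` on every downloaded page, queue the returned subpages, and record the static assets per page.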

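It was mostly made to play around with asyncio, aiohttp and the new async/await syntax, so:

- it requires Python 3.5 or newer
- there are no plans for new features at the moment; feel free to suggest some, though, if anyone actually uses it ;-)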
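Since the project is mainly an asyncio and async/await exercise, the sketch below shows one common pattern for an asynchronous crawler: a shared `asyncio.Queue` drained by a fixed number of worker tasks, which caps concurrency in the spirit of the CLI's `-t/--max-tasks` option. This is not the package's own code. `FAKE_SITE`, `fetch_links` and `crawl` are hypothetical names, and an in-memory site replaces real HTTP requests (which would normally go through aiohttp) so the example runs offline. It uses `asyncio.run` and `asyncio.create_task`, so it needs Python 3.7+, newer than the package's stated minimum.

```python
import asyncio

# Hypothetical in-memory site: page URL -> URLs found on that page.
# A real crawler would download and parse each page instead.
FAKE_SITE = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/", "https://example.com/c"],
    "https://example.com/b": ["https://example.com/c"],
    "https://example.com/c": [],
    "https://example.com/orphan": [],  # not linked from anywhere
}


async def fetch_links(url):
    """Stand-in for an HTTP request; yields to the event loop like real I/O."""
    await asyncio.sleep(0.01)
    return FAKE_SITE.get(url, [])


async def crawl(root, max_tasks=5):
    """Visit every page reachable from `root` using at most `max_tasks` workers."""
    seen = {root}
    queue = asyncio.Queue()
    queue.put_nowait(root)

    async def worker():
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    # Check-and-add is safe: there is no await between them
                    if link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    workers = [asyncio.create_task(worker()) for _ in range(max_tasks)]
    await queue.join()  # returns once every queued URL has been processed
    for task in workers:
        task.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return seen


if __name__ == "__main__":
    print(sorted(asyncio.run(crawl("https://example.com/", max_tasks=2))))
```

New URLs are queued before the current one is marked done, so `queue.join()` cannot return while work is still pending; the workers are then cancelled explicitly because they loop forever.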

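Full disclosure: halfway through the project I found an article (and code) written by the BDFL himself that does almost exactly what I wanted. Oh well. I still finished the project and didn't consciously copy anything, but it did influence some of my choices. After all, if it's good enough for the creator of the language I'm using, it's probably good enough for me.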

Installation

From PyPI:

$ pip3 install simple-site-crawler

Using git clone:

$ git clone https://github.com/pawelad/simple-site-crawler
$ pip3 install -r simple-site-crawler/requirements.txt
$ cd simple-site-crawler/bin

Usage

$ simple-site-crawler --help
Usage: simple-site-crawler [OPTIONS] URL

  Simple website crawler that generates its sitemap and can either print it
  (and its static content) or export it to standard XML format.

  See https://github.com/pawelad/simple-site-crawler for more info.

Options:
  -t, --max-tasks INTEGER  Maximum allowed number of async tasks.
  -e, --export-to-xml      Export sitemap to XML file.
  -s, --suppress           Suppress printing output to stdout.
  --help                   Show this message and exit.
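The `--export-to-xml` option exports the sitemap to the standard XML format. As a sketch of what that format looks like, the snippet below builds a minimal sitemap.xml (a `urlset` element in the sitemaps.org 0.9 namespace, with one `url`/`loc` entry per page) using `xml.etree.ElementTree`. `build_sitemap` and `write_sitemap` are hypothetical helpers; the package's actual output may differ, for example by including extra elements.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a <urlset> element with one <url><loc>...</loc></url> per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in sorted(urls):
        url_el = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" when serializing
        ET.SubElement(url_el, "loc").text = url
    return urlset


def write_sitemap(urls, path="sitemap.xml"):
    """Write the sitemap to `path` as UTF-8 with an XML declaration."""
    tree = ET.ElementTree(build_sitemap(urls))
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sitemap = build_sitemap(["https://example.com/", "https://example.com/about"])
    print(ET.tostring(sitemap, encoding="unicode"))
```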

API

There's no proper documentation yet, but the code is commented and should be fairly easy to use.

That said, feel free to ask me via email or GitHub issues if anything is unclear.

Tests

The package was tested with py.test and tox on Python 3.5 and 3.6 (see tox.ini).

Code coverage is available on Coveralls.

To run the tests yourself, run tox inside the repository:

$ pip install -r requirements/dev.txt
$ tox

Contributions

The package source code is available on GitHub.

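Feel free to use it, ask questions, fork, star, report bugs, fix them, suggest enhancements, add functionality and point out any mistakes. Thanks!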

Authors

Developed and maintained by Paweł Adamczak.

Released under the MIT License.

simple-site-crawler 0.1.1

License: BSD License (BSD 3-Clause)
Maintainer: pawel.ad
