Python scrapekit package - PyPI

Light-weight tools for web scraping

Detailed description of the scrapekit Python project

# scrapekit

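Did you know the entire web is made of data? You probably did.
Scrapekit helps you get that data with simple Python scripts. Based on
[requests](http://docs.python-requests.org/), the library handles
caching, threading and logging.

See the [full documentation](http://scrapekit.readthedocs.org/).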

## Example

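```python
from scrapekit import Scraper

scraper = Scraper('example')

@scraper.task
def get_index():
    # Fetch the index page and emit one table row per result.
    url = 'http://example.com/table.html'  # placeholder URL, replace with the page to scrape
    doc = scraper.get(url).html()
    for row in doc.findall('.//tr'):
        yield row

@scraper.task
def get_row(row):
    # Collect the cells of a single row and print them.
    columns = row.findall('./td')
    print(columns)

# Chain the tasks: each row yielded by get_index is passed to get_row.
pipeline = get_index | get_row

if __name__ == '__main__':
    pipeline.run()
```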
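To make the chaining idea in the example more concrete, here is a minimal sketch of how a `|` operator can connect a generator task to a downstream task. The `Task` class below is a hypothetical stand-in written for illustration only; it is not scrapekit's implementation and leaves out the threading, caching and logging the library provides.

```python
class Task:
    """Hypothetical stand-in for a pipeline task; not scrapekit's own code."""

    def __init__(self, fn):
        self.fn = fn
        self.next_task = None

    def _tail(self):
        # Walk to the last task currently in the chain.
        task = self
        while task.next_task is not None:
            task = task.next_task
        return task

    def __or__(self, other):
        # `a | b` appends b to the end of a's chain and returns the head.
        self._tail().next_task = other
        return self

    def run(self, *args):
        result = self.fn(*args)
        if self.next_task is None:
            return
        # A generator task fans out: each yielded item becomes one downstream call.
        items = result if hasattr(result, '__next__') else [result]
        for item in items:
            self.next_task.run(item)


seen = []

@Task
def get_index():
    # Made-up rows standing in for parsed table rows.
    for row in (['a', 'b'], ['c', 'd']):
        yield row

@Task
def get_row(row):
    seen.append(', '.join(row))

pipeline = get_index | get_row
pipeline.run()
print(seen)
```

The sketch runs everything sequentially in one thread, which keeps the data flow easy to follow: every item yielded upstream triggers exactly one call downstream.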

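## Works well with

Scrapekit does not aim to provide all the functionality needed for
scraping. Specifically, it does not address HTML parsing, data storage or data validation. For these needs, check the following libraries: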

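* [lxml](http://lxml.de/) for HTML/XML parsing; much faster and more flexible than [BeautifulSoup](http://www.crummy.com/software/BeautifulSoup/).
* [dataset](http://dataset.rtfd.org) is a sister library of scrapekit that simplifies storing semi-structured data in SQL databases.

## Existing tools

* [Scrapy](http://scrapy.org/) is a much more mature and comprehensive framework for developing scrapers. On the other hand, it requires you to develop scrapers within its class system. This can be too heavy for a simple script that grabs data off a web site.
* [scrapelib](http://scrapelib.readthedocs.org/) is a thin wrapper around requests that handles throttling, retries and caching.
* [MechanicalSoup](https://github.com/hickford/mechanicalsoup) binds
BeautifulSoup and requests into an imperative, stateful API.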
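The parsing and storage steps that scrapekit leaves to other libraries can be sketched with the standard library alone. This is an illustration under stated assumptions: `xml.etree.ElementTree` stands in for lxml (lxml's `etree` module follows the same ElementTree API, so the `findall` calls carry over), and `sqlite3` stands in for dataset. The sample markup, the `people` table and its columns are made up for the example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Made-up sample markup standing in for a downloaded page.
PAGE = """
<table>
  <tr><td>Alice</td><td>34</td></tr>
  <tr><td>Bob</td><td>27</td></tr>
</table>
"""

def parse_rows(markup):
    # Parsing step: yield one (name, age) tuple per table row.
    doc = ET.fromstring(markup.strip())
    for row in doc.findall('.//tr'):
        name, age = [cell.text for cell in row.findall('./td')]
        yield name, int(age)

def store_rows(conn, rows):
    # Storage step: hypothetical table layout chosen for this example.
    conn.execute('CREATE TABLE IF NOT EXISTS people (name TEXT, age INTEGER)')
    conn.executemany('INSERT INTO people VALUES (?, ?)', rows)
    conn.commit()

conn = sqlite3.connect(':memory:')
store_rows(conn, parse_rows(PAGE))
stored = conn.execute('SELECT name, age FROM people ORDER BY name').fetchall()
print(stored)
```

Real-world HTML is rarely well-formed XML, which is one reason a tolerant parser such as lxml is the better fit outside of a toy example like this one.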

Scrapekit was developed through projects of
[ICFJ](http://icfj.org), [ANCIR](http://investigativecenters.org) and
[ICIJ](http://icij.org).

scrapekit 0.2.1
