Python scraper package - PyPI

A configurable Python web scraper

Detailed description of the scraper Python project

A minimalist Python DOM scraper

Description

This module is an easy-to-use HTML/XML scraper. It supports retrieval by both XPath and regular expressions.

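Once you have the document you want to extract information from, a single function call can pull out several pieces of information at once.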
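The idea of one call returning several named results can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: `scrape_all` is a hypothetical helper, the sample page and rule names are made up, and only the `regexp` and `transf` keys (borrowed from the usage examples in this document) are handled.

```python
import re

def scrape_all(text, conf):
    """Hypothetical helper: apply every named rule in conf to text.

    Each rule is a dict with a 'regexp' pattern and an optional 'transf'
    list of functions applied, in order, to every match.
    """
    results = {}
    for name, rule in conf.items():
        matches = re.findall(rule["regexp"], text)
        for transform in rule.get("transf", []):
            matches = [transform(m) for m in matches]
        results[name] = matches
    return results

# Made-up sample document and rules, for illustration only.
page = "<h1>Release 0.1.0</h1><p>Contact: dev@example.org, ops@example.org</p>"
conf = {
    "version": {"regexp": r"\d+\.\d+\.\d+"},
    "emails": {"regexp": r"[\w.]+@[\w.]+", "transf": [str.lower]},
}

print(scrape_all(page, conf))
# {'version': ['0.1.0'], 'emails': ['dev@example.org', 'ops@example.org']}
```

Keeping the rules in a plain dictionary means new fields can be added without touching the extraction code.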

Fetching the document is up to you: use whatever method you prefer.
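Since fetching is left to the caller, any HTTP client will do. The sketch below uses only the standard library's `urllib.request`; the helper name `fetch` and its `timeout` default are assumptions made for illustration, not part of the package.

```python
from urllib.request import urlopen

def fetch(url, timeout=10):
    """Hypothetical helper: return the raw bytes of the document at url."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()

# Usage (requires network access):
# content = fetch("https://github.com/explore")
```

The bytes returned here play the same role as `requests.get(...).content` in the usage examples.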

Installation

pip install scraper

Usage

Scraping with XPath:

import scraper
import requests

content = requests.get('https://github.com/explore').content

conf = {'trending-repos' : {'xpath' : '//ol/li/h3/a[2]/@href'}}

scraper.scrapes(content, conf)

>>> {'trending-repos': ['/jamescryer/grumble.js', '/dominictarr/JSON.sh', '/JamieLottering/DropKick', '/harvesthq/chosen', '/velvia/ScalaStorm']}
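To see what an expression such as `//ol/li/h3/a[2]/@href` selects, here is a rough standard-library approximation. It is a sketch under stated assumptions: `xml.etree.ElementTree` supports only a subset of XPath (no `@href` step), so the attribute is read in Python instead, the input must be well-formed XML, and the sample markup and the function name `second_link_hrefs` are invented for this example. It says nothing about how the package evaluates XPath internally.

```python
import xml.etree.ElementTree as ET

# Invented sample markup: each list item holds an owner link and a repo link.
SAMPLE = """
<html><body><ol>
  <li><h3><a href="/alice">alice</a> / <a href="/alice/project-one">project-one</a></h3></li>
  <li><h3><a href="/bob">bob</a> / <a href="/bob/project-two">project-two</a></h3></li>
</ol></body></html>
"""

def second_link_hrefs(markup):
    """Return the href of the second <a> inside every ol/li/h3."""
    root = ET.fromstring(markup.strip())
    # a[2] is a 1-based position predicate: the second <a> child of each <h3>.
    links = root.findall(".//ol/li/h3/a[2]")
    return [link.get("href") for link in links]

print(second_link_hrefs(SAMPLE))
# ['/alice/project-one', '/bob/project-two']
```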

Scraping with a regexp:

import scraper
import requests

content = requests.get('http://wiki.nomasnumeros900.com/Air_Liquide').content

conf = {
        'numbers':
            {'regexp': r'91[\s\d]+',  # raw string avoids invalid-escape warnings for \s and \d
             'transf': [lambda x: x.strip()],
             'encoding': 'utf-8'}
        }

scraper.scrapes(content, conf)

>>> {'numbers': [u'915 029 300', u'915 029 560', u'915 029 330', u'91']}
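The trailing `'91'` in that output is what a loose pattern produces: `91[\s\d]+` accepts any run of digits and whitespace after `91`, so a bare `91` followed by a space also matches. The sketch below reproduces the effect with the standard `re` module on a made-up string and compares it with a stricter pattern. The stricter pattern assumes numbers are always written as three groups of three digits, which may not hold for the real page.

```python
import re

# Made-up text standing in for the page content.
SAMPLE = "Tel: 915 029 300 / Fax: 915 029 560 (ref 91 only)"

# Loose pattern from the example: '91' followed by any digits or whitespace.
loose = [m.strip() for m in re.findall(r"91[\s\d]+", SAMPLE)]

# Stricter pattern (an assumption): '91', one digit, then two space-separated 3-digit groups.
strict = [m.strip() for m in re.findall(r"91\d(?:\s\d{3}){2}", SAMPLE)]

print(loose)   # ['915 029 300', '915 029 560', '91']
print(strict)  # ['915 029 300', '915 029 560']
```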


scraper 0.1.0
