splinter_model: Python package on PyPI

Splinter helper to create scrapers from models

Detailed project description for splinter_model

Create scrapers using Scrapy selectors

[![Build Status](https://travis-ci.org/rochacbruno/splinter_model.png)](https://travis-ci.org/rochacbruno/splinter_model)

[![PyPI version](https://pypip.in/v/splinter_model/badge.png)](https://pypi.python.org/pypi/splinter_model/)
[![PyPI downloads](https://pypip.in/d/splinter_model/badge.png)](https://pypi.python.org/pypi/splinter_model/)

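Splinter is an open source tool for testing web applications using Python. It lets you automate browser actions, such as visiting URLs and interacting with their items.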

http://splinter.cobrateam.info/

It is a clone of [scrapy_model](http://github.com/rochacbruno/scrapy_model), but it uses Splinter instead of Scrapy as its engine, so it can scrape JavaScript-driven websites.

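TODO:

### Requirements

This module should keep the same API as scrapy_model, supporting at least `CSSField`, `XPathField`, processors, validators, multiple queries, and `parse_` methods.

It should also implement a layer for interacting with JavaScript.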

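alpha dev

It is just a helper for creating scrapers with Scrapy selectors. It lets you select elements by CSS or XPath, structure your scraper via models (just like an ORM model), and plug the results into an ORM model via the `populate` method.

Import `BaseFetcherModel` and `CSSField` or `XPathField` (you can use both):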

```python
from splinter_model import BaseFetcherModel, CSSField
```

```html
<span id="person">Bruno Rocha <a href="http://brunorocha.org">website</a></span>
```

```python
class MyFetcher(BaseFetcherModel):
    name = CSSField('span#person')
    website = CSSField('span#person a')
    # XPathField('//xpath_selector_here')
```

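> Fields can receive an `auto_extract=True` parameter, which automatically extracts the values from the selector before the parse method or processors are called. You can also pass `takes_first=True`, which extracts automatically and also tries to take the first element of the result, because Scrapy selectors return a list of matched elements.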
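
To make the two flags concrete, here is a minimal pure-Python sketch of the behaviour described above. It does not use splinter_model itself: `FakeSelector` and `resolve` are hypothetical names invented for this illustration, and the real library may differ in detail.

```python
class FakeSelector(object):
    """Hypothetical stand-in for a selector that matched several text nodes."""

    def __init__(self, texts):
        self.texts = list(texts)

    def extract(self):
        # Like a Scrapy selector, extraction yields a list of matches.
        return list(self.texts)


def resolve(selector, auto_extract=False, takes_first=False):
    """Return what a parse method or processor would receive (assumed behaviour)."""
    if takes_first:
        values = selector.extract()  # takes_first implies extraction
        return values[0] if values else None
    if auto_extract:
        return selector.extract()  # the full list of extracted values
    return selector  # the untouched selector


selector = FakeSelector(["Bruno Rocha", "website"])
raw = resolve(selector)
extracted = resolve(selector, auto_extract=True)
first = resolve(selector, takes_first=True)
```

With neither flag set, the hook gets the selector and has to call `extract()` itself; with `takes_first=True` it gets a single value.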

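A single field can take multiple queries:

```python
name = XPathField(
    ['//*[@id="8"]/div[2]/div/div[2]/div[2]/ul',
     '//*[@id="8"]/div[2]/div/div[3]/div[2]/ul']
)
```

The queries are tried in order until one of them finds something; otherwise an empty selector is returned.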
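
The fallback order can be sketched in plain Python. This is an illustration only: `first_match`, `fake_page`, and the use of an empty list in place of an empty selector are assumptions made for the example, not the library's internals.

```python
def first_match(queries, run_query):
    """Try each query in order; return the first non-empty result, else an empty list."""
    for query in queries:
        result = run_query(query)
        if result:  # a non-empty result counts as a match
            return result
    return []  # stands in for the empty selector


# Hypothetical page: only the second query matches anything.
fake_page = {
    '//*[@id="8"]/div[2]/div/div[3]/div[2]/ul': ["<ul>...</ul>"],
}

queries = [
    '//*[@id="8"]/div[2]/div/div[2]/div[2]/ul',
    '//*[@id="8"]/div[2]/div/div[3]/div[2]/ul',
]
found = first_match(queries, lambda q: fake_page.get(q, []))
missing = first_match(["//nothing"], lambda q: fake_page.get(q, []))
```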

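To pick the best match among several queries, you can pass a query validator: a function that takes the selector and returns a boolean.

For example, suppose you have the `name` field defined above and want to validate each query to make sure it contains an `li` with the text "schblaums".

```python
def has_schblaums(selector):
    for li in selector.css('li'):  # take each <li> inside the ul selector
        li_text = li.css('::text').extract()  # extract only the text
        if "schblaums" in li_text:  # check whether "schblaums" is there
            return True  # this selector is valid!
    return False  # invalid query, take the next one or the default value


class Fetcher(BaseFetcherModel):
    name = XPathField(
        ['//*[@id="8"]/div[2]/div/div[2]/div[2]/ul',
         '//*[@id="8"]/div[2]/div/div[3]/div[2]/ul'],
        query_validator=has_schblaums,
        default="undefined_name"
    )
```

If every query is invalid, the `name` field is filled with an empty selector, or with the value defined in the `default` parameter.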

> **NOTE:** If the field has a `default` and every matcher fails, the default value is passed to the `processor` and to the `parse_` methods.
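
The validator-plus-default rule can be sketched without any scraping library. In this illustration the candidates are plain lists of `<li>` texts rather than selectors, and `pick_valid` is a hypothetical helper, not part of splinter_model.

```python
def pick_valid(candidates, validator, default=None):
    """Return the first candidate accepted by validator, else the default."""
    for candidate in candidates:
        if validator(candidate):
            return candidate
    return default


def has_schblaums(items):
    # Each candidate here is a plain list of <li> texts instead of a selector.
    return "schblaums" in items


candidates = [["foo", "bar"], ["schblaums", "baz"]]
chosen = pick_valid(candidates, has_schblaums, default="undefined_name")
fallback = pick_valid([["foo"]], has_schblaums, default="undefined_name")
```

The first call skips the list that fails validation and returns the second; the second call finds no valid candidate and falls back to the default.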

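Every method named `parse_<field>` runs after all the fields are fetched, once for each field.

```python
def parse_name(self, selector):
    # here selector is the Scrapy selector for 'span#person'
    return selector.css('::text').extract()

def parse_website(self, selector):
    # here selector is the Scrapy selector for 'span#person a'
    return selector.css('::attr(href)').extract()
```

After defining the model, run the scraper:

```python
# optionally pass cache_fetch=True to cache requests on Redis
fetcher = MyFetcher(url="http://...")
fetcher.parse()
```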

The fetcher now holds the parsed data:

```python
>>> fetcher._data
{"name": "Bruno Rocha", "website": "http://brunorocha.org"}
```

You can populate an object:

```python
>>> obj = MyObject()
>>> fetcher.populate(obj)  # fields optional

>>> obj.name
Bruno Rocha
```
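
Conceptually, populating an object amounts to copying the parsed values onto it as attributes. The sketch below shows that idea as a free function; it is an assumption about the behaviour, not the library's actual `populate` implementation, and `MyObject` is a placeholder class.

```python
class MyObject(object):
    """Any plain object, for example an ORM model instance."""


def populate(obj, data, fields=None):
    """Copy values from a dict of parsed data onto obj (assumed behaviour)."""
    names = fields if fields is not None else list(data)
    for name in names:
        setattr(obj, name, data[name])
    return obj


parsed = {"name": "Bruno Rocha", "website": "http://brunorocha.org"}

everything = populate(MyObject(), parsed)
only_name = populate(MyObject(), parsed, fields=["name"])
```

Passing `fields` restricts which attributes are copied, which mirrors the "fields optional" comment above.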

If you do not want to define each field explicitly in the class, you can use a JSON file to automate the process. In that case, file.json should be:

```json
{
    "name": {"css": "span#person"},
    "website": {"css": "span#person a"}
}
```

You can use `"xpath"` instead of `"css"` as the key if you prefer to select by XPath.
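
As a rough sketch of how such a mapping file could be read and checked, the snippet below parses the same JSON shape with the standard `json` module. `load_mappings` and the extra `photo` field are hypothetical and only illustrate the structure; the library's own loader may behave differently.

```python
import json

# Same shape as file.json; loaded from a string to stay self-contained.
MAPPINGS_JSON = """
{
    "name": {"css": "span#person"},
    "website": {"css": "span#person a"},
    "photo": {"xpath": "//img/@src"}
}
"""


def load_mappings(text):
    """Parse the JSON text into {field_name: (selector_type, query)}."""
    mappings = {}
    for field_name, spec in json.loads(text).items():
        if len(spec) != 1:
            raise ValueError("expected exactly one selector type for %r" % field_name)
        (selector_type, query), = spec.items()
        if selector_type not in ("css", "xpath"):
            raise ValueError("unknown selector type %r" % selector_type)
        mappings[field_name] = (selector_type, query)
    return mappings


mappings = load_mappings(MAPPINGS_JSON)
```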

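### parse and processor

A processor is a function, or a list of functions, called in sequence on the field value. It receives either the raw selector or the extracted value, depending on the `auto_extract` and `takes_first` arguments.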

```python
def normalize_state(state_name):
    # query the database and return the first matching state object
    return mydatabase.state.search(name=state_name).first()


def text_cleanup(state_name):
    return state_name.strip().replace('-', '').lower()


class MyFetcher(BaseFetcherModel):
    state = CSSField(
        "#state::text",
        takes_first=True,
        processor=[text_cleanup, normalize_state]
    )


fetcher = MyFetcher(url="http://...")
fetcher.parse()

fetcher._raw_data.state
'Sao-Paulo'
fetcher._data.state
<ORM Instance - State - São Paulo>
```
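
The chaining itself is simple to picture. Here is a small sketch, assuming a processor may be either one function or a list of functions; `apply_processors` is a hypothetical helper and the in-memory lookup table replaces the database query.

```python
def text_cleanup(state_name):
    return state_name.strip().replace('-', '').lower()


def normalize_state(state_name):
    # Stand-in for a database lookup: a hypothetical in-memory table.
    known_states = {"saopaulo": "State(Sao Paulo)"}
    return known_states.get(state_name)


def apply_processors(value, processor):
    """Accept a single function or a list of functions, applied in order."""
    processors = processor if isinstance(processor, (list, tuple)) else [processor]
    for func in processors:
        value = func(value)
    return value


cleaned = apply_processors(" Sao-Paulo ", text_cleanup)
state = apply_processors(" Sao-Paulo ", [text_cleanup, normalize_state])
```

Order matters: `normalize_state` only finds the state because `text_cleanup` ran first.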


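Any method named `parse_<field_name>` runs after all the selecting and parsing is done. It receives the selector or the value, depending on the `auto_extract` and `takes_first` arguments of that field.

Example:

```python
def parse_name(self, selector):
    return selector.css('::text').extract()[0].upper()
```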

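In a parse method you can build extra queries with `css` or `xpath`. You also need to call `extract()` on the selector, select the first element, and apply whatever transformation you need.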
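
One plausible way to wire up such hooks is a `getattr` lookup by name. The class below is a hypothetical miniature written for this illustration, not the real `BaseFetcherModel`, and it works on plain strings instead of selectors.

```python
class SketchFetcher(object):
    """Hypothetical miniature of a fetcher; not the real BaseFetcherModel."""

    def __init__(self, raw_data):
        self.raw_data = dict(raw_data)
        self.data = {}

    def parse(self):
        for field_name, value in self.raw_data.items():
            # Look for an optional parse_<field_name> hook on the instance.
            hook = getattr(self, "parse_" + field_name, None)
            self.data[field_name] = hook(value) if callable(hook) else value
        return self.data

    def parse_name(self, value):
        return value.upper()


result = SketchFetcher(
    {"name": "Bruno Rocha", "website": "http://brunorocha.org"}
).parse()
```

Fields without a matching hook, such as `website` here, pass through unchanged.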

By default there is no cache, but you can use the built-in `RedisCache` by passing:

```python
from splinter_model import RedisCache
fetcher = TestFetcher(cache_fetch=True,
                      cache=RedisCache,
                      cache_expire=1800)
```

or by specifying arguments for the Redis client.

> This is a regular Redis connection from the Python `redis` module.

```python
fetcher = TestFetcher(cache_fetch=True,
                      cache=RedisCache("192.168.0.12:9200"),
                      cache_expire=1800)
```

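A custom cache class only needs to implement `get` and `set` methods, for example to cache HTML on S3:

```python
from boto import connect_s3


class S3Cache(object):
    def __init__(self, *args, **kwargs):
        # access_key, secret_key and bucket_id come from your own settings
        connection = connect_s3(access_key, secret_key)
        self.bucket = connection.get_bucket(bucket_id)

    def get(self, key):
        value = self.bucket.get_key(key)
        # get_key returns None when the key does not exist
        return value.get_contents_as_string() if value else None

    def set(self, key, value, expire=None):
        self.bucket.set_contents(key, value, expire=expire)


fetcher = MyFetcher(url="http://...",
                    cache_fetch=True,
                    cache=S3Cache,
                    cache_expire=1800)
```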
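
For local experiments, the same `get`/`set` pair can be backed by a dictionary. The class below is a hypothetical sketch: `MemoryCache`, its injectable `clock` argument, and the treatment of `expire` as a lifetime in seconds are all assumptions for illustration, and whether the fetcher accepts it unchanged depends on how the library calls the cache.

```python
import time


class MemoryCache(object):
    """Hypothetical in-memory cache exposing the same get/set pair as S3Cache."""

    def __init__(self, clock=time.time):
        self._store = {}
        self._clock = clock  # injectable so expiry can be checked without sleeping

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._store[key]  # drop the stale entry
            return None
        return value

    def set(self, key, value, expire=None):
        expires_at = None if expire is None else self._clock() + expire
        self._store[key] = (value, expires_at)


now = [1000.0]  # fake clock value, advanced by hand
cache = MemoryCache(clock=lambda: now[0])
cache.set("http://example.com", "<html></html>", expire=1800)
fresh = cache.get("http://example.com")
now[0] += 1800
stale = cache.get("http://example.com")
```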

On apt-based systems you need to run:

```bash
sudo apt-get install python-scrapy
sudo apt-get install libffi-dev
sudo apt-get install python-dev
```

then

```bash
pip install splinter_model
```

or

```bash
git clone https://github.com/rochacbruno/splinter_model
cd splinter_model
pip install -r requirements.txt
python setup.py install
python example.py
```

```python
# coding: utf-8

from splinter_model import BaseFetcherModel, CSSField, XPathField


class TestFetcher(BaseFetcherModel):
    photo_url = XPathField('//*[@id="content"]/div[1]/table/tr[2]/td/a')

    nationality = CSSField(
        '#content > div:nth-child(1) > table > tr:nth-child(4) > td > a',
    )

    links = CSSField(
        '#content > div:nth-child(11) > ul > li > a.external::attr(href)',
        auto_extract=True
    )

    def parse_photo_url(self, selector):
        return "http://en.m.wikipedia.org/{}".format(
            selector.xpath("@href").extract()[0]
        )

    def parse_nationality(self, selector):
        return selector.css("::text").extract()[0]

    def parse_name(self, selector):
        return selector.extract()[0]

    def pre_parse(self, selector=None):
        # this method is executed before the parsing;
        # you can override it, take a look at the docstring
        pass

    def post_parse(self):
        self._data.url = self.url


class DummyModel(object):
    """
    For tests only; it can be a model in your database ORM
    """


if __name__ == "__main__":
    fetcher = TestFetcher(cache_fetch=True)
    fetcher.url = "http://en.m.wikipedia.org/wiki/Guido_van_Rossum"

    fetcher.mappings['name'] = {
        "css": ("#section_0::text")
    }

    fetcher.parse()

    print("Fetcher holds the data")
    print(fetcher._data.name)
    print(fetcher._data)

    print("Populating an object")
    dummy = DummyModel()

    fetcher.populate(dummy, fields=["name", "nationality"])
    print(dummy.nationality)
    print(dummy.__dict__)
```

Outputs:

```
Fetcher holds the data
Guido van Rossum
{'links': [u'http://www.python.org/~guido/',
           u'http://neopythonic.blogspot.com/',
           u'http://www.artima.com/weblogs/index.jsp?blogger=guido',
           u'http://python-history.blogspot.com/',
           u'http://www.python.org/doc/artists/cp4e.html',
           u'http://www.twit.tv/floss11',
           u'http://www.computerworld.com.au/index.php/id;66665771',
           u'http://www.stanford.edu/class/ee380/abstracts/081105.html',
           u'http://stanford-online.stanford.edu/courses/ee380/081105-ee380-300.asx'],
 'name': u'Guido van Rossum',
 'nationality': u'Dutch',
 'photo_url': 'http://en.m.wikipedia.org//wiki/File:Guido_van_Rossum_OSCON_2006.jpg',
 'url': 'http://en.m.wikipedia.org/wiki/Guido_van_Rossum'}
Populating an object
Dutch
{'name': u'Guido van Rossum', 'nationality': u'Dutch'}
```
