Scrapy Eagle
============
.. image:: docs/image/logo

.. image:: https://travis-ci.org/rafaelcapucho/scrapy-eagle.svg?branch=master
   :target: https://travis-ci.org/rafaelcapucho/scrapy-eagle
.. image:: https://img.shields.io/pypi/v/scrapy-eagle.svg
   :target: https://pypi.python.org/pypi/scrapy-eagle
   :alt: PyPI Version
.. image:: https://img.shields.io/pypi/pyversions/scrapy-eagle.svg
   :target: https://pypi.python.org/pypi/scrapy-eagle
.. image:: https://landscape.io/github/rafaelcapucho/scrapy-eagle/master/landscape.svg
   :target: https://landscape.io/github/rafaelcapucho/scrapy-eagle/master
   :alt: Code Quality Status
.. image:: https://requires.io/github/rafaelcapucho/scrapy-eagle/requirements.svg?branch=master
   :target: https://requires.io/github/rafaelcapucho/scrapy-eagle/requirements/?branch=master
   :alt: Requirements Status
Scrapy Eagle is a tool that allows us to run any Scrapy-based project in a distributed fashion and monitor how it is going and how many resources it is consuming on each server.
.. image:: https://badge.waffle.io/rafaelcapucho/scrapy-eagle.svg?label=ready&title=Ready
   :target: https://waffle.io/rafaelcapucho/scrapy-eagle
   :alt: Stories in Ready
Scrapy Eagle uses Redis_ to coordinate the distributed workers.

.. _Redis: https://redis.io
Installation
------------
It can be easily installed by running the code below,
.. code-block:: console

    virtualenv eagle_venv; cd eagle_venv
    pip install scrapy-eagle

Create a configuration file, e.g. ``/etc/scrapy-eagle.ini``, containing:

.. code-block:: ini

    [redis]
    host = 127.0.0.1
    port = 6379
    db = 0
    ;password = someverypass

    [server]
    debug = True
    cookie_secret_key = ...
    port = 5000

    [scrapy]
    binary = /project_venv/bin/python3
    base_dir = /project_venv/scrapy
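As a quick sanity check (not part of Scrapy Eagle itself), you can verify that the ``[redis]`` settings point at a reachable instance with the redis-py client; the connection values below simply mirror the example file:

.. code-block:: python

    # Connectivity check for the [redis] settings above.
    # Assumes the redis-py package is installed (pip install redis).
    import redis

    conn = redis.StrictRedis(host='127.0.0.1', port=6379, db=0)
    print(conn.ping())  # True when the Redis instance is reachable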
Then you will be able to execute the ``eagle-server`` command like:
.. code-block:: console

    eagle-server --config-file=/etc/scrapy-eagle.ini
Changes into your Scrapy project
--------------------------------

Enable the components in your ``settings.py``:

.. code-block:: python

    # Enables scheduling storing requests queue in Redis.
    SCHEDULER = "scrapy_eagle.worker.scheduler.DistributedScheduler"

    # Ensure all spiders share the same duplicates filter through Redis.
    DUPEFILTER_CLASS = "scrapy_eagle.worker.dupefilter.RFPDupeFilter"

    # Schedule requests using a priority queue. (default)
    SCHEDULER_QUEUE_CLASS = 'scrapy_eagle.worker.queue.SpiderPriorityQueue'

    # Schedule requests using a queue (FIFO).
    SCHEDULER_QUEUE_CLASS = 'scrapy_eagle.worker.queue.SpiderQueue'

    # Schedule requests using a stack (LIFO).
    SCHEDULER_QUEUE_CLASS = 'scrapy_eagle.worker.queue.SpiderStack'

    # Max idle time to prevent the spider from being closed when distributed crawling.
    SCHEDULER_IDLE_BEFORE_CLOSE = 0

    # Specify the host and port to use when connecting to Redis (optional).
    REDIS_HOST = 'localhost'
    REDIS_PORT = 6379
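The three ``SCHEDULER_QUEUE_CLASS`` options above differ mainly in which end of the underlying Redis list requests are popped from. The sketch below illustrates that idea with plain redis-py; it is a simplified illustration, not Scrapy Eagle's actual queue implementation:

.. code-block:: python

    # FIFO vs. LIFO semantics on a Redis list (illustration only).
    # Assumes the redis-py package is installed (pip install redis).
    import redis

    conn = redis.StrictRedis()
    conn.delete('demo:queue')
    conn.lpush('demo:queue', 'first', 'second')  # list is now [second, first]

    # FIFO (queue-like): pop from the opposite end you pushed to.
    print(conn.rpop('demo:queue'))  # b'first'

    # LIFO (stack-like): pop from the same end you pushed to.
    print(conn.lpop('demo:queue'))  # b'second'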
Once the configuration is done, you should adjust each spider to use our Mixin:
.. code-block:: python

    from scrapy.spiders import CrawlSpider, Rule
    from scrapy_eagle.worker.spiders import DistributedMixin

    class YourSpider(DistributedMixin, CrawlSpider):

        name = "domain.com"

        # start_urls = ['http://domain.com/']
        redis_key = 'domain.com:start_url'

        rules = (
            Rule(...),
            Rule(...),
        )

        def _set_crawler(self, crawler):
            CrawlSpider._set_crawler(self, crawler)
            DistributedMixin.setup_redis(self)
The URLs in the Redis queue will be processed one by one.
Then, push a URL to Redis::

    redis-cli lpush domain.com:start_url http://domain.com/
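Equivalently, you can seed the queue from Python. A minimal sketch with redis-py, assuming Redis on localhost and the same key as above:

.. code-block:: python

    # Push a start URL onto the spider's Redis queue, mirroring the
    # redis-cli command above (assumes redis-py: pip install redis).
    import redis

    conn = redis.StrictRedis(host='localhost', port=6379)
    conn.lpush('domain.com:start_url', 'http://domain.com/')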
Dashboard Development
---------------------
Since we use ReactJS_ to build the interface, you will need NPM_ installed. To install all dependencies locally:

.. _ReactJS: https://facebook.github.io/react/
.. _NPM: https://www.npmjs.com/
.. code-block:: console

    cd scrapy_eagle/dashboard
    npm install
Then you can run ``npm start`` to compile the sources, watch for changes, and recompile automatically.
To see the dashboard you can use a simple HTTP server instead of running the ``eagle-server``, for example:
.. code-block:: console

    sudo npm install -g http-server
    cd scrapy_eagle/dashboard
    http-server templates/

Then open http://127.0.0.1:8080 in your browser.
**Note**: Until now, scrapy-eagle has been heavily based on https://github.com/rolando/scrapy-redis.