# vredis

Distributed script crawler framework.
Simply connecting to redis on the worker side gives the library the power to execute scripts in a distributed way. Every function execution on the sender end is piped into redis, and the worker end pulls it out of the pipe to execute. Multiple tasks can run simultaneously: each execution holds a taskid, and concurrent tasks keep their configuration spaces separate by taskid.
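The send-through-redis mechanism above can be sketched in a few lines. This is an illustrative stand-in, not the real vredis internals: a plain in-memory `deque` plays the role of the redis list, and `send`, `worker_step`, and `REGISTRY` are hypothetical names.

```python
import json
from collections import deque

task_pipe = deque()  # stand-in for the redis list shared by sender and workers
REGISTRY = {}        # worker side: function name -> callable

def send(func_name, *args, taskid=1):
    # sender side: the call is serialized to JSON and pushed into the pipe
    task_pipe.append(json.dumps({'taskid': taskid, 'func': func_name, 'args': args}))

def worker_step():
    # worker side: pull one message out of the pipe and execute it
    msg = json.loads(task_pipe.popleft())
    return REGISTRY[msg['func']](*msg['args'])

REGISTRY['add'] = lambda a, b: a + b
send('add', 2, 3)
print(worker_step())  # -> 5
```

Because each message carries its taskid, a worker can keep per-task configuration separate while draining a single shared pipe.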
## Worker
```python
# if in start_worker.py
import vredis

s = vredis.Worker.from_settings(host='xx.xx.xx.xx', port=6666, password='vredis')
s.start()
```

```shell
# if in bash
C:\Users\Administrator>vredis worker -ho xx.xx.xx.xx -po 6666 -pa vredis -db 0

# if a param is not set, the default is used:
# default host     localhost
# default port     6379
# default password None
# default db       0
```
## Sender
```python
from vredis import pipe

pipe.connect(host='xx.xx.xx.xx', port=6666, password='vredis')
pipe.DEBUG = True  # True/False. worker prints on the worker console.

# very low code intrusion: one decorator, otherwise barrier-free execution.
# The decorated function becomes a send function and is sent to the task pipeline.
@pipe
def some(i):
    import time, random
    rd = random.randint(1, 2)
    time.sleep(rd)
    print('use func:{}, rd time:{}'.format(i, rd))
    return 123  # the returned data is wrapped as JSON and passed into redis.

@pipe.table('mytable')  # if no table is set, "default" is used as the tablename
def some2(i):
    print('use func2:{}'.format(i))
    return 333, 444
    # if the return value is a generator, list or tuple, each item is
    # iterated out, wrapped as JSON and passed in one by one.
    # Data is collected under the tablename (default tablename: "default").

for i in range(100):
    some(i)   # the first sent task gets a taskid; info will be logged.
    some2(i)
```
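The per-table collection described in the comments above can be sketched as follows. This is a minimal illustration, not the vredis implementation: `store` and `collect` are hypothetical names, and a plain dict keyed by `(taskid, table)` stands in for the redis-backed result space.

```python
import json
import types
from collections import defaultdict

# stand-in for the redis result space, keyed by (taskid, tablename)
store = defaultdict(list)

def collect(taskid, table, result):
    # a generator, list or tuple return is iterated item by item;
    # any other return value is stored as a single item.
    if isinstance(result, (list, tuple, types.GeneratorType)):
        items = result
    else:
        items = [result]
    for item in items:
        store[(taskid, table)].append(json.dumps(item))  # each item wrapped as JSON

collect(26, 'default', 123)         # scalar return -> one JSON item
collect(26, 'mytable', (333, 444))  # tuple return  -> two JSON items
print(store[(26, 'mytable')])       # -> ['333', '444']
```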
## Fetch data
```python
from vredis import pipe

pipe.connect(host='xx.xx.xx.xx', port=6666, password='vredis')

# the second param is the tablename; the default tablename is "default"
for i in pipe.from_table(taskid=26):
    print(i)
```