A Python implementation of MapReduce queries against Redis
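# Rapmedusa
Rapmedusa is a Python module that runs MapReduce-style queries against the Redis key-value store. It is intended to provide functionality similar (in some respects) to CouchDB's views feature and MongoDB's mapReduce database command.
## Dependencies
Rapmedusa depends on Andy McCurdy's redis-py module, which is available from [https://github.com/andymccurdy/redis-py][1]. You will of course also need a running [Redis][2] instance to connect to. In both cases, any version >= 2.0 should be compatible with Rapmedusa.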
## Installation

    $ sudo pip install rapmedusa

or

    $ sudo easy_install rapmedusa

or from source:

    $ sudo python setup.py install

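## Overview
First, import the required modules:

    :::python
    >>> import redis
    >>> from rapmedusa import emit, map_reduce

Next, connect to a running Redis instance in the usual way:

    :::python
    >>> conn = redis.StrictRedis(host='localhost', port=6379, db=0)

Finally, you must provide implementations of the map and reduce functions and pass them, together with the active connection to the Redis server, in a call to the map_reduce() function:

    :::python
    >>> def myMap(key, val):
    ...     emit(newKey, newVal)  # newKey/newVal are placeholders for values you compute
    >>> def myReduce(key, values):
    ...     return newVal
    >>> result = map_reduce(conn, myMap, myReduce)

This returns a Python dictionary containing the results of the MapReduce job. Each key in the dictionary corresponds to a key that was passed to the reduce function, and its value is the value the reduce function computed for that key.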
## Details
Now it is time to take a closer look at how Rapmedusa carries out a MapReduce job. There are essentially six steps:

1. Read the input data set from a specified Redis hash.
2. Pass each key/value pair from the input data set to the registered map function.
3. Organize the key/value pairs emitted by the map function into a set of Redis lists, one list per distinct emitted key.
4. Pass each of these lists, along with the corresponding key, to the registered reduce function.
5. Store the result of each call to reduce in the Redis hash reserved for the job output, under the key used in the reduce call.
6. Return a Python dictionary representing the contents of the job output hash.

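The data flow of these steps can be imitated in plain Python, without Redis. The following is only a sketch and not Rapmedusa's actual implementation: ordinary dictionaries and lists stand in for the Redis hashes and lists, the name `simulate_map_reduce` is hypothetical, and `emit` is passed to the map function as a third argument here, whereas Rapmedusa exposes `emit` as a module-level function.

```python
from collections import defaultdict


def simulate_map_reduce(inputs, map_fn, reduce_fn):
    """In-memory imitation of the steps; `inputs` stands in for the input hash."""
    # Stand-in for the per-key Redis lists that collect emitted values.
    sorted_vals = defaultdict(list)

    def emit(key, value):
        # Group each emitted value under its emitted key.
        sorted_vals[key].append(value)

    # Read every key/value pair of the input and hand it to the map function.
    for key, value in inputs.items():
        map_fn(key, value, emit)

    # Reduce each list and store the result under the same key.
    outputs = {}
    for key, values in sorted_vals.items():
        outputs[key] = reduce_fn(key, values)

    # Return the contents of the output "hash" as a dictionary.
    return outputs


# Usage: count how often each colour occurs in the input values.
words = {"1": "red", "2": "blue", "3": "red"}
result = simulate_map_reduce(
    words,
    lambda key, value, emit: emit(value, 1),
    lambda key, values: sum(values),
)
print(result)  # {'red': 2, 'blue': 1}
```

Because nothing is stored in Redis in this sketch, the reduced values keep their Python types; with a real Redis store the values would be read back from a hash.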
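At this point, a natural question is how the input and output hash keys are specified. These (as well as the intermediate keys used in the steps above) can be specified in the call to map_reduce(). The following additional, optional parameters can be given in the call: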
- inKey – specifies the key under which the input data set is stored (defaults to ‘rapmedusa:inputs’)
- outKey – specifies the key under which the job output is stored (defaults to ‘rapmedusa:outputs’)
- sortKey – specifies the key prefix under which the output of the map function (step 3 above) is stored (defaults to ‘rapmedusa:sortedVals’)
- sortedKeySet – specifies the key under which the set formed from the list keys of step 3 is stored (defaults to ‘rapmedusa:sortedKeySet’)
- cleanUp – a boolean value indicating whether the temporary keys (sortKey, sortedKeySet) should be deleted from the Redis store upon the completion of the MapReduce job (defaults to True)
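You will seldom need to override the default values of sortKey and sortedKeySet, since naming collisions are highly unlikely. However, you may want to specify custom values for inKey and outKey that are easier to remember.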
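As an illustration of custom inKey and outKey values, the sketch below derives both from a job name. The helpers `job_options` and `run_job` and the `<job>:inputs` / `<job>:outputs` naming scheme are hypothetical and not part of Rapmedusa; only the parameter names come from the list above, and the sketch assumes map_reduce() accepts all of them as keyword arguments, as it does for inKey.

```python
def job_options(job_name, clean_up=True):
    """Build keyword arguments for map_reduce(), namespaced by a job name."""
    return {
        "inKey": job_name + ":inputs",    # hash holding the input data set
        "outKey": job_name + ":outputs",  # hash receiving the job output
        "cleanUp": clean_up,              # delete the temporary keys afterwards
    }


def run_job(conn, map_fn, reduce_fn, job_name):
    """Run a job with the namespaced keys (requires rapmedusa and a Redis server)."""
    from rapmedusa import map_reduce  # imported here so job_options works on its own
    return map_reduce(conn, map_fn, reduce_fn, **job_options(job_name))


options = job_options("ages")
print(options["inKey"], options["outKey"])  # ages:inputs ages:outputs
```

sortKey and sortedKeySet are left at their defaults here, in line with the advice above.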
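## Examples
### Example 1: Counting Ages
This example demonstrates a MapReduce job in which the input keys map to personal records and the map function generates keys from one of the record entries, the age.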

    :::python
    >>> import redis
    >>> from rapmedusa import *
    >>> conn = redis.StrictRedis(host='localhost', port=6379, db=0)
    >>> conn.hset('myInput', 1, "{'name': 'Chad', 'age': 43}")
    >>> conn.hset('myInput', 2, "{'name': 'Ron', 'age': 21}")
    >>> conn.hset('myInput', 3, "{'name': 'George', 'age': 54}")
    >>> conn.hset('myInput', 4, "{'name': 'Alice', 'age': 54}")
    >>> def myMap(key, value):
    ...     obj = eval(value)  # turn the stored record string back into a dict
    ...     emit(str(obj['age']), '1')
    >>> def myReduce(key, vals):
    ...     total = 0
    ...     for v in vals:
    ...         total += int(v)
    ...     return total
    >>> result = map_reduce(conn, myMap, myReduce, inKey='myInput')
    >>> print(result)
    {'54': '2', '21': '1', '43': '1'}

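Note that the counts in the printed result are strings (for example '2') even though myReduce returns an integer, presumably because the values are read back from the Redis output hash. A minimal post-processing sketch, assuming the result has exactly the shape shown in the example output; the names `counts` and `shared` are illustrative only:

```python
# Result dictionary copied from the example output.
result = {'54': '2', '21': '1', '43': '1'}

# Convert each count from a string to an integer.
counts = {age: int(n) for age, n in result.items()}

# Ages that occur in more than one record.
shared = sorted(age for age, n in counts.items() if n > 1)

print(counts)  # {'54': 2, '21': 1, '43': 1}
print(shared)  # ['54']
```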