I wrote the Python program below to process data from an Excel file. Is it possible to run the same program with Hadoop MapReduce, and how does a traditional program differ from a MapReduce program?
import xlrd

# note: xlrd 2.x reads only .xls files; for .xlsx use xlrd < 2.0 or openpyxl
with xlrd.open_workbook('interference.xlsx') as book:
    # index 0 is the first worksheet
    sheet = book.sheet_by_index(0)
    # column B values (index 1): the UIDs
    B = sheet.col_values(1)
    # column D values (index 3): the attempt counts
    D = sheet.col_values(3)

# pair each UID with its attempt count,
# e.g. [ ('Incoming', 18), ('Outgoing', 99), ... ]
data = list(zip(B, D))

# total number of GET request attempts for each UID (1 through 44);
# the original nested for/while never reset x, so only the first
# outer iteration did any work -- a single loop is all that's needed
for x in range(1, 45):
    attempts = sum(tup[1] for tup in data if tup[0] == x)
    print("Total attempts for UID", x, attempts)
It is not possible to run the same program unchanged as a MapReduce job in Hadoop.
MapReduce is a programming paradigm. The idea is to split the computation into two phases: the first phase (map) breaks the problem into many sub-problems and solves each of them; the second phase (reduce) aggregates the results of all the sub-problems to produce the final answer.
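To see how your UID-totalling logic fits that split, here is a minimal pure-Python simulation of the two phases (no Hadoop involved; the sample `data` list is made up for illustration, standing in for the (UID, attempts) pairs your Excel code produces):

```python
from itertools import groupby
from operator import itemgetter

def mapper(record):
    # map phase: emit a (uid, attempts) key/value pair for each input row
    uid, attempts = record
    yield (uid, attempts)

def reducer(uid, values):
    # reduce phase: sum all attempt counts that share one uid
    yield (uid, sum(values))

# simulated input rows, as produced by the Excel-reading code
data = [(1, 18), (2, 99), (1, 5), (2, 1)]

# "map" phase over every record
mapped = [kv for rec in data for kv in mapper(rec)]
# shuffle/sort: group pairs by key, as the framework does between phases
mapped.sort(key=itemgetter(0))
# "reduce" phase over each key group
results = [out for key, group in groupby(mapped, key=itemgetter(0))
           for out in reducer(key, [v for _, v in group])]
print(results)  # [(1, 23), (2, 100)]
```

The point of the restructuring is that `mapper` and `reducer` each see only a small, independent piece of the data, so Hadoop can run many copies of them in parallel on different machines.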
I suggest you look at the WordCount program, which is Hadoop's hello world: http://wiki.apache.org/hadoop/WordCount
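If you do want to run this on a cluster without writing Java, Hadoop Streaming lets any program that reads stdin and writes stdout serve as a mapper or reducer. A sketch for your problem, assuming the input has already been exported to tab-separated "uid<TAB>attempts" lines (the file name and invocation shown in the comments are illustrative, not a tested command line):

```python
import sys
from itertools import groupby

def map_lines(lines):
    # mapper: pass each "uid<TAB>attempts" line through as key<TAB>value
    for line in lines:
        uid, attempts = line.strip().split("\t")
        yield f"{uid}\t{attempts}"

def reduce_lines(lines):
    # reducer: Hadoop delivers lines sorted by key, so consecutive lines
    # with the same uid form one group; sum that group's attempt counts
    parsed = (line.strip().split("\t") for line in lines)
    for uid, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(attempts) for _, attempts in group)
        yield f"{uid}\t{total}"

if __name__ == "__main__" and len(sys.argv) > 1:
    # illustrative usage with the streaming jar (paths are assumptions):
    #   hadoop jar hadoop-streaming.jar \
    #       -mapper  "python mr_uid.py map" \
    #       -reducer "python mr_uid.py reduce" \
    #       -input attempts.tsv -output out
    phase = map_lines if sys.argv[1] == "map" else reduce_lines
    for out in phase(sys.stdin):
        print(out)
```

Compare this with your original script: the per-row work and the per-UID summing are now separate, stateless steps, which is exactly the shape MapReduce requires.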