How do I get intermediate results in IPython parallel processing?

>>> from IPython.parallel import Client
>>> import os
>>> c = Client()
>>> c.ids
>>> lview = c.load_balanced_view()
>>> lview.block = True
>>> def return_len(xml_filepath):
...     import xml.etree.cElementTree as cElementTree
...     tree = cElementTree.parse(xml_filepath)
...     my_count = 0
...     file_result = []
...     cdict = {}
...     for elem in tree.getiterator():
...         cdict[my_count] = {}
...         if elem.tag:
...             cdict[my_count]['tag'] = elem.tag
...         if elem.text:
...             cdict[my_count]['text'] = (elem.text).strip()
...         if elem.attrib.items():
...             cdict[my_count]['xmlattb'] = {}
...             for key, value in elem.attrib.items():
...                 cdict[my_count]['xmlattb'][key] = value
...         if list(elem):
...             cdict[my_count]['xmlinfo'] = len(list(elem))
...         if elem.tail:
...             cdict[my_count]['tail'] = elem.tail.strip()
...         my_count += 1
...     output = xml_filepath.split('\\')[-1], len(cdict)
...     return output
...     ## return cdict
...
>>> def get_dir_list(target_dir, *extensions):
...     """
...     This function will filter out the files from given dir based on their extensions
...     """
...     my_paths = []
...     for top, dirs, files in os.walk(target_dir):
...         for nm in files:
...             fileStats = os.stat(os.path.join(top, nm))
...             if nm.split('.')[-1] in extensions:
...                 my_paths.append(top + '\\' + nm)
...     return my_paths
...
>>> r = lview.map_async(return_len, get_dir_list('C:\\test_folder', 'xsd', 'xml'))

1 answer

Answer 1 · posted 2024-04-26 13:47:20

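Sure thing. An AsyncMapResult (the type returned by map_async) can be iterated at any time, and the items it yields are the same as the list that r.get() eventually produces. So after you have done: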

amr = lview.map_async(return_len, get_dir_list('C:\\test_folder','xsd','xml'))

you can do:

for r in amr:
    print r

or use enumerate to keep the index:

for i,r in enumerate(amr):
    print i, r

or use the reduce builtin to perform a reduction:

summary_result = reduce(myfunc, amr)
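
As a sketch of the reduction above: in Python 3, reduce lives in functools, and it consumes any iterable one item at a time, so a running summary can be built while results are still arriving. The generator fake_results below is a hypothetical stand-in for amr (no cluster is needed to run it), merge_counts is an illustrative myfunc, and the tuples only mimic the (filename, element_count) pairs that return_len returns.

```python
from functools import reduce  # a builtin in Python 2, moved to functools in Python 3


def fake_results():
    """Hypothetical stand-in for an AsyncMapResult: yields results one at a time."""
    yield ("a.xml", 12)
    yield ("b.xsd", 7)
    yield ("c.xml", 30)


def merge_counts(summary, result):
    """Fold one (filename, element_count) result into the running summary."""
    filename, count = result
    return {
        "files": summary["files"] + 1,
        "elements": summary["elements"] + count,
    }


# An explicit initial value lets the summary have a different type than the items.
summary_result = reduce(merge_counts, fake_results(), {"files": 0, "elements": 0})
print(summary_result)
```

Passing an initial value is a design choice for this sketch; reduce(myfunc, amr) without one, as in the answer, works when the summary has the same shape as a single result.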

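All of these iterate over the results as they arrive. If you don't care about ordering and the time per task varies a lot, you can pass map_async(..., ordered=False). If you do, iterating over the AMR gives you individual results on a first-come, first-served basis instead of in submission order.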
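
The difference between submission order and arrival order can be illustrated without a cluster. The sketch below uses the standard library's concurrent.futures as a stand-in for the load-balanced view (it is not the IPython API): pool.map plays the role of the default ordered iteration, and as_completed plays the role of ordered=False. The slow_square task and its sleep times are made up for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_square(n):
    """Illustrative task: larger inputs sleep less, so they tend to finish first."""
    time.sleep((4 - n) * 0.05)
    return n * n


inputs = [1, 2, 3]

with ThreadPoolExecutor(max_workers=3) as pool:
    # Submission order: each result is yielded only after all earlier ones are ready.
    ordered = list(pool.map(slow_square, inputs))

    # Arrival order: each result is yielded as soon as its task finishes.
    futures = [pool.submit(slow_square, n) for n in inputs]
    unordered = [f.result() for f in as_completed(futures)]

print(ordered)
print(unordered)
```

Both lists hold the same values; only the order in which they become available differs, which is why the unordered form helps when a few slow tasks would otherwise hold up everything behind them.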

There is more information in the IPython docs.

"Also, to deal with the import/namespace error, I have defined the import inside the return_len function; is there a better way to deal with that?"

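Yes and no. There are several ways to set up the engine namespace, such as using modules, the @parallel.require("module") decorator, or simply performing the import explicitly with %px import xml.etree.cElementTree as cElementTree, and each has benefits in certain situations. But I often find that putting the import inside the function is the simplest approach, with the fewest surprises.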
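
A minimal sketch of the import-inside-the-function pattern, written for Python 3, where xml.etree.cElementTree and getiterator() no longer exist (both were removed in 3.9) and xml.etree.ElementTree with iter() is used instead. count_elements is a hypothetical simplification of return_len that takes XML text rather than a file path, so it can be run locally without files or engines.

```python
def count_elements(xml_text):
    """Count the elements in an XML document given as a string."""
    # Importing here binds the name wherever the function body runs, so the
    # function does not rely on the caller's (or an engine's) namespace.
    import xml.etree.ElementTree as ElementTree

    root = ElementTree.fromstring(xml_text)
    # iter() walks the root and all of its descendants.
    return sum(1 for _ in root.iter())


sample = "<schema><element name='a'/><element name='b'><note/></element></schema>"
print(count_elements(sample))
```

Because the function carries its own import, it should behave the same whether it is called directly, as here, or handed to map_async to run on an engine.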
