Lightweight pipelining: using Python functions as pipeline jobs.
Detailed description of the joblib Python project
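Joblib is a set of tools to provide lightweight pipelining in Python. In particular:
- transparent disk caching of functions and lazy re-evaluation (the memoize pattern)
- easy, simple parallel computing
Joblib is optimized to be fast and robust, in particular on large data, and has specific optimizations for numpy arrays. It is BSD-licensed.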
- Documentation: https://joblib.readthedocs.io
- Download: https://pypi.python.org/pypi/joblib#downloads
- Source code: https://github.com/joblib/joblib
- Report issues: https://github.com/joblib/joblib/issues
Vision
The vision is to provide tools to easily achieve better performance and reproducibility when working with long-running jobs.
- Avoid computing the same thing twice: code is rerun over and over, for instance when prototyping computationally heavy jobs (as in scientific development), but hand-crafted solutions to alleviate this issue are error-prone and often lead to unreproducible results.
- Persist to disk transparently: efficiently persisting arbitrary objects containing large data is hard. Using joblib's caching mechanism avoids hand-written persistence and implicitly links the file on disk to the execution context of the original Python object. As a result, joblib's persistence is good for resuming an application status or computational job, e.g. after a crash.
Joblib addresses these problems while leaving your code and your flow control as unmodified as possible (no framework, no new paradigms).
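Main features
Transparent and fast disk caching of output values: a memoize or make-like functionality for Python functions that works well for arbitrary Python objects, including very large numpy arrays. Separate persistence and flow-execution logic from domain logic or algorithmic code by writing the operations as a set of steps with well-defined inputs and outputs: Python functions. Joblib can save their computation to disk and rerun it only if necessary: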
    >>> from joblib import Memory
    >>> cachedir = 'your_cache_dir_goes_here'
    >>> mem = Memory(cachedir)
    >>> import numpy as np
    >>> a = np.vander(np.arange(3)).astype(float)
    >>> square = mem.cache(np.square)
    >>> b = square(a)                                   # doctest: +ELLIPSIS
    ________________________________________________________________________________
    [Memory] Calling square...
    square(array([[0., 0., 1.],
           [1., 1., 1.],
           [4., 2., 1.]]))
    ___________________________________________________________square - 0...s, 0.0min

    >>> c = square(a)
    >>> # The above call did not trigger an evaluation
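The same mechanism applies to user-defined functions. The following is a minimal sketch, assuming a hypothetical function `slow_square` and a throwaway temporary cache directory (neither is part of joblib). The `calls` list exists only to make it visible when the function body really runs.

```python
import tempfile

import numpy as np
from joblib import Memory

# Hypothetical cache location: a throwaway temporary directory.
cachedir = tempfile.mkdtemp()
memory = Memory(cachedir, verbose=0)

calls = []  # records every real execution of the function body


@memory.cache
def slow_square(x):
    """Hypothetical stand-in for an expensive computation."""
    calls.append(1)
    return np.square(x)


a = np.arange(6, dtype=float).reshape(2, 3)
first = slow_square(a)      # computed and written to the cache
second = slow_square(a)     # same input: loaded from disk, body not run
third = slow_square(a + 1)  # different input: computed again

print(len(calls))
```

Because the cache key is derived from the arguments, only a change in input (or in the function's code) should trigger a new evaluation.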
Embarrassingly parallel helper: to make it easy to write readable parallel code and debug it quickly:
    >>> from joblib import Parallel, delayed
    >>> from math import sqrt
    >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
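As a further illustration, the sketch below runs a hypothetical function `scaled_power` (not part of joblib) on two workers and passes keyword arguments through `delayed`. The `prefer="threads"` hint is a choice made here to keep the example lightweight. Leaving it out lets joblib pick its default backend.

```python
from joblib import Parallel, delayed


def scaled_power(x, exponent=2, scale=1.0):
    """Hypothetical per-item task used only for illustration."""
    return scale * x ** exponent


# n_jobs=2 asks for two workers; prefer="threads" is a soft hint that
# keeps this sketch lightweight. Omit it to use joblib's default backend.
results = Parallel(n_jobs=2, prefer="threads")(
    delayed(scaled_power)(i, exponent=3, scale=0.5) for i in range(5)
)
print(results)
```

Results come back as a list in the same order as the inputs, so the parallel call can stand in for an ordinary list comprehension. Setting `n_jobs=1` runs the same code sequentially, which is convenient for debugging.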
Fast compressed persistence: a replacement for pickle to work efficiently on Python objects containing large data (joblib.dump & joblib.load).
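A minimal sketch of a round trip through `joblib.dump` and `joblib.load`, assuming a hypothetical `payload` dictionary and a file path inside a temporary directory (both invented for this example):

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# Hypothetical object mixing plain Python data with a large array.
payload = {"name": "example", "weights": np.arange(1_000_000, dtype=float)}

# Hypothetical target path inside a throwaway temporary directory.
path = os.path.join(tempfile.mkdtemp(), "payload.joblib")

# compress accepts an integer level from 0 (none) to 9 (strongest).
dump(payload, path, compress=3)

# Read the object back from disk.
restored = load(path)
print(restored["name"], restored["weights"].shape)
```

Since `joblib.load` is built on pickle, it should only be used on files from a trusted source.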