Python classes to support data science projects.
resumableds
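resumableds helps you write data science scripts with save/resume functionality. Data can be saved and restored, which avoids unnecessarily retrieving raw data from the data store again. The data directory structure is inspired by cookiecutter data science (https://drivendata.github.io/cookiecutter-data-science/). The classes also support the principle "analysis is a DAG" (https://drivendata.github.io/cookiecutter-data-science/#analysis-is-a-dag).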
resumableds is written in pure Python and is intended to be used in Jupyter notebooks.
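The core idea of not retrieving raw data twice can be illustrated with a small cache-or-fetch helper. This is a minimal sketch assuming one plain pickle file per object; the `load_or_fetch` function and its parameters are hypothetical and are not part of the resumableds API.

```python
import os
import pickle
import tempfile
from typing import Any, Callable


def load_or_fetch(path: str, fetch: Callable[[], Any]) -> Any:
    """Return the object pickled at `path`; call `fetch` only if no saved copy exists.

    Hypothetical helper illustrating the save/resume idea, not the resumableds API.
    """
    if os.path.exists(path):
        # Resume: reuse the saved copy instead of querying the data store again.
        with open(path, "rb") as fh:
            return pickle.load(fh)
    # First run: retrieve the raw data and persist it for later sessions.
    obj = fetch()
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)
    return obj


# Usage: the second call is served from disk, so fetch_raw runs only once.
calls = []


def fetch_raw():
    calls.append("fetched")  # stands in for an expensive query against the data store
    return {"rows": 3}


with tempfile.TemporaryDirectory() as tmp:
    cache = os.path.join(tmp, "raw", "df1.pkl")
    first = load_or_fetch(cache, fetch_raw)
    second = load_or_fetch(cache, fetch_raw)

print(first == second, len(calls))  # prints "True 1"
```

The trade-off of this pattern is that a stale pickle is silently reused; deleting the file forces a fresh retrieval.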
Example
<code>
import pandas as pd
from resumableds import RdsProject  # import path assumed from the package name

proj1 = RdsProject('project1')  # create the project object (creates the directory if it doesn't exist yet)
proj1.raw.df1 = pd.DataFrame()  # store a DataFrame as an attribute of proj1.raw (RdsFs 'raw')
proj1.defs.variable1 = 'foo'    # store a simple object as an attribute of proj1.defs (RdsFs 'defs')
proj1.save()                    # save the attributes of all RdsFs in proj1 to disk
</code>
This results in the following directory structure (plus some internal bookkeeping files):
- <output_dir>/defs/var_variable1.pkl
- <output_dir>/raw/df1.pkl
- <output_dir>/raw/df1.csv
Note that pandas DataFrames are always dumped twice: as pickle for further processing and as CSV for easy exploration. The CSV files are never read back.
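The file layout listed above can be reproduced with a short dump routine. This is a sketch under stated assumptions: `dump_attribute` is a hypothetical name, DataFrames are detected by duck typing (the presence of a `to_csv` method) rather than by whatever check resumableds actually uses, and the package's internal bookkeeping files are not modelled.

```python
import os
import pickle
import tempfile
from typing import Any, List


def dump_attribute(dir_path: str, name: str, obj: Any) -> List[str]:
    """Write one attribute following the layout described above (hypothetical sketch)."""
    os.makedirs(dir_path, exist_ok=True)
    if hasattr(obj, "to_csv"):
        # Assumed DataFrame-like: pickle for reloading, CSV only for human inspection.
        pkl_path = os.path.join(dir_path, f"{name}.pkl")
        csv_path = os.path.join(dir_path, f"{name}.csv")
        with open(pkl_path, "wb") as fh:
            pickle.dump(obj, fh)
        obj.to_csv(csv_path)
        return [pkl_path, csv_path]
    # Any other picklable object gets a "var_" prefix and no CSV copy.
    pkl_path = os.path.join(dir_path, f"var_{name}.pkl")
    with open(pkl_path, "wb") as fh:
        pickle.dump(obj, fh)
    return [pkl_path]


with tempfile.TemporaryDirectory() as out:
    # Mirrors proj1.defs.variable1 = 'foo' from the example.
    written = dump_attribute(os.path.join(out, "defs"), "variable1", "foo")
    print([os.path.relpath(p, out) for p in written])
```

Passing a pandas DataFrame as `obj` would take the first branch and produce both `<name>.pkl` and `<name>.csv`.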
Later, or in another Python session, you can do this:
<code>
import pandas as pd
from resumableds import RdsProject  # import path assumed from the package name

# The existing directory is left untouched; all variables and data are read back under their original names.
proj2 = RdsProject('project1')
proj2.defs.variable1 == 'foo'            # ==> True
isinstance(proj2.raw.df1, pd.DataFrame)  # ==> True
</code>
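Restoring works in the opposite direction: every pickle found in a sub-directory is loaded and bound to its original attribute name, while the CSV copies are ignored. A minimal sketch, assuming the naming layout listed earlier (a `var_` prefix on simple objects); `restore_namespace` is hypothetical and does not reproduce how resumableds tracks its internal files.

```python
import os
import pickle
import tempfile
from types import SimpleNamespace


def restore_namespace(dir_path: str) -> SimpleNamespace:
    """Load every *.pkl in dir_path into attributes named after the files (hypothetical sketch)."""
    ns = SimpleNamespace()
    for filename in sorted(os.listdir(dir_path)):
        if not filename.endswith(".pkl"):
            continue  # CSV files exist for exploration only and are never read back
        name = filename[: -len(".pkl")]
        if name.startswith("var_"):
            name = name[len("var_"):]  # strip the prefix used for simple objects
        with open(os.path.join(dir_path, filename), "rb") as fh:
            setattr(ns, name, pickle.load(fh))
    return ns


with tempfile.TemporaryDirectory() as out:
    # Simulate a previous session's save: one simple variable plus a CSV that must be ignored.
    with open(os.path.join(out, "var_variable1.pkl"), "wb") as fh:
        pickle.dump("foo", fh)
    with open(os.path.join(out, "df1.csv"), "w") as fh:
        fh.write("a\n1\n")
    defs = restore_namespace(out)

print(defs.variable1 == "foo")  # prints "True"
```

One consequence of this naming scheme, in the sketch, is that a DataFrame attribute whose own name starts with `var_` would be restored under a shortened name.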