Test-Driven Data Analysis
What is it?
The tdda Python module supports the overall process of data analysis through the following tools:
- Reference Testing: extensions to unittest and pytest for managing testing of data analysis pipelines, where the results are typically much larger, and more complex, than single numerical values.
- Constraints: tools (and API) for discovery of constraints from data, for validation of constraints on new data, and for anomaly detection.
- Finding Regular Expressions: tools (and API) for automatically inferring regular expressions from text data.
Installation
The simplest way to install all of the tdda Python modules is with pip:
pip install tdda
The full set of sources, including all examples, can be downloaded from PyPI with:
pip download --no-binary :all: tdda
The sources are also publicly available from GitHub:
git clone git@github.com:tdda/tdda.git
Documentation is available at http://tdda.readthedocs.io.
If you clone the GitHub repo, use
python setup.py install
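afterwards to install the command-line tools (tdda and rexpy).
Reference Tests
The tdda.referencetest library supports the creation of reference tests, based on either unittest or pytest.
These are like other tests except: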
- They have special support for comparing strings to files and files to files.
- That support includes the ability to provide exclusion patterns (for things like dates and versions that might be in the output).
- When a string/file assertion fails, it spits out the command you need to diff the output.
- If there were exclusion patterns, it also writes modified versions of both the actual and expected output and also prints the diff command needed to compare those.
- They have special support for handling CSV files.
- It supports flags (-w and -W) to rewrite the reference (expected) results once you have confirmed that the new actuals are correct.
For more details, in a source distribution or checkout, see the README.md file and the examples in the referencetest subdirectory.
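The behaviour listed above can be illustrated with a small standard-library sketch. This is not the tdda.referencetest API: the function check_against_reference, its parameters, and the sample report text are hypothetical names chosen for illustration, and the sketch is simplified (on failure it saves only the actual output, not masked copies of both sides).

```python
import re
import tempfile
from pathlib import Path


def check_against_reference(actual, ref_path, ignore_patterns=(), rewrite=False):
    """Compare a string with a reference file (hypothetical helper, not tdda).

    Text matching any of ignore_patterns is masked on both sides before
    comparison. Returns (ok, message).
    """
    ref_path = Path(ref_path)
    if rewrite:
        # Comparable in spirit to rewriting the expected results once the
        # new actual output has been confirmed as correct.
        ref_path.write_text(actual)
        return True, "reference rewritten"

    expected = ref_path.read_text()

    def mask(text):
        for pattern in ignore_patterns:
            text = re.sub(pattern, "<IGNORED>", text)
        return text

    if mask(actual) == mask(expected):
        return True, "ok"

    # On failure, save the actual output and suggest a diff command.
    actual_path = Path(tempfile.gettempdir()) / ("actual-" + ref_path.name)
    actual_path.write_text(actual)
    return False, "diff {} {}".format(actual_path, ref_path)


# Usage: create a reference, then compare later output while ignoring
# dates and version numbers.
workdir = Path(tempfile.mkdtemp())
ref = workdir / "report.txt"
check_against_reference("Report v1.0 generated 2016-09-15\ntotal=42\n",
                        ref, rewrite=True)

patterns = [r"\d{4}-\d{2}-\d{2}", r"v\d+\.\d+"]
ok_same, _ = check_against_reference(
    "Report v1.1 generated 2017-01-02\ntotal=42\n", ref, patterns)
ok_diff, hint = check_against_reference(
    "Report v1.1 generated 2017-01-02\ntotal=43\n", ref, patterns)
print(ok_same, ok_diff, hint)
```

In this sketch the second comparison passes because only the masked date and version differ, while the third fails on the changed total and returns a diff command to run by hand.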
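Constraints
The tdda.constraints library is used to "discover" constraints from a (pandas) DataFrame, write them out as JSON, and verify that datasets meet the constraints in the constraints file.
For more details, in a source distribution or checkout, see the README.md file and the examples in the constraints subdirectory.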
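The discover, write-as-JSON, and verify cycle can be sketched with the standard library alone. This is an illustration of the idea, not the tdda.constraints API: it works on a list of dicts rather than a pandas DataFrame, and the functions discover and verify, the constraint names, and the sample records are all assumptions made for this example.

```python
import json


def discover(rows):
    """Infer simple per-field constraints from a list of dict records."""
    constraints = {}
    fields = sorted({key for row in rows for key in row})
    for field in fields:
        values = [row.get(field) for row in rows]
        present = [v for v in values if v is not None]
        c = {}
        if present:
            c["types"] = sorted({type(v).__name__ for v in present})
        if present and all(isinstance(v, (int, float)) for v in present):
            c["min"] = min(present)
            c["max"] = max(present)
        if len(present) == len(values):
            c["max_nulls"] = 0  # no missing values were seen
        constraints[field] = c
    return constraints


def verify(rows, constraints):
    """Return (row index, field, constraint name) for each violation."""
    failures = []
    for i, row in enumerate(rows):
        for field, c in constraints.items():
            value = row.get(field)
            if value is None:
                if c.get("max_nulls") == 0:
                    failures.append((i, field, "max_nulls"))
                continue
            if "types" in c and type(value).__name__ not in c["types"]:
                failures.append((i, field, "types"))
                continue
            if "min" in c and value < c["min"]:
                failures.append((i, field, "min"))
            if "max" in c and value > c["max"]:
                failures.append((i, field, "max"))
    return failures


training = [{"age": 34, "name": "Ann"}, {"age": 51, "name": None}]
# Round-trip through JSON, standing in for a constraints file on disk.
constraints = json.loads(json.dumps(discover(training)))

new_data = [
    {"age": 40, "name": "Bob"},
    {"age": 70, "name": "Cy"},
    {"age": None, "name": "Di"},
]
failures = verify(new_data, constraints)
print(failures)
```

Records that violate a discovered constraint are candidates for anomalies: here the age of 70 exceeds the maximum seen in the training data, and the missing age breaks the no-nulls constraint.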
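Finding Regular Expressions
The tdda repository also includes rexpy, a tool for automatically inferring regular expressions from a single field of data examples.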
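To give a feel for what inferring a regular expression from examples involves, here is a deliberately naive standard-library sketch. It is not rexpy's algorithm or API: the functions char_class, runs and infer_regex, and the sample strings, are hypothetical, and the sketch only handles examples that share one sequence of character classes.

```python
import re
import string
from itertools import groupby


def char_class(ch):
    """Map a character to a regex fragment describing its class."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)  # punctuation and other characters match literally


def runs(text):
    """Run-length encode a string as (character class, run length) pairs."""
    return [(cls, len(list(group)))
            for cls, group in groupby(map(char_class, text))]


def infer_regex(examples):
    """Return one anchored regex covering all examples, or None."""
    encoded = [runs(example) for example in examples]
    shapes = {tuple(cls for cls, _ in enc) for enc in encoded}
    if len(shapes) != 1:
        return None  # examples do not share a single shape
    parts = []
    for position in zip(*encoded):
        cls = position[0][0]
        lengths = [n for _, n in position]
        lo, hi = min(lengths), max(lengths)
        if lo == hi == 1:
            quantifier = ""
        elif lo == hi:
            quantifier = "{%d}" % lo
        else:
            quantifier = "{%d,%d}" % (lo, hi)
        parts.append(cls + quantifier)
    return "^" + "".join(parts) + "$"


examples = ["EH1-3LH", "G12-8QQ", "AB10-1XG"]
pattern = infer_regex(examples)
print(pattern)
print(all(re.search(pattern, e) for e in examples))
```

A real tool has to cope with fields whose values fall into several distinct shapes, which this sketch simply rejects by returning None.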
Resources
Resources on these topics include:
- TDDA Blog: http://www.tdda.info
- Quick Reference Guide (“Cheatsheet”): http://www.tdda.info/pdf/tdda-quickref.pdf
- Full documentation: http://tdda.readthedocs.io
- General Notes on Constraints and Assertions: http://www.tdda.info/constraints-and-assertions
- Notes on using the Pandas constraints library: http://www.tdda.info/constraint-discovery-and-verification-for-pandas-dataframes
- PyCon UK Talk on TDDA:
  - Video: https://www.youtube.com/watch?v=FIw_7aUuY50
  - Slides and Rough Transcript: http://www.tdda.info/slides-and-rough-transcript-of-tdda-talk-from-pycon-uk-2016
All examples, tests and code run under Python 2.7, Python 3.5 and Python 3.6.