Hierarchical Community Network, data-driven omics integration

Hierarchical Community Network (HiCoNet)

For integrating multiple data types collected from a common group of subjects.

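An omics dataset contains a lot of redundancy, and features with similar quantitative patterns can be treated as groups. As the combinatorial space grows larger and more complex, commonly used feature-level integration methods may aggravate the redundancy problem. HiCoNet detects communities within each data type, then tests associations between communities across data types. This "hierarchical community network" provides a reasonable model of the organizational structure of the biology being measured.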
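
The two-step idea above (group features within each data type, then associate the groups across data types) can be illustrated with a minimal standard-library sketch. This is not HiCoNet's implementation: the toy data, the 0.9 correlation threshold, and the helper names `pearson`, `communities` and `mean_profile` are assumptions made for illustration. Connected components of a thresholded correlation graph stand in for a real community detection algorithm, and a plain correlation coefficient stands in for a statistical association test.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def communities(matrix, threshold=0.9):
    """Group features whose profiles correlate above `threshold`.

    `matrix` maps feature ID -> values across the same observations.
    Connected components are a simple stand-in for community detection.
    """
    features = list(matrix)
    parent = {f: f for f in features}

    def find(f):
        # Follow parent links up to the representative of f's group.
        while parent[f] != f:
            f = parent[f]
        return f

    for i, a in enumerate(features):
        for b in features[i + 1:]:
            if pearson(matrix[a], matrix[b]) > threshold:
                parent[find(a)] = find(b)

    groups = {}
    for f in features:
        groups.setdefault(find(f), []).append(f)
    return sorted(groups.values())

def mean_profile(matrix, members):
    """Average the member features at each observation."""
    return [sum(vals) / len(vals) for vals in zip(*(matrix[m] for m in members))]

# Hypothetical toy data: two data types measured on the same five subjects.
genes = {
    "g1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "g2": [1.1, 2.1, 2.9, 4.2, 5.1],
    "g3": [5.0, 1.0, 4.0, 2.0, 3.0],
}
metabolites = {
    "m1": [2.0, 4.1, 6.0, 8.2, 9.9],
    "m2": [2.2, 3.9, 6.1, 7.9, 10.0],
}

# Step 1: detect communities within each data type.
gene_comms = communities(genes)
metab_comms = communities(metabolites)

# Step 2: measure association between communities across data types.
associations = {}
for gc in gene_comms:
    for mc in metab_comms:
        r = pearson(mean_profile(genes, gc), mean_profile(metabolites, mc))
        associations[(tuple(gc), tuple(mc))] = r
```

Working on community-level profiles rather than individual features keeps the number of cross-data-type comparisons small, which is the point of the hierarchical design described above.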

We started using this approach in a VZV vaccine study (Li et al., 2017, https://doi.org/10.1016/j.cell.2017.04.026), and have developed it further through other scientific projects.

Three-file society data structure

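Each data type usually has its own characteristics. To automate complex analyses, a common denominator format is needed. We use three files to describe one data type: data matrix, feature annotation and observation annotation. The DataMatrix file uses one row for the observation IDs and one column for the feature IDs. This requires a unique identifier for each feature and each observation, and it separates metadata from the data matrix, because annotations on features or observations can be heterogeneous but should not affect the format of the data matrix.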
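
As a rough illustration of the three-file layout described above, the sketch below parses a tiny society from in-memory CSV text using only the standard library. The file contents, the column names (`feature_id`, `observation_id`, `gene_symbol`, `subject`, `timepoint`) and the `read_table` helper are hypothetical. HiCoNet's actual file naming, delimiters and parsing may differ.

```python
import csv
import io

# Hypothetical data matrix: the header row carries observation IDs,
# the first column carries feature IDs, and the body holds only values.
DATA_MATRIX = """feature_id,obs1,obs2,obs3
f1,1.2,3.4,2.2
f2,0.5,0.7,0.9
"""

# Hypothetical feature annotation, keyed by the same feature IDs.
FEATURE_ANNOTATION = """feature_id,gene_symbol
f1,GENE_A
f2,GENE_B
"""

# Hypothetical observation annotation, keyed by the same observation IDs.
OBSERVATION_ANNOTATION = """observation_id,subject,timepoint
obs1,s1,0
obs2,s1,7
obs3,s2,0
"""

def read_table(text):
    """Parse CSV text into (header, rows), each row a list of strings."""
    rows = list(csv.reader(io.StringIO(text)))
    return rows[0], rows[1:]

header, body = read_table(DATA_MATRIX)
observation_ids = header[1:]
matrix = {row[0]: [float(v) for v in row[1:]] for row in body}

_, feature_rows = read_table(FEATURE_ANNOTATION)
feature_annotation = {row[0]: row[1] for row in feature_rows}

obs_header, obs_rows = read_table(OBSERVATION_ANNOTATION)
observation_annotation = {
    row[0]: dict(zip(obs_header[1:], row[1:])) for row in obs_rows
}

# IDs must be unique and must link the matrix to both annotation tables.
assert len(set(observation_ids)) == len(observation_ids)
assert len(matrix) == len(body)
assert set(matrix) <= set(feature_annotation)
assert set(observation_ids) <= set(observation_annotation)
```

Because the matrix body holds only values, heterogeneous annotation columns can be added to either annotation table without changing how the matrix is read.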

Details on terminology:

Study: an administrative unit that includes one or more projects. Same as ImmPort "Study" (https://www.immport.org/resources/documentation).

Project: a collection of data of one or more types (a dataset). For multiple data types, common samples/subjects are expected, as HiCoNet deals with the `N-integration` problem.
    This is the unit HiCoNet works on - HiCoNet integrates DataMatrices within a DataSet
    A DataSet should have at least one Society of data.

Society: one data type, defined by a set of DataMatrix, FeatureAnnotation (optional) and ObservationAnnotation (optional). 
    This 3-file design is similar to anndata (https://github.com/theislab/anndata) but data are transposed. Meta data can differ for different data types.

DataMatrix: a data matrix of [continuous] values that represent a biological state or concentration, of the same data type.
    E.g. transcriptomics (array intensity or transcript counts), metabolomics (peak intensity/area) or microbial OTU counts.
    This can include different time points or treatments. This is the unit community detection is based on.

ObservationAnnotation: an observation is an experimental measurement of a biological sample. 
    A sample may have measurement replicates. Description of biological samples should be in ObservationAnnotation, which can support inferring the study design (e.g. treatment, time points). For ImmPort data, the MySQL table `biosample` can serve as ObservationAnnotation. Time points and treatment are key annotation variables in many studies. 

FeatureAnnotation: meta data on features. 
    This can be as simple as gene annotation, which can even be optional. But a feature may carry a definition with multiple parameters. E.g. a metabolite feature may have m/z, retention time and collision cross section, and these parameters may be used by certain algorithms.

Graph: a graph/network for relationships in the data (e.g. used in loom format, loompy.org). The current version of HiCoNet does not store this, but it may be considered for future versions.

Community: a group of features within a society that share a similar pattern.
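
To make the role of ObservationAnnotation more concrete, here is a small sketch of how treatment and time point columns could be used to group observations into study-design cells. The records, the column names `treatment` and `timepoint`, and the function `design_groups` are assumptions for illustration and are not taken from HiCoNet or ImmPort.

```python
from collections import defaultdict

# Hypothetical observation annotation: one record per observation.
observation_annotation = [
    {"observation_id": "obs1", "subject": "s1", "treatment": "vaccine", "timepoint": 0},
    {"observation_id": "obs2", "subject": "s1", "treatment": "vaccine", "timepoint": 7},
    {"observation_id": "obs3", "subject": "s2", "treatment": "vaccine", "timepoint": 0},
    {"observation_id": "obs4", "subject": "s2", "treatment": "vaccine", "timepoint": 7},
    {"observation_id": "obs5", "subject": "s3", "treatment": "placebo", "timepoint": 0},
    {"observation_id": "obs6", "subject": "s3", "treatment": "placebo", "timepoint": 7},
]

def design_groups(records, keys=("treatment", "timepoint")):
    """Map each combination of design variables to its observation IDs."""
    groups = defaultdict(list)
    for rec in records:
        cell = tuple(rec[k] for k in keys)
        groups[cell].append(rec["observation_id"])
    return dict(groups)

groups = design_groups(observation_annotation)

# Each key is one cell of the study design, e.g. ("vaccine", 0).
for cell, obs_ids in sorted(groups.items()):
    print(cell, obs_ids)
```

The resulting observation IDs can then be used to select the matching columns of a DataMatrix, since the matrix and the annotation share the same identifiers.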

Requirements

PyYAML
numpy
scipy
pandas
sklearn
leidenalg
scanpy
igraph
fuzzywuzzy

Note: python-igraph requires the C library igraph. The installation on Mac OS may be tricky:  
https://stackoverflow.com/questions/45667147/install-python-igraph-on-mac
I did pip3 install ~/Downloads/python-igraph-0.7.1-1.tar.gz
For a Docker or new install, both igraph and python-igraph are needed.

Usage

This software package is available via PyPI (Python Package Index) and GitHub.
Test datasets are included. E.g. to run test:
python3 -m hiconet.HiCoNet hiconet/datasets/SDY80

There are related but separate projects of hiconet-server and hiconet-explorer.
