分层社区网络,数据驱动的omics集成

hiconet的Python项目详细描述


Hierarchical社区网络(HiCoNet)

用于集成从公共主题组收集的多个数据类型。

一个组学数据集包含了大量的冗余,相似数量模式的特征可以看作是群体。随着组合空间越来越大和复杂,常用的特征级集成方法可能会加剧冗余问题。hiconet检测每个数据类型中的社区,然后测试跨数据类型的社区之间的关联。这种“分层社区网络”为被测生物的组织结构提供了一个合理的模型。

我们在一项VZV疫苗研究(Li等人,2017,https://doi.org/10.1016/j.cell.2017.04.026)中开始采用这种方法,并通过其他科学项目进一步发展了这种方法。

三文件社会数据结构

每种数据类型通常都有自己的特性。为了能够自动化复杂的分析,需要一种通用的公共分母格式。 我们使用三个文件来描述一种数据类型,数据矩阵,特征注释和观测注释。 datamatrix文件使用一行作为观测ID,使用一列作为要素ID。这要求每个观察到的每个特征都有唯一的标识符,并将元数据从数据矩阵中分离出来。因为特征或观察上的注释可以是异类的,但不应该影响数据矩阵的格式。

有关术语的详细信息:

Study: an administrative unit that include one or more projects. Same as ImmPort "Study" (https://www.immport.org/resources/documentation). 

Project: a collection of data of one or more types (a dataset). For multiple data types, common samples/subjects are expected, as HiCoNet deals with the `N-integration` problem.
    This is the unit HiCoNet works on - HiCoNet integrates DataMatrices within a DataSet
    A DataSet should have at least one Society of data.

Society: one data type, defined by a set of DataMatrix, FeatureAnnotation (optional) and ObservationAnnotation (optional). 
    This 3-file design is similar to anndata (https://github.com/theislab/anndata) but data are transposed. Meta data can differ for different data types.

DataMatrix: a data matrix of [continuous] values that represent a biological state or concentration, of the same data type.
    E.g. transcriptomics (array intensity or transcript counts), metabolomics (peak intensity/area) or microbial OTU counts.
    This can include different time points or treatments. This is the unit community detection is based on.

ObservationAnnotation: an observation is an experimental measurement of a biological sample. 
    A sample may have measurement replicates. Description of biological samples should be in ObservationAnnotation, which can support inferring the study design (e.g. treatment, time points). For ImmPort data, the MySQL table `biosample` can serve as ObservationAnnotation. Time points and treatment are key annotation variables in many studies. 

FeatureAnnotation: meta data on features. 
    This can be as simple as gene annotation, which can even be optional. But a feature may carry a defition of multiple parameters. E.g. a metabolite feaure may have m/z, retention time and collision cross section, and these parameters may be used for certain algorithms.

Graph: a graph/network for relationships in the data (e.g. used in loom format, loompy.org). The current version of HiCoNet does not store this, but will consider it for future versions.

Community: a group of features within a society that share a similar pattern.

需要

'PyYAML'
'numpy',
'scipy',
'pandas',
'sklearn',
'leidenalg',
'scanpy',
'igraph',
'fuzzywuzzy',

Note: python-igraph requires the C library igraph. The installation on Mac OS may be tricky:  
https://stackoverflow.com/questions/45667147/install-python-igraph-on-mac
I did pip3 install ~/Downloads/python-igraph-0.7.1-1.tar.gz
For a Docker or new install, both igraph and python-igraph are needed.

使用

This software package is available via PyPI (Python Package Index) and GitHub.
Test datasets are included. E.g. to run test:
python3 -m hiconet.HiCoNet hiconet/datasets/SDY80

There are related but separate projects of hiconet-server and hiconet-explorer.

欢迎加入QQ群-->: 979659372 Python中文网_新手群

推荐PyPI第三方库


热门话题
Firebase Android PhoneAuthProvider的java内存泄漏   使用bufferreader从Android java类中的php文件获取数据   java从逗号分隔的字符串创建列表,将大括号字符串作为一个对象   java动态创建一个树形图并遍历它   java使用类。要在其中加载文件的getResource()。罐子   R与java之间的数据类型转换   java Android大文本视图动态   java If语句似乎在满足需求的情况下跳过   父类中的java日志记录静态方法   neo4j中的java复制关系与spring数据   JVM是32位还是64位?   java如何缩短此KeyListener代码   java Checkstyle和Findbugs安装   java BoneCP语句句柄不能强制转换为JDBC