A versatile tool to perform pile-up analysis on Hi-C data in .cool format.

coolpup.py

.cool file pile-ups with Python.

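Introduction

A versatile tool to perform pile-up analysis on Hi-C data in .cool format (https://github.com/mirnylab/cooler). And who doesn't like cool puppies?

.cool is a modern and flexible (and the best, in my opinion) format to store Hi-C data. It uses HDF5 to store a sparse representation of the Hi-C data, which keeps memory requirements low when dealing with high-resolution datasets. Another popular format to store Hi-C data, .hic, can be converted into .cool files using hic2cool (https://github.com/4dn-dcic/hic2cool).

See for details:

Abdennur, N., and Mirny, L. (2019). Cooler: scalable storage for Hi-C data and other genomically-labeled arrays. bioRxiv, 557660. doi:10.1101/557660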

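What are pileups?

Pileups are used to check whether certain regions tend to interact with each other: the Hi-C signal in windows around those regions is averaged.

An important additional step is normalization to the expected values. This can be done in two ways: either using a provided file with expected interaction values at different distances (the output of cooltools compute-expected), or directly from the Hi-C data by dividing the pileups by pileups of randomly shifted control regions. If neither expected normalization approach is used (just set --nshifts 0), this is essentially identical to the APA approach (Rao et al., 2014), which can be used for averaging strongly interacting regions, e.g. annotated loops. For weaker interactors, the decay of contact probability with distance can hide any focal enrichment that could otherwise be observed.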
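The normalization by randomly shifted controls can be sketched in a few lines. This is a minimal, dependency-free illustration and not coolpup.py's actual implementation: the function names, the choice to shift both anchors by the same offset (which keeps the genomic distance unchanged), and the toy matrix are all assumptions made for the example. Here pad is given in bins, so each window is (2*pad+1) bins wide.

```python
import random


def extract_window(matrix, i, j, pad):
    """Return the (2*pad+1) x (2*pad+1) window centred on bin pair (i, j).

    Returns None if the window would run off the edge of the matrix.
    """
    n = len(matrix)
    if i - pad < 0 or j - pad < 0 or i + pad >= n or j + pad >= n:
        return None
    return [row[j - pad:j + pad + 1] for row in matrix[i - pad:i + pad + 1]]


def mean_of_windows(windows):
    """Average a list of equally sized square windows element-wise."""
    size = len(windows[0])
    count = len(windows)
    return [[sum(w[r][c] for w in windows) / count for c in range(size)]
            for r in range(size)]


def pileup(matrix, pairs, pad, nshifts=10, minshift=1, maxshift=5, seed=0):
    """Average windows around `pairs`, divided by shifted-control averages.

    Assumption of this sketch: each control shifts both anchors by the same
    random offset, so controls sit at the same distance from the diagonal.
    With nshifts=0 the raw (unnormalized) average is returned.
    """
    rng = random.Random(seed)
    observed, controls = [], []
    for i, j in pairs:
        window = extract_window(matrix, i, j, pad)
        if window is None:
            continue
        observed.append(window)
        for _ in range(nshifts):
            shift = rng.choice((-1, 1)) * rng.randint(minshift, maxshift)
            control = extract_window(matrix, i + shift, j + shift, pad)
            if control is not None:
                controls.append(control)
    if not observed:
        raise ValueError("no windows fit inside the matrix")
    obs_mean = mean_of_windows(observed)
    if not controls:
        return obs_mean
    ctrl_mean = mean_of_windows(controls)
    return [[o / c if c else float("nan") for o, c in zip(obs_row, ctrl_row)]
            for obs_row, ctrl_row in zip(obs_mean, ctrl_mean)]


# Toy matrix: contact frequency decays with distance, plus two 4x "loops".
size = 60
matrix = [[1.0 / (1 + abs(i - j)) for j in range(size)] for i in range(size)]
loops = [(10, 30), (20, 40)]
for i, j in loops:
    matrix[i][j] *= 4
    matrix[j][i] *= 4

normalized = pileup(matrix, loops, pad=2, nshifts=5, minshift=6, maxshift=7)
raw = pileup(matrix, loops, pad=2, nshifts=0)
```

In the normalized pileup the distance decay cancels out, so the background is close to 1 and the central pixel is close to the 4-fold enrichment that was planted; in the raw pileup the same pixel is dominated by the distance-dependent background.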

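Performance-wise, coolpup.py is particularly well suited to analysing huge numbers of potential interactions, since it loads whole chromosomes into memory one by one (or in parallel to speed things up) and then extracts small submatrices quickly. Having to read everything into memory makes it relatively slow for small numbers of loops, but performance does not degrade until you reach a very large number of interactions.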
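To see why the number of potential interactions grows so quickly, consider a 3-column bed file, for which all cis combinations of intervals are considered. The sketch below enumerates such combinations per chromosome; it is only an illustration, with the function name, the use of interval midpoints, and distances in base pairs all being assumptions of the example rather than coolpup.py internals.

```python
from collections import defaultdict
from itertools import combinations


def cis_pairs(intervals, mindist=0, maxdist=None):
    """Yield (chrom, mid1, mid2) for all same-chromosome interval pairs.

    `intervals` is an iterable of (chrom, start, end) tuples. Pairs whose
    midpoints are closer than `mindist` or further apart than `maxdist`
    (both in bp, hypothetical units for this sketch) are skipped.
    """
    midpoints = defaultdict(list)
    for chrom, start, end in intervals:
        midpoints[chrom].append((start + end) // 2)
    for chrom, mids in midpoints.items():
        for first, second in combinations(sorted(mids), 2):
            distance = second - first
            if distance < mindist:
                continue
            if maxdist is not None and distance > maxdist:
                continue
            yield chrom, first, second


intervals = [
    ("chr1", 0, 10_000),
    ("chr1", 100_000, 110_000),
    ("chr1", 500_000, 510_000),
    ("chr2", 0, 10_000),
]
all_pairs = list(cis_pairs(intervals))
distant_pairs = list(cis_pairs(intervals, mindist=200_000))
```

With n intervals on one chromosome there are n*(n-1)/2 candidate pairs, so a few thousand features already produce millions of windows; grouping by chromosome is what allows each chromosome to be loaded once and reused for all of its pairs.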

Getting started

Installation

All requirements apart from cooltools are available from PyPI or conda. For cooltools, run

pip install https://github.com/mirnylab/cooltools/archive/master.zip

For coolpuppy (and the other dependencies), simply run:

pip install coolpuppy

or

pip install https://github.com/Phlya/coolpuppy/archive/master.zip

to get the latest version from GitHub. This will make coolpup.py callable in your terminal and importable in Python as coolpuppy.

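Usage

The help message should help you get started with the tool. It is a single command with a lot of options, and it can do a lot of things!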

Usage: coolpup.py [-h] [--pad PAD] [--minshift MINSHIFT] [--maxshift MAXSHIFT]
                  [--nshifts NSHIFTS] [--expected EXPECTED]
                  [--mindist MINDIST] [--maxdist MAXDIST] [--minsize MINSIZE]
                  [--maxsize MAXSIZE] [--excl_chrs EXCL_CHRS]
                  [--incl_chrs INCL_CHRS] [--subset SUBSET] [--anchor ANCHOR]
                  [--by_window] [--save_all] [--local] [--unbalanced]
                  [--coverage_norm] [--rescale] [--rescale_pad RESCALE_PAD]
                  [--rescale_size RESCALE_SIZE] [--weight_name WEIGHT_NAME]
                  [--n_proc N_PROC] [--outdir OUTDIR] [--outname OUTNAME]
                  [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
                  coolfile baselist

positional arguments:
  coolfile              Cooler file with your Hi-C data
  baselist              A 3-column bed file or a 6-column double-bed file
                        (i.e. chr1,start1,end1,chr2,start2,end2). Should be
                        tab-delimited. With a bed file, will consider all cis
                        combinations of intervals. To pileup features along
                        the diagonal instead, use the --local argument. Can be
                        piped in via stdin, then use "-".

optional arguments:
  -h, --help            show this help message and exit
  --pad PAD             Padding of the windows around the centres of specified
                        features (i.e. final size of the matrix is 2×pad+res),
                        in kb. Ignored with --rescale, use --rescale_pad
                        instead. (default: 100)
  --minshift MINSHIFT   Shortest distance for randomly shifting coordinates
                        when creating controls (default: 100000)
  --maxshift MAXSHIFT   Longest distance for randomly shifting coordinates
                        when creating controls (default: 1000000)
  --nshifts NSHIFTS     Number of control regions per averaged window
                        (default: 10)
  --expected EXPECTED   File with expected (output of cooltools compute-
                        expected). If None, don't use expected and use
                        randomly shifted controls (default: None)
  --mindist MINDIST     Minimal distance of intersections to use. If not
                        specified, uses 2*pad+2 (in bins) as mindist (default:
                        None)
  --maxdist MAXDIST     Maximal distance of intersections to use (default:
                        None)
  --minsize MINSIZE     Minimal length of features to use for local analysis
                        (default: None)
  --maxsize MAXSIZE     Maximal length of features to use for local analysis
                        (default: None)
  --excl_chrs EXCL_CHRS
                        Exclude these chromosomes from analysis (default:
                        chrY,chrM)
  --incl_chrs INCL_CHRS
                        Include these chromosomes; default is all. excl_chrs
                        overrides this. (default: all)
  --subset SUBSET       Take a random sample of the bed file - useful for
                        files with too many featuers to run as is, i.e. some
                        repetitive elements. Set to 0 or lower to keep all
                        data. (default: 0)
  --anchor ANCHOR       A UCSC-style coordinate to use as an anchor to create
                        intersections with coordinates in the baselist
                        (default: None)
  --by_window           Create a pile-up for each coordinate in the baselist.
                        Will save a master-table with coordinates, their
                        enrichments and cornerCV, which is reflective of
                        noisiness (default: False)
  --save_all            If by-window, save all individual pile-ups in a
                        separate json file (default: False)
  --local               Create local pileups, i.e. along the diagonal
                        (default: False)
  --unbalanced          Do not use balanced data. Useful for single-cell Hi-C
                        data together with --coverage_norm, not recommended
                        otherwise. (default: False)
  --coverage_norm       If --unbalanced, also add coverage normalization based
                        on chromosome marginals (default: False)
  --rescale             Do not use centres of features and pad, and rather use
                        the actual feature sizes and rescale pileups to the
                        same shape and size (default: False)
  --rescale_pad RESCALE_PAD
                        If --rescale, padding in fraction of feature length
                        (default: 1.0)
  --rescale_size RESCALE_SIZE
                        If --rescale, this is used to determine the final size
                        of the pileup, i.e. it will be size×size. Due to
                        technical limitation in the current implementation,
                        has to be an odd number (default: 99)
  --weight_name WEIGHT_NAME
                        Name of the norm to use for getting balanced data
                        (default: weight)
  --n_proc N_PROC       Number of processes to use. Each process works on a
                        separate chromosome, so might require quite a bit more
                        memory, although the data are always stored as sparse
                        matrices (default: 1)
  --outdir OUTDIR       Directory to save the data in (default: .)
  --outname OUTNAME     Name of the output file. If not set, is generated
                        automatically to include important information.
                        (default: auto)
  -l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
                        Set the logging level. (default: INFO)

Currently, coolpup.py does not support inter-chromosomal pileups, but this is planned for the future.
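As a rough illustration of how the options above combine, the following sketch assembles a coolpup.py command line from Python. The helper function and the file names (sample.cool, loops.bed, loops_pileup.txt) are hypothetical; only flags listed in the help message are used.

```python
import shlex


def build_coolpup_command(coolfile, baselist, **options):
    """Assemble an argument list for coolpup.py (hypothetical helper).

    True becomes a bare flag (e.g. --local), None or False is skipped,
    and any other value becomes "--name value".
    """
    command = ["coolpup.py", str(coolfile), str(baselist)]
    for name, value in options.items():
        if value is None or value is False:
            continue
        flag = "--" + name
        if value is True:
            command.append(flag)
        else:
            command.extend([flag, str(value)])
    return command


# APA-like pileup of annotated loops: 200 kb padding, no shifted controls.
command = build_coolpup_command(
    "sample.cool",
    "loops.bed",
    pad=200,
    nshifts=0,
    local=False,
    outname="loops_pileup.txt",
)
print(shlex.join(command))
```

The resulting list can be passed to subprocess.run(command, check=True) once coolpup.py is installed, or the printed line can be pasted into a terminal.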

Plotting results

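For flexible plotting, a general-purpose plotting library such as matplotlib is recommended. However, simple plotting capabilities are included in this package: just run plotpup.py with the desired options and list all the coolpup.py output files you would like to plot.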

Usage: plotpup.py [-h] [--cmap CMAP] [--symmetric SYMMETRIC] [--vmin VMIN]
                  [--vmax VMAX] [--scale {linear,log}]
                  [--cbar_mode {edge,each,single}] [--n_cols N_COLS]
                  [--col_names COL_NAMES] [--row_names ROW_NAMES]
                  [--norm_corners NORM_CORNERS] [--enrichment ENRICHMENT]
                  [--output OUTPUT]
                  [pileup_files [pileup_files ...]]

positional arguments:
  pileup_files          All files to plot (default: None)

optional arguments:
  -h, --help            show this help message and exit
  --cmap CMAP           Colourmap to use (see
                        https://matplotlib.org/users/colormaps.html) (default:
                        coolwarm)
  --symmetric SYMMETRIC
                        Whether to make colormap symmetric around 1, if log
                        scale (default: True)
  --vmin VMIN           Value for the lowest colour (default: None)
  --vmax VMAX           Value for the highest colour (default: None)
  --scale {linear,log}  Whether to use linear or log scaling for mapping
                        colours (default: log)
  --cbar_mode {edge,each,single}
                        Whether to show a single colorbar, one per row or one
                        for each subplot (default: single)
  --n_cols N_COLS       How many columns to use for plotting the data. If 0,
                        automatically make the figure as square as possible
                        (default: 0)
  --col_names COL_NAMES
                        A comma separated list of column names (default: None)
  --row_names ROW_NAMES
                        A comma separated list of row names (default: None)
  --norm_corners NORM_CORNERS
                        Whether to normalize pileups by their top left and
                        bottom right corners. 0 for no normalization, positive
                        number to define the size of the corner squares whose
                        values are averaged (default: 0)
  --enrichment ENRICHMENT
                        Whether to show the level of enrichment in the central
                        pixels. 0 to not show, odd positive number to define
                        the size of the central square whose values are
                        averaged (default: 1)
  --output OUTPUT       Where to save the plot (default: pup.pdf)
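The --norm_corners and --enrichment options describe two simple averages over squares of the pileup matrix. The sketch below shows one way to compute them on a nested-list matrix; it follows the wording of the help message, but the function names and the exact way the two corner squares are combined (a plain mean of their two averages) are assumptions, not plotpup.py's code.

```python
def square_mean(matrix, row0, col0, size):
    """Mean of the size x size square whose top-left cell is (row0, col0)."""
    values = [matrix[r][c]
              for r in range(row0, row0 + size)
              for c in range(col0, col0 + size)]
    return sum(values) / len(values)


def normalize_by_corners(pileup, corner):
    """Divide a square pileup by the mean of its top-left and bottom-right corners."""
    n = len(pileup)
    top_left = square_mean(pileup, 0, 0, corner)
    bottom_right = square_mean(pileup, n - corner, n - corner, corner)
    background = (top_left + bottom_right) / 2
    return [[value / background for value in row] for row in pileup]


def central_enrichment(pileup, size):
    """Mean of the central size x size square (size should be odd)."""
    n = len(pileup)
    start = n // 2 - size // 2
    return square_mean(pileup, start, start, size)


# Toy 5x5 pileup: uniform background of 2.0 with a central pixel of 8.0.
toy = [[2.0] * 5 for _ in range(5)]
toy[2][2] = 8.0

normalized = normalize_by_corners(toy, corner=1)
single_pixel = central_enrichment(normalized, 1)
three_by_three = central_enrichment(normalized, 3)
```

Averaging over a larger central square trades peak height for robustness to noise, which is why the size is left as a parameter.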

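Citing coolpup.py

Until it has been published in a peer-reviewed journal, please cite our preprint:

Coolpup.py - a versatile tool to perform pile-up analysis of Hi-C data

Ilya M. Flyamer, Robert S. Illingworth, Wendy A. Bickmore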

https://www.biorxiv.org/content/10.1101/586537v1

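This tool has been used in the following publications

DNA methylation directs polycomb-dependent 3D genome re-organisation in naive pluripotency

Katy A McLaughlin, Ilya M Flyamer, John P Thomson, Heidi K Mjoseng, Ruchi Shukla, Iain Williamson, Graeme R Grimes, Robert S Illingworth, Ian R Adams, Sari Pennings, Richard R Meehan, Wendy A Bickmore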

https://www.biorxiv.org/content/10.1101/527309v1
