segment-liftover: Python package on PyPI

Converts segments between genome assemblies.

Project description

segment_liftover

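Converting genome coordinates between different genome assemblies is a common task in bioinformatics. Services and tools such as UCSC liftOver, NCBI Remap and CrossMap are available to perform such conversions.

When converting genome segments, these tools break a segment into smaller parts if it is not continuous in the new assembly. However, in some cases, such as copy number analysis, where the quantitative representation of a genomic range takes precedence over base-specific representation, the integrity of each segment needs to be kept.

Moreover, all of these tools are designed for single-file processing and offer no convenient batch processing, although bioinformatics studies often need to process hundreds or thousands of files at once.

segment_liftover is a Python program that converts segments between genome assemblies without breaking them apart. Part of its functionality is re-conversion by locus approximation, which is used when the precise conversion of a genomic position fails.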
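
The idea of keeping a segment whole can be sketched as follows. This is a minimal illustration and not segment_liftover's actual code: it assumes that a segment is converted by lifting only its start and end positions, and `lift_position` is a hypothetical callable standing in for a single-position liftOver lookup.

```python
from typing import Callable, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, base position)
# Hypothetical single-position converter: returns None when a position cannot be lifted.
LiftFn = Callable[[str, int], Optional[Position]]


def lift_segment(
    chrom: str, start: int, end: int, lift_position: LiftFn
) -> Optional[Tuple[str, int, int]]:
    """Convert a segment through its two endpoints only, so it is never split."""
    new_start = lift_position(chrom, start)
    new_end = lift_position(chrom, end)
    if new_start is None or new_end is None:
        return None  # at least one endpoint could not be converted
    if new_start[0] != new_end[0]:
        return None  # endpoints landed on different chromosomes
    # Assumption: keep start <= end even if the region is inverted in the new assembly.
    low, high = sorted((new_start[1], new_end[1]))
    return new_start[0], low, high


def toy_lift(chrom: str, pos: int) -> Optional[Position]:
    """Toy converter for the demo: shifts chromosome "1" by 100 bases, fails elsewhere."""
    return (chrom, pos + 100) if chrom == "1" else None


print(lift_segment("1", 1000, 5000, toy_lift))  # ('1', 1100, 5100)
print(lift_segment("2", 1000, 5000, toy_lift))  # None
```

Whatever lies between the two endpoints is not inspected here, which is what keeps the segment in one piece.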

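Key features:

Converts continuous segments
Performs approximate conversion when direct conversion fails
Batch-processes any number of files
Automatic folder traversal and file discovery
Detailed logging
Resumes from interruption
Accepts both segment (i.e. with a start and an end) and probe (i.e. single position) data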
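
Approximate conversion can be pictured as a search around a position that failed to convert. The sketch below is only an illustration under stated assumptions: the defaults (400 bases, 10 kilobases) come from the `--step_size` and `--range` options, but the search order (alternating below and above the original position) and the hypothetical `lift_position` callable are not taken from the program itself.

```python
from typing import Callable, Optional, Tuple

Position = Tuple[str, int]  # (chromosome, base position)
LiftFn = Callable[[str, int], Optional[Position]]  # hypothetical single-position converter


def approximate_lift(
    chrom: str,
    pos: int,
    lift_position: LiftFn,
    step_size: int = 400,
    search_range_kb: int = 10,
) -> Tuple[Optional[Position], Optional[int]]:
    """Return (converted position, offset used), or (None, None) if nothing converts."""
    exact = lift_position(chrom, pos)
    if exact is not None:
        return exact, 0
    max_offset = search_range_kb * 1000
    # Assumed search order: step outwards, trying below and then above the position.
    for offset in range(step_size, max_offset + 1, step_size):
        for candidate in (pos - offset, pos + offset):
            if candidate < 0:
                continue
            hit = lift_position(chrom, candidate)
            if hit is not None:
                return hit, candidate - pos
    return None, None


def toy_lift(chrom: str, pos: int) -> Optional[Position]:
    """Toy converter: positions 1000..1999 are missing in the new assembly."""
    if 1000 <= pos < 2000:
        return None
    return chrom, pos + 50


print(approximate_lift("1", 500, toy_lift))   # (('1', 550), 0)
print(approximate_lift("1", 1500, toy_lift))  # (('1', 750), -800)
```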

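Program dependency

segment_liftover depends on the UCSC liftOver program, which can be downloaded from UCSC. Note that UCSC liftOver is free for non-commercial use only. Despite the licensing inconvenience, liftOver offers some very convenient features:

It is a stand-alone command-line tool
It can convert assemblies of any species, even between species
It runs locally and does not require network access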

How to install

The easiest way is to install via pip:

pip install segment_liftover
segment_liftover --help

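Another option is to copy segmentLiftover.py and the chains directory of segment_liftover from GitHub. In that case the dependencies need to be installed manually.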

python3 segmentLiftover.py --help

Important: add the UCSC liftOver program to your working directory, or specify its location with -l.

How to use

See the manual for details.

Quick start

segment_liftover -l ./liftOver -i /Volumes/data/hg18/ -o /Volumes/data/hg19/ -c hg18ToHg19 -si segments.tsv -so seg.tsv
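
When the conversion is driven from a Python script, the same command line can be assembled as an argument list. The helper below is a hypothetical convenience wrapper, not part of segment_liftover; it only uses flags listed in the usage text, and it leaves the actual call commented out so that it runs without the tool installed.

```python
import shlex
from typing import List, Optional


def build_command(
    input_dir: str,
    output_dir: str,
    chain: str,
    segment_in: Optional[str] = None,
    segment_out: Optional[str] = None,
    liftover: Optional[str] = None,
) -> List[str]:
    """Assemble a segment_liftover call from documented options (hypothetical helper)."""
    cmd = ["segment_liftover"]
    if liftover:
        cmd += ["-l", liftover]
    cmd += ["-i", input_dir, "-o", output_dir, "-c", chain]
    if segment_in:
        cmd += ["-si", segment_in]
    if segment_out:
        cmd += ["-so", segment_out]
    return cmd


cmd = build_command(
    "/Volumes/data/hg18/",
    "/Volumes/data/hg19/",
    "hg18ToHg19",
    segment_in="segments.tsv",
    segment_out="seg.tsv",
    liftover="./liftOver",
)
print(shlex.join(cmd))
# To actually run it:
# import subprocess
# subprocess.run(cmd, check=True)
```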

Demo mode

segment_liftover -l ./liftOver --demo .

This copies some example files to the current directory and runs a quick conversion with default settings.

General usage

Usage: segment_liftover [OPTIONS]

Options:
  -i, --input_dir TEXT            The directory to start processing.
  -o, --output_dir TEXT           The directory to write new files.
  -c, --chain_file TEXT           Specify the chain file name.
  -si, --segment_input_file TEXT  Specify the segment input file name.
  -so, --segment_output_file TEXT
                                  Specify the segment output file name.
  -pi, --probe_input_file TEXT    Specify the probe input file name.
  -po, --probe_output_file TEXT   Specify the probe output file name.
  -l, --liftover TEXT             Specify the location of the UCSC liftover
                                  program.
  -t, --test_mode INTEGER         Only process a limited number of files.
  -f, --file_indexing             Only generate the index file.
  -x, --index_file FILENAME       Specify an index file containing file paths.
  -m, --mapping_file FILENAME     Specify a pre-defined file of position
                                  mappings.
  --step_size INTEGER             The step size of approximate conversion (in
                                  bases, default:400).
  --range INTEGER                 The searching range of approximate
                                  conversion (in kilo bases, default:10).
  --beta FLOAT                    Parameter in quality control.
  --no_approximate_conversion     Do not perform approximate conversion.
  --new_segment_header TEXT...    Specify 4 new column names for new segment
                                  files.
  --new_probe_header TEXT...      Specify 3 new column names for new probe
                                  files.
  --resume TEXT...                Specify a index file and a progress file to
                                  resume an interrupted job.
  --demo TEXT                     Copy example files to a user defined
                                  directory and run a demonstration.
  --log_path TEXT                 Specify the directory to write logging
                                  files.
  --help                          Show this message and exit.

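Required options are:

-i, --input_dir TEXT
-o, --output_dir TEXT
-c, --chain_file TEXT
-si, --segment_input_file TEXT and/or -pi, --probe_input_file TEXT

The liftOver program

By default, segment_liftover looks for the UCSC liftOver program in the system path. Its location can also be specified manually with the -l option.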

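Start from input files

segment_liftover is designed to process a large number of files in one run.

It requires an input directory and traverses all subdirectories to index every file that matches the input file name.
It requires an output directory and preserves the original directory structure in the output directory.
Segment and probe files are processed differently, so their input file names need to be passed with different options (-si and -pi).
You can also create a list of input files to start from. See the manual for details.
Input file names support regular expressions.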
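
The traversal described above can be sketched as follows. This is an illustration only and not the program's own implementation: `index_files` is a hypothetical name, and matching the whole file name with `re.fullmatch` is an assumption about how the regular expression is applied.

```python
import os
import re
import tempfile
from typing import List, Tuple


def index_files(input_dir: str, output_dir: str, name_pattern: str) -> List[Tuple[str, str]]:
    """Pair every matching input file with an output path that mirrors its subdirectory."""
    regex = re.compile(name_pattern)
    pairs = []
    for root, _dirs, files in os.walk(input_dir):
        for name in files:
            if regex.fullmatch(name):  # assumption: the pattern must match the whole name
                rel_dir = os.path.relpath(root, input_dir)
                src = os.path.join(root, name)
                dst = os.path.normpath(os.path.join(output_dir, rel_dir, name))
                pairs.append((src, dst))
    return sorted(pairs)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    src_root = os.path.join(tmp, "hg18")
    for sub in ("a", os.path.join("b", "c")):
        os.makedirs(os.path.join(src_root, sub))
        open(os.path.join(src_root, sub, "segments.tsv"), "w").close()
    open(os.path.join(src_root, "a", "notes.txt"), "w").close()

    found = index_files(src_root, os.path.join(tmp, "hg19"), r"segments\.tsv")
    relative_outputs = [os.path.relpath(dst, tmp) for _src, dst in found]
    print(relative_outputs)
```

Only the two segments.tsv files are indexed, and each output path keeps the subdirectory of its input file.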

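Input file format

Use -si filename for segment file names. All files should:

be tab-separated, without quotes
have at least 4 columns as id, chromosome, start and end (names do not matter, order matters).

Extra columns are copied over.

For example: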

id	chro	start	stop	value_1	value_2
GSM378022	1	775852	143752373	0.025	9992
GSM378022	1	143782024	214220966	0.1607	6381
GSM378022	2	88585000	144628991	0.0131	4256
GSM378022	2	144635510	146290468	0.1432	146
GSM378022	3	48603	8994748	0.0544	1469
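
A file in this layout can be checked before a long batch run. The reader below is a minimal sketch, assuming the first four columns are id, chromosome, start and end in that order; `read_segments` is a hypothetical helper and not part of segment_liftover.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

SAMPLE = (
    "id\tchro\tstart\tstop\tvalue_1\tvalue_2\n"
    "GSM378022\t1\t775852\t143752373\t0.025\t9992\n"
    "GSM378022\t1\t143782024\t214220966\t0.1607\t6381\n"
)


def read_segments(handle: TextIO) -> Tuple[List[str], List[Dict]]:
    """Read a tab-separated, unquoted segment file and validate the first four columns."""
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    header = next(reader)
    if len(header) < 4:
        raise ValueError("segment files need at least 4 columns")
    rows = []
    for line_no, row in enumerate(reader, start=2):
        if len(row) < 4:
            raise ValueError(f"line {line_no}: expected at least 4 columns, got {len(row)}")
        seg_id, chrom, start, end = row[:4]
        rows.append(
            {
                "id": seg_id,
                "chrom": chrom,
                "start": int(start),
                "end": int(end),
                "extra": row[4:],  # extra columns are carried along unchanged
            }
        )
    return header, rows


header, rows = read_segments(io.StringIO(SAMPLE))
print(header[:4])
print(rows[0]["chrom"], rows[0]["start"], rows[0]["end"])
```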

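Use -pi filename for probe file names. All files should:

be tab-separated, without quotes
have at least 3 columns as id, chromosome and position (names do not matter, order matters).

Extra columns are copied over.

For example: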

PROBEID	CHRO	BASEPOS	VALUE
ID_2_1	1	51599	-0.6846
ID_3_2	1	51672	-0.2546
ID_4_3	1	51687	0.0833
ID_5_4	1	52016	-0.5201
ID_6_5	1	52784	0.1997
ID_7_6	1	52801	-0.3800
ID_8_7	1	62568	-0.2435
ID_9_8	1	62640	0.3516
ID_10_9	1	72034	-0.5687

Chromosome names

Two formats are supported: chr10 or 10.
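
Because UCSC chain files name chromosomes with a chr prefix, a tool that accepts both spellings has to normalize them at some point. The two helpers below are a hypothetical sketch of such a normalization and are not taken from segment_liftover.

```python
def add_chr(name: str) -> str:
    """Return the chromosome name with a 'chr' prefix (e.g. '10' -> 'chr10')."""
    return name if name.lower().startswith("chr") else "chr" + name


def strip_chr(name: str) -> str:
    """Return the chromosome name without a 'chr' prefix (e.g. 'chr10' -> '10')."""
    return name[3:] if name.lower().startswith("chr") else name


for raw in ("10", "chr10", "X", "chrX"):
    print(raw, add_chr(raw), strip_chr(raw))
```

Converting to the prefixed form before a lookup and stripping it again afterwards lets the output keep the naming style of the input file.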

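Chain files

The UCSC liftOver program requires a chain file to convert from one assembly to another, and so does segment_liftover.

Common chain files for human genome versions (from UCSC) are bundled with segment_liftover. See the manual for details.

Other chain files are available from the UCSC download area.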

Log files

By default, a logs/ directory is created in the output directory after conversion.

./logs/parameters.log              The command history and parameter settings.
./logs/fileList.log                The index file generated by traversing input_dir.
./logs/general.log                 The main log file; records all work done and errors encountered.
./logs/progress.log                A list of successfully processed files.
./logs/unconverted.log             A list of all positions that could be neither lifted nor re-converted.
./logs/approximate_conversion.log  A list of all approximately converted positions (when liftOver fails).
./logs/failed_files.log            A list of files that failed to be converted.

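If segment_liftover does not work as expected, check general.log for execution details.

If you are only interested in the positions that were re-converted by approximation, check approximate_conversion.log.

For information on positions in specific files that were rejected or could not be converted, check unconverted.log.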
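
The index and progress logs together show how much of a job is still open. The sketch below assumes that both fileList.log and progress.log hold one file path per line, which is not confirmed here; the program's own --resume option remains the supported way to continue an interrupted job, and `remaining_files` is a hypothetical helper.

```python
from typing import Iterable, List


def remaining_files(index_lines: Iterable[str], progress_lines: Iterable[str]) -> List[str]:
    """List indexed files that do not yet appear among the processed ones."""
    # Assumption: each non-empty line is exactly one file path.
    done = {line.strip() for line in progress_lines if line.strip()}
    todo = []
    for line in index_lines:
        path = line.strip()
        if path and path not in done:
            todo.append(path)
    return todo


index_lines = [
    "/data/hg18/a/segments.tsv\n",
    "/data/hg18/b/segments.tsv\n",
    "/data/hg18/c/segments.tsv\n",
]
progress_lines = ["/data/hg18/a/segments.tsv\n"]
print(remaining_files(index_lines, progress_lines))
```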

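Python dependencies

The script was developed in Python 3.6.

Packages: click 6.7 and pandas 0.20.1

Advanced usage

Start from a file

With the index_file option, you can supply a file that lists the files to process, one file name per line, using the full path of each file.

After each run, a fileList.log file can be found in ./logs/, which can serve as a quick start next time. You can also generate a file list with the following command: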

segment_liftover -i /Volumes/data/hg18/ -o /Volumes/data/hg19/ -c hg18ToHg19 -si segments.tsv -x ./myfilelist.txt

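Parallel processing

segment_liftover does not support multiprocessing directly, but a large job can be split into smaller tasks that are easy to run in parallel.

First, generate a file list as described in the Start from a file section.
Then (optionally), shuffle the lines of the file list.
Next, split the file list into smaller files and put them in separate folders.
Finally, run segment_liftover with the --index_file option in each folder.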
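
The shuffle and split steps above can be sketched as follows. This is a minimal illustration: the function name, the part_NN folder names and the filelist.txt file name are arbitrary choices for the example, not conventions of segment_liftover.

```python
import os
import random
import tempfile
from typing import List, Optional


def split_file_list(
    paths: List[str], n_parts: int, out_root: str, seed: Optional[int] = None
) -> List[str]:
    """Shuffle the paths and write them as n_parts smaller lists, one folder each."""
    shuffled = list(paths)
    random.Random(seed).shuffle(shuffled)  # shuffling spreads large files across parts
    list_files = []
    for i in range(n_parts):
        chunk = shuffled[i::n_parts]  # every n-th path goes to part i
        if not chunk:
            continue
        folder = os.path.join(out_root, f"part_{i:02d}")
        os.makedirs(folder, exist_ok=True)
        list_path = os.path.join(folder, "filelist.txt")
        with open(list_path, "w") as handle:
            handle.write("\n".join(chunk) + "\n")
        list_files.append(list_path)
    return list_files


# Demo with made-up paths in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    paths = [f"/data/hg18/sample_{i}/segments.tsv" for i in range(10)]
    list_files = split_file_list(paths, n_parts=3, out_root=tmp, seed=0)
    total_lines = 0
    for list_path in list_files:
        with open(list_path) as handle:
            total_lines += sum(1 for _ in handle)
    print(len(list_files), total_lines)  # 3 10
```

Each folder can then be processed by its own segment_liftover run that is given its list through -x (--index_file), for example one run per terminal or per cluster job.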

License: BSD License (BSD 3-Clause)
Maintainer: bogao

segment-liftover 0.955