Python samplot包_程序模块 - PyPI

基因组结构变异绘图包

samplot的Python项目详细描述

<；center>；

<；center>； <；center>；

<；center>；

samplot是用于快速、多样本结构变量的命令行工具形象化。samplot获取sv坐标和bam文件并生成高质量图像，突出显示任何对齐和深度信号证实SV。

用法

Usage: samplot.py [options]

Options:
  -h, --help            show this help message and exit
  --marker_size=MARKER_SIZE
                        Size of marks on pairs and splits (default 3)
  -n TITLES             Space-delimited list of plot titles. Use quote marks to include spaces (i.e. \"plot 1\" \"plot 2\")"
  -r REFERENCE          Reference file for CRAM
  -z Z                  Number of stdevs from the mean (default 4)
  -b BAMS               Bam file names (CSV)
  -o OUTPUT_FILE        Output file name
  -s START              Start range
  -e END                End range
  -c CHROM              Chromosome range
  -w WINDOW             Window size (count of bases to include), default(0.5 *
                        len)
  -d MAX_DEPTH          Max number of normal pairs to plot
  -t SV_TYPE            SV type
  -T TRANSCRIPT_FILE    GFF of transcripts
  -A ANNOTATION_FILE    Space-delimited list of bed.gz tabixed files of annotations (such as repeats, mappability, etc.)
  -a                    Print commandline arguments
  -H PLOT_HEIGHT        Plot height
  -W PLOT_WIDTH         Plot width
  -j                    Create only the json file, not the image plot
  --long_read=LONG_READ
                        Min length of a read to be a long-read (default 1000)
  --common_insert_size  Set common insert size for all plots

安装

由于samplot作为python脚本运行，因此使用它的唯一要求是python的工作版本（2或3）和所需的python库。使用conda可以方便地安装这些库：

conda install -y --file https://raw.githubusercontent.com/ryanlayer/samplot/master/requirements.txt

如果您对pysam有问题，则可能需要更新您的conda频道：

conda config --add channels r
conda config --add channels bioconda

所有这些库也可以从pip中获得。

您可以通过克隆git存储库来下载samplot：

git clone https://github.com/ryanlayer/samplot.git

无需其他安装。

示例：

samplot需要bam文件或cram文件作为主输入。如果你使用克拉姆，你还需要一个参考基因组，就像使用1000个基因组一样项目（ftp://ftp trace.ncbi.nih.gov/1000genomes/ftp/technical/reference/human_g1k_v37.fasta.gz）。

基本用例

使用NA12878、NA12889和NA12890中的数据 1000基因组计划，我们将在4:115928726-115931880检查NA12878中可能的删除两个无关样本NA12889和NA12890中的同一区域。

以下命令将创建该区域的图像：

time samplot/src/samplot.py \
    -n NA12878 NA12889 NA12890 \
    -b samplot/test/data/NA12878_restricted.bam \
      samplot/test/data/NA12889_restricted.bam \
      samplot/test/data/NA12890_restricted.bam \
    -o 4_115928726_115931880.png \
    -c chr4 \
    -s 115928726 \
    -e 115931880 \
    -t DEL

real    0m9.450s
user    0m9.199s
sys     0m0.217s

上面使用的参数是：

-n绘图中每个示例要显示的名称

-b样本的bam/cram文件（空格分隔）

-o包含绘图的输出文件的名称

-c感兴趣区域的染色体

-s感兴趣区域的起始位置

-e感兴趣区域的结束位置

-t感兴趣变量的类型

这将创建一个名为4_115928726_115931880.png的图像文件，如下所示：

下采样"正常"对
只有绘制一部分协和图，才能减少samplot的运行时间。
对端读数（+/-钢绞线方向，在平均插入尺寸z s.d.内，其中z
是一个命令行选项，默认为4）。如果我们重新运行前面的示例，但只绘制
随机抽取100对正态对，我们得到的相似结果要快3.6倍。
time samplot/src/samplot.py \
    -n NA12878 NA12889 NA12890 \
    -b samplot/test/data/NA12878_restricted.bam \
      samplot/test/data/NA12889_restricted.bam \
      samplot/test/data/NA12890_restricted.bam \
    -o 4_115928726_115931880.d100.png \
    -c chr4 \
    -s 115928726 \
    -e 115931880 \
    -t DEL \
    -d 100

real    0m2.621s
user    0m2.466s
sys     0m0.124s
基因和其他基因组特征注释
基因注释（tabixed，gff3文件）和基因组特征（tabixed，bgzipped，bed文件）可以是
包括在绘图中。
获取基因注释：
wget ftp://ftp.ensembl.org/pub/grch37/release-84/gff3/homo_sapiens/Homo_sapiens.GRCh37.82.gff3.gz
bedtools sort -i Homo_sapiens.GRCh37.82.gff3.gz \
| bgzip -c > Homo_sapiens.GRCh37.82.sort.gff3.gz
tabix Homo_sapiens.GRCh37.82.sort.gff3.gz
获取基因组注释，在本例中为重复掩码ER轨迹和可映射轨迹：
wget http://hgdownload.cse.ucsc.edu/goldenpath/hg19/encodeDCC/wgEncodeMapability/wgEncodeDukeMapabilityUniqueness35bp.bigWig
bigWigToBedGraph wgEncodeDukeMapabilityUniqueness35bp.bigWig wgEncodeDukeMapabilityUniqueness35bp.bed
bgzip wgEncodeDukeMapabilityUniqueness35bp.bed
tabix wgEncodeDukeMapabilityUniqueness35bp.bed.gz

curl http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/rmsk.txt.gz \
| bgzip -d -c \
| cut -f 6,7,8,13 \
| bedtools sort -i stdin \
| bgzip -c > rmsk.bed.gz
tabix rmsk.bed.gz
情节：
samplot/src/samplot.py \
    -n NA12878 NA12889 NA12890 \
    -b samplot/test/data/NA12878_restricted.bam \
      samplot/test/data/NA12889_restricted.bam \
      samplot/test/data/NA12890_restricted.bam \
    -o 4_115928726_115931880.d100.genes_reps_map.png \
    -c chr4 \
    -s 115928726 \
    -e 115931880 \
    -t DEL \
    -d 100 \
    -T Homo_sapiens.GRCh37.82.sort.gff3.gz \
    -A rmsk.bed.gz wgEncodeDukeMapabilityUniqueness35bp.bed.gz

real    0m2.784s
user    0m2.633s
sys 0m0.129s
从VCF文件生成图像
要在vcf文件中打印来自结构变量调用的图像，请使用samplot
samplot_vcf.py脚本。它接受VCF文件和样本的BAM文件
您希望打印、输出图像和网页索引以供查看。
用法
Usage: samplot.py [options]

Options:
  -h, --help            show this help message and exit
  --marker_size=MARKER_SIZE
                        Size of marks on pairs and splits (default 3)
  -n TITLES             Space-delimited list of plot titles. Use quote marks to include spaces (i.e. \"plot 1\" \"plot 2\")"
  -r REFERENCE          Reference file for CRAM
  -z Z                  Number of stdevs from the mean (default 4)
  -b BAMS               Bam file names (CSV)
  -o OUTPUT_FILE        Output file name
  -s START              Start range
  -e END                End range
  -c CHROM              Chromosome range
  -w WINDOW             Window size (count of bases to include), default(0.5 *
                        len)
  -d MAX_DEPTH          Max number of normal pairs to plot
  -t SV_TYPE            SV type
  -T TRANSCRIPT_FILE    GFF of transcripts
  -A ANNOTATION_FILE    Space-delimited list of bed.gz tabixed files of annotations (such as repeats, mappability, etc.)
  -a                    Print commandline arguments
  -H PLOT_HEIGHT        Plot height
  -W PLOT_WIDTH         Plot width
  -j                    Create only the json file, not the image plot
  --long_read=LONG_READ
                        Min length of a read to be a long-read (default 1000)
  --common_insert_size  Set common insert size for all plots
0
samplot_vcf.py可用于快速对变量应用一些基本过滤器。过滤器通过--filter参数应用，可以根据需要重复多次。使用--filter选项指定的每个表达式都以或的方式单独应用，其中&；字符可以在语句for和操作中使用。
示例：
Usage: samplot.py [options]

Options:
  -h, --help            show this help message and exit
  --marker_size=MARKER_SIZE
                        Size of marks on pairs and splits (default 3)
  -n TITLES             Space-delimited list of plot titles. Use quote marks to include spaces (i.e. \"plot 1\" \"plot 2\")"
  -r REFERENCE          Reference file for CRAM
  -z Z                  Number of stdevs from the mean (default 4)
  -b BAMS               Bam file names (CSV)
  -o OUTPUT_FILE        Output file name
  -s START              Start range
  -e END                End range
  -c CHROM              Chromosome range
  -w WINDOW             Window size (count of bases to include), default(0.5 *
                        len)
  -d MAX_DEPTH          Max number of normal pairs to plot
  -t SV_TYPE            SV type
  -T TRANSCRIPT_FILE    GFF of transcripts
  -A ANNOTATION_FILE    Space-delimited list of bed.gz tabixed files of annotations (such as repeats, mappability, etc.)
  -a                    Print commandline arguments
  -H PLOT_HEIGHT        Plot height
  -W PLOT_WIDTH         Plot width
  -j                    Create only the json file, not the image plot
  --long_read=LONG_READ
                        Min length of a read to be a long-read (default 1000)
  --common_insert_size  Set common insert size for all plots
1
本例将创建一个名为test的目录（在当前工作目录中）。将在该目录中创建名为index.html的文件。如果samplot.py脚本与samplot\u vcf.py脚本位于同一目录中，则将打印samplot命令，以便为通过上述过滤器的所有样本/变体创建绘图。
过滤器：上述过滤器将从输出中删除所有样本/变量，除了：
dup至少有8个su的变体
inv至少有5个susu的变体
VCF文件中可用的特定格式字段可能不同。我推荐SV VCF注释，其中包含dupfold，由brentp
对于更复杂的基于表达式的VCF筛选，请尝试brentp的slivar，它为筛选表达式提供了类似但更广泛的选项。
区域限制。变体也可以通过与一组区域重叠（例如，与疾病相关的基因的基因坐标）进行筛选。important_regions参数为本例提供此类区域的bed文件。
对从头开始的SVS进行过滤
使用aped带有samplot_vcf.py的文件可以过滤可能是自发的/从头开始的变体。这个过滤器是一个简单的孟德尔破坏测试。如果样本1）在PED文件中具有有效的父ID，2）具有非HOMREF基因型（VCF中为1/0、0/1或1/1），3）通过筛选，以及4）双亲均具有HOMREF基因型（VCF中为0/0），则样本可能具有从头变异。筛选器参数不应用于父级。样本与双亲一起绘制，双亲在图像中标记为父亲和母亲。
添加PED文件的示例调用：
Usage: samplot.py [options]

Options:
  -h, --help            show this help message and exit
  --marker_size=MARKER_SIZE
                        Size of marks on pairs and splits (default 3)
  -n TITLES             Space-delimited list of plot titles. Use quote marks to include spaces (i.e. \"plot 1\" \"plot 2\")"
  -r REFERENCE          Reference file for CRAM
  -z Z                  Number of stdevs from the mean (default 4)
  -b BAMS               Bam file names (CSV)
  -o OUTPUT_FILE        Output file name
  -s START              Start range
  -e END                End range
  -c CHROM              Chromosome range
  -w WINDOW             Window size (count of bases to include), default(0.5 *
                        len)
  -d MAX_DEPTH          Max number of normal pairs to plot
  -t SV_TYPE            SV type
  -T TRANSCRIPT_FILE    GFF of transcripts
  -A ANNOTATION_FILE    Space-delimited list of bed.gz tabixed files of annotations (such as repeats, mappability, etc.)
  -a                    Print commandline arguments
  -H PLOT_HEIGHT        Plot height
  -W PLOT_WIDTH         Plot width
  -j                    Create only the json file, not the image plot
  --long_read=LONG_READ
                        Min length of a read to be a long-read (default 1000)
  --common_insert_size  Set common insert size for all plots
2
附加说明。
默认情况下，只有不到95%的样本具有调用（无论是引用还是替换）的变量将被排除。这可以通过命令行参数min_call_rate
如果您主要对罕见的变体感兴趣，可以使用max-hets过滤器删除出现在max-hets示例中的变体。
现在，通过使用samplot.py'szoom参数，samplot可以很容易地绘制出大的变量。但是，您仍然可以使用max_mb参数选择只打印大于给定大小的变量。zoom参数接受一个整数参数，并且只在断点两边的该参数的+/-范围内显示间隔。多特ed line连接窗口顶部变量调用栏的两端，显示断点间隔之间的区域未显示。
默认情况下，如果少于6个样本有一个变体，并且给出了额外的HOMREF样本，则将从HOMREF组添加控制样本，以在图中达到总共6个样本。可以使用min_entries参数更改此数字。
在samplot.py中可选的参数可以作为samplot\u vcf.py的参数。它们将应用于生成的每个图像。
CRAM输入
samplot还支持cram输入，这需要一个引用fasta文件
如上所述阅读。请注意，引用文件不包括在
存储库大小。这次我们将在
电话：101055330-101067156。
Usage: samplot.py [options]

Options:
  -h, --help            show this help message and exit
  --marker_size=MARKER_SIZE
                        Size of marks on pairs and splits (default 3)
  -n TITLES             Space-delimited list of plot titles. Use quote marks to include spaces (i.e. \"plot 1\" \"plot 2\")"
  -r REFERENCE          Reference file for CRAM
  -z Z                  Number of stdevs from the mean (default 4)
  -b BAMS               Bam file names (CSV)
  -o OUTPUT_FILE        Output file name
  -s START              Start range
  -e END                End range
  -c CHROM              Chromosome range
  -w WINDOW             Window size (count of bases to include), default(0.5 *
                        len)
  -d MAX_DEPTH          Max number of normal pairs to plot
  -t SV_TYPE            SV type
  -T TRANSCRIPT_FILE    GFF of transcripts
  -A ANNOTATION_FILE    Space-delimited list of bed.gz tabixed files of annotations (such as repeats, mappability, etc.)
  -a                    Print commandline arguments
  -H PLOT_HEIGHT        Plot height
  -W PLOT_WIDTH         Plot width
  -j                    Create only the json file, not the image plot
  --long_read=LONG_READ
                        Min length of a read to be a long-read (default 1000)
  --common_insert_size  Set common insert size for all plots
3
上面使用的参数与基本用例中使用的参数相同，但添加了以下参数：
-r用于读取CRAM文件的参考文件
在没有sv的情况下绘图
samplot还可以绘制与sv无关的基因组区域。如果你这样做了
不通过SV类型选项（-t），则顶部SV栏将消失，并且仅
将显示由-c-s和-e给定的区域。
长读（牛津纳米孔和pacbio）和链接读支持
任何超过1000 bp的校准都被视为长读数，并且
地块设计将关注对齐的区域和间隙。对齐的区域是橙色的，间隔遵循用于短读的相同del/dup/inv颜色代码。对齐的高度取决于其最大间隙的大小。
如果BAM文件具有MI标记，则读取将被视为链接读取。
这些图将类似于短读图，但具有相同mi的所有路线将根据组中最大间隙的路线以相同高度绘制。绿线连接组中的所有路线。
标签：
文件
of
default
绘图
基因组
size
plot
结构
变异
欢迎加入QQ群-->： 979659372
                                    
推荐PyPI第三方库
odoo8-addon-hr-payroll-email-slip
添加通过电子邮件发送员工工资单的功能
odoo10-addon-l10n-it-website-sale-corrispettivi
意大利本地化-网站销售Corrispettivi
control
python控制系统库
easyirc
easy irc是一个irc工具包，用于开发irc客户端或bot，特别是对于python/irc初学者。
bitwarden-simple-cli
用python编写的用于列出和解密机密的简单bitwarden cli
brasil.vocab
python的巴西基本词汇
quicklock
一个简单的python资源锁，确保一次只有一个进程使用特定的资源运行。
Flask-PonyA
不费吹灰之力就为小型orm实体创建了api。
openlabs_payment_gatewa
用于支付网关/商户集成的Tryton模块
crontab
在python中解析和使用crontab调度
django-restricted-paths
仅通过使用特定视图响应或引发404来将对特定URL的访问限制为员工
generate-html
使用python生成器生成html5
mrpython
为python收集不需要自己库的东西。
odoo12-addon-quality-control-issue
允许管理和报告质量控制问题。
gitlab-attendant
一个整理和处理仓库的gitlab机器人

导 航 栏

                                            项目 描述
                                        

                                            版本历史
                                        

                                                下载文件
                                            
项目 链接
首页
                                    
标 签
许可证: BSD许可证（BSD 3条款）
作者信息:: 暂无
                                
                            
维护者

                                  jrbelyeu
                                
最新PyPI项目
italian_vip_says
UFx
vofs
fake_item_generator
NerEva
django-monologue
fio_product_attribute_strict
climailsystem
pyshape
tbb-devel
npy-append-arra
anthill.tal.macrorenderer
odoo11-addon-stock-a
uuuu
contextil
fyl_nester
appomatic_renderable
teacher
chuletas
slackbot_ce
最新Python常见问题
python语法错误（如果不在Z中，则在X中表示s）
Python语法错误（无效）概率
python语法错误*带有可选参数的args
python语法错误2.5版有什么办法解决吗？
Python语法错误2.7.4
python语法错误30/09/2013
Python语法错误E001
Python语法错误not（）op
python语法错误outpu
Python语法错误print len（）
python语法错误w3
Python语法错误不是caugh
python语法错误及yt-packag的使用
python语法错误可以查出来！！瓦里亚布
Python语法错误可能是缩进？

samplot 1.0.3

samplot的Python项目详细描述

用法

安装

示例：

基本用例

下采样"正常"对

基因和其他基因组特征注释

从VCF文件生成图像

用法

示例：

CRAM输入

在没有sv的情况下绘图
samplot还可以绘制与sv无关的基因组区域。如果你这样做了不通过SV类型选项（`-t`），则顶部SV栏将消失，并且仅将显示由`-c-s`和`-e`给定的区域。

长读（牛津纳米孔和pacbio）和链接读支持

推荐PyPI第三方库

odoo8-addon-hr-payroll-email-slip

odoo10-addon-l10n-it-website-sale-corrispettivi

control

easyirc

bitwarden-simple-cli

brasil.vocab

quicklock

Flask-PonyA

openlabs_payment_gatewa

crontab

django-restricted-paths

generate-html

mrpython

odoo12-addon-quality-control-issue

gitlab-attendant

导航栏

项目链接

标签

维护者

最新PyPI项目

最新Python常见问题

samplot 1.0.3

samplot的Python项目详细描述

用法

安装

示例：

基本用例

下采样"正常"对

基因和其他基因组特征注释

从VCF文件生成图像

用法

示例：

CRAM输入

在没有sv的情况下绘图 samplot还可以绘制与sv无关的基因组区域。如果你这样做了 不通过SV类型选项（-t），则顶部SV栏将消失，并且仅 将显示由-c-s和-e给定的区域。

长读（牛津纳米孔和pacbio）和链接读支持

推荐PyPI第三方库

odoo8-addon-hr-payroll-email-slip

odoo10-addon-l10n-it-website-sale-corrispettivi

control

easyirc

bitwarden-simple-cli

brasil.vocab

quicklock

Flask-PonyA

openlabs_payment_gatewa

crontab

django-restricted-paths

generate-html

mrpython

odoo12-addon-quality-control-issue

gitlab-attendant

导 航 栏

项目 链接

标 签

维护者

最新PyPI项目

最新Python常见问题

在没有sv的情况下绘图
samplot还可以绘制与sv无关的基因组区域。如果你这样做了不通过SV类型选项（`-t`），则顶部SV栏将消失，并且仅将显示由`-c-s`和`-e`给定的区域。

导航栏

项目链接

标签