Python scSplit包_程序模块 - PyPI

混合单细胞rna序列的无基因型解复用

scSplit的Python项目详细描述

分裂

< H3>混合单细胞RNA SEQ的基因型无解复用，使用隐藏状态模型来识别混合群体内的基因差异样本。

它已用于在10x数据集上解复用3到8个样本。

如何安装：

安装python 3.6+
确保可以导入以下python包：
数学、numpy、熊猫泡菜、pysam、pyvcf、scikit learn、scipy、统计学
“git clone https://github.com/jon xu/scSplit”或“pip install scSplit”
使用“<；PATH>；/scSplit<；command>；<；args>；”或“python<；PATH>；/scSplit<；command>；<；args>；”运行

总管道：

alt text

一。数据质量控制和过滤

a）用白名单条形码过滤原始BAM文件（用CB:Z:tag标记的条形码），以尽量减少技术噪音

b）以以下任何模式读取的方式过滤处理的bam：读取质量低于10、未映射段、二次对准、未通过滤波器、pcr或光学复制、或补充对准。

例如，samtools视图-s-b-q 10-f 3844已处理。bam>；已筛选。bam

c）使用samtools中的rmdup、sort、index命令将BAM文件标记为重复，并对其进行排序和索引

2.呼吁单核苷酸变异

a）在第一步处理后，使用freebayes v1.2从混合样本BAM文件中调用snv，设置freebayes的参数，以便不捕获插入和删除（indels）、多核苷酸多态性（MNP）或复杂事件，将最小等位基因计数设置为2，将最小基质量设置为1

例如freebayes-f<；reference.fa>；-iXu-C 2-q 1 filtered.bam>；snv.vcf

这一步骤可能需要很长时间（如果不使用并行处理，则长达30小时），gatk或其他snv调用工具也应该可以工作。用户还可以通过染色体分割bam，分别调用snv并合并vcf文件。

b）输出VCF文件应进一步过滤，以便仅保留质量分数大于30的SNV

三。建立等位基因计数矩阵

a）运行“scsplit count”并获取两个.csv文件（“ref_filtered.csv”和“alt_filtered.csv”）作为输出。输入参数：

    -v, --vcf, VCF from mixed BAM
    -i, --bam, mixed sample BAM        
    -b, --bar, barcodes whitelist
    -c, --com, Common SNVs    
    -r, --ref, Ref count CSV as output        
    -a, --alt, Alt count CSV as output

    e.g. scSplit count -v mixed_genotype.vcf -i filtered.bam -b barcodes.tsv -r ref_filtered.csv -a alt_filtered.csv

b）如果snv和/或细胞的数量很高，则此步骤可能会消耗内存作为指导，为60000个snv和10000个cells构建矩阵可能需要30gb以上的ram才能运行，请允许足够的ram资源来运行脚本。

c）强烈建议使用下面的snv列表筛选矩阵以提高预测精度：

  Common SNPs (e.g. Human common SNPs from 1000 Genome project)

  hg19: ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/

  hg38: http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data_collections/1000_genomes_project/release/20181203_biallelic_SNV/

  To process the genotype files of common SNPs, either download per-chromosome files and concatenate them using bcftools or download the whole genome file, take the first two columns of the vcf file and replace the tab with colon sign so that each line is one SNV, e.g., "1:10177". 

  Processed common SNVs for hg19 and hg38 can be found here: http://data.genomicsresearch.org/Projects/scSplit/CommonSNVs

请使用-c/--com参数在scSplit count中指定公共snv，请确保您的公共snv列表没有标题行

四。解复用并生成ALT P/A矩阵< EH3>
a）使用两个生成的等位基因计数矩阵文件将细胞分解成不同的样本doublet sample不会每次都有相同的sample id，这将在日志文件中明确指出
b）这一步也消耗内存，所需的ram高度依赖于上一步的snv数量和单元数量。作为指导，一个包含60000个snv和10000个单元格的矩阵可能需要50gb以上的ram才能运行，请允许足够的ram资源来运行脚本。
c）运行“scsplit run”，输入参数：
-r, --ref, Ref count CSV as input -a, --alt, Alt count CSV as input -n, --num, Number of mixed samples -v, --vcf, individual genotypes to check distinguishing variants against (optional) e.g. scSplit run -r ref_filtered.csv -a alt_filtered.csv -n 8
d）将生成以下文件：
"scSplit_barcodes_{n}.csv": barcodes assigned to each of the N+1 states (N specified samples and 1 doublet state) "scSplit_doublet.txt": indicating which cluster is doublet state "scSplit_dist_variants.txt": the distinguishing variants that can be used to genotype and assign sample to clusters "scSplit_dist_matrix.csv": the ALT allele Presence/Absence (P/A) matrix on distinguishing variants for all samples as a reference in assigning sample to clusters "scSplit_PA_matrix.csv": the full ALT allele Presence/Absence (P/A) matrix for all samples "scSplit.model", a python pickle dump containing the final allele fraction model (model.model_MAF), and the probability of each cell belonging to each sample (model.P_s_c) "scSplit.log" log file containing information for current run, iterations, and final Maximum Likelihood and doublet sample
5个（可选）根据分割结果生成样本基因型
a）使用输入参数运行“scSplit基因型”：
-r, --ref, Ref count CSV as output -a, --alt, Alt count CSV as output -p, --psc, generated P(S|C) e.g. scSplit genotype -r ref_filtered.csv -a alt_filtered.csv -p scSplit_P_s_c.csv
b）将为所有样本模型的对数转换基因型likelihood生成VCF文件（“scsplit.vcf”）。

标签：
文件
csv
the
序列
单细胞
filtered
rna
snv
基因型
scsplit
欢迎加入QQ群-->： 979659372
推荐PyPI第三方库
opsmodelviewer
二维opensees模型的查看器。
scidb-strm
用于scidb流的python库
Augmentor
用于机器学习的图像增强库
smev3Transform
xml电池的标准化和规范化包在CME 3签署。
zapi
python提供的一个简单的轻量级web框架。
smshub-org
示例python项目
py-front
前端的简单api包装器。
datacamprojects
创建健壮python项目课程的datacamp工具
wxPython-common
面向python的跨平台gui工具包
minefield
高性能python wsgi web服务器
confgen
confgen是一个小的命令实用程序，它可以帮助您生成一些配置，这要感谢jinja2模板
django-passbook
Django Passbook服务器应用程序
Flask-gzip
使用gzip压缩烧瓶应用程序中的响应。
django_chartit2
一个django应用程序，可以直接从模型绘制图表和透视图。使用highcharts和jquery javascript库在网页上呈现图表。
pygubu
Tkinter图形用户界面生成器。

导航栏
项目描述
版本历史
下载文件
项目链接
首页
标签
许可证: BSD许可证（BSD 3条款）
作者信息:: 暂无
维护者
jon-xu
最新PyPI项目
italian_vip_says
UFx
vofs
fake_item_generator
NerEva
django-monologue
fio_product_attribute_strict
climailsystem
pyshape
tbb-devel
npy-append-arra
anthill.tal.macrorenderer
odoo11-addon-stock-a
uuuu
contextil
fyl_nester
appomatic_renderable
teacher
chuletas
slackbot_ce
最新Python常见问题
使用登录请求.post导致“错误405不允许”
使用登录进行Python web抓取
使用登录进行抓取
使用登录页面从网站抓取数据
使用白色圆圈背景使图像更平滑
使用百分位数删除Pandas数据帧中的异常值
使用百分号进行Python字典操作
使用百分比delimi的Python字符串模板
使用百分比分割Numpy ndarray最有效的方法是什么？
使用百分比分配和修改变量（计算）
使用百分比单位绘制数据
使用百分比在单个采购订单中组合不同的订单类型
使用百分比将数据帧的子集与完整数据帧进行比较
使用百分比形式的BBOX选项，而不是绝对像素PyScreenShot Python
使用百分比登录列nam更新表

scSplit 0.8

scSplit的Python项目详细描述

分裂

它已用于在10x数据集上解复用3到8个样本。

如何安装：

总管道：

一。数据质量控制和过滤

2.呼吁单核苷酸变异

三。建立等位基因计数矩阵

5个（可选）根据分割结果生成样本基因型

推荐PyPI第三方库

opsmodelviewer

scidb-strm

Augmentor

smev3Transform

zapi

smshub-org

py-front

datacamprojects

wxPython-common

minefield

confgen

django-passbook

Flask-gzip

django_chartit2

pygubu

导 航 栏

项目 链接

标 签

维护者

最新PyPI项目

最新Python常见问题

导航栏

项目链接

标签