SNPmatch Python package - PyPI

A simple Python library to identify the most likely strain given the SNPs for a sample

Detailed description of the SNPmatch Python project

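# SNPmatch

SNPmatch is a Python toolkit that can be used to genotype a sample from as few as 4000 markers against the lines in a database. It genotypes samples efficiently and economically using a simple likelihood approach.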

## Installation & Usage

The steps below deal with running SNPmatch on a local machine. This package has only been tested with Python 2.

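### Installation using pip

SNPmatch can be easily installed with the help of pip. SNPmatch uses various Python packages (numpy, pandas, [PyGWAS](https://github.com/timeu/PyGWAS), [scikit-allel](https://github.com/cggh/scikit-allel)), which are automatically downloaded and installed when using pip. Follow the commands below for a successful installation.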

```bash
## installing SNPmatch from git hub repository
pip install git+https://github.com/Gregor-Mendel-Institute/SNPmatch.git
## or PyPi
pip install SNPmatch
```

SNPmatch can be installed either from the git repo or through PyPI. In case of installation errors, install the dependencies using the commands below (for Debian-based systems).

```bash
sudo apt-get install python-dev libfreetype6-dev libxft-dev libblas-dev liblapack-dev libatlas-base-dev libhdf5-dev gfortran
sudo pip install NumPy
```

Mac users can install these packages using [Homebrew](https://brew.sh/). These packages should be sufficient to install SNPmatch correctly. If the installation still fails, please raise an issue in the GitHub repo.

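### Database files

Database files containing the known genotype information for many strains have to be provided in HDF5 format. They can be generated from the markers or variants present in a VCF file, using the functions provided in SNPmatch and the command given below.

The commands below require the BCFtools executable to be available on the PATH. The database files are read using the PyGWAS package, so for now the VCF files must contain biallelic SNPs only.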
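As a rough illustration of the biallelic-SNP requirement, the sketch below checks a single VCF data line. It is not part of SNPmatch; the function names `is_biallelic_snp` and `count_non_biallelic` are hypothetical, and the check assumes the standard VCF column order (REF in column 4, ALT in column 5). It is written for Python 3.

```python
# Hypothetical helpers, not part of SNPmatch: a pre-check for VCF records.
BASES = {"A", "C", "G", "T"}


def is_biallelic_snp(vcf_line):
    """Return True if a tab-separated VCF data line is a biallelic SNP.

    A record qualifies when REF is a single base and ALT is exactly one
    single base (no comma-separated alternatives, no indels).
    """
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    return ref.upper() in BASES and alt.upper() in BASES


def count_non_biallelic(lines):
    """Count data lines (skipping headers and blanks) that fail the check."""
    return sum(
        1
        for line in lines
        if line.strip() and not line.startswith("#") and not is_biallelic_snp(line)
    )
```

A non-zero count from `count_non_biallelic` would suggest filtering the VCF before building the database.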

```bash
snpmatch makedb -i input_database.vcf -o db
```

The above command generates three files:

- db.csv
- db.hdf5
- db.acc.hdf5

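The two HDF5 files are the main database files used for further analysis. They contain the same information but are chunked differently for better efficiency. The files db.hdf5 and db.acc.hdf5 are passed to the SNPmatch commands with the -d and -e options respectively.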
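The file naming and the BCFtools prerequisite described above can be sketched in a few lines of Python 3. This is only an illustration: the helper names `makedb_outputs` and `bcftools_available` are hypothetical and are not provided by SNPmatch, and the output names are derived purely from the three files listed for the `-o db` prefix.

```python
import shutil


def makedb_outputs(prefix):
    """List the files `snpmatch makedb -o <prefix>` is described as producing."""
    return [prefix + ".csv", prefix + ".hdf5", prefix + ".acc.hdf5"]


def bcftools_available():
    """Return True if a `bcftools` executable can be found on the PATH."""
    return shutil.which("bcftools") is not None


if __name__ == "__main__":
    if not bcftools_available():
        print("bcftools was not found on the PATH")
    csv_file, d_file, e_file = makedb_outputs("db")
    # d_file goes to the -d option and e_file to the -e option.
    print("-d", d_file, "-e", e_file)
```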

For *Arabidopsis thaliana* users, we provide SNP database files for the RegMap and 1001Genomes panels, which can be downloaded [here](https://gmioncloud-my.sharepoint.com/personal/uemit_seren_gmi_oeaw_ac_at/_layouts/15/guestaccess.aspx?folderid=0ca806e676c154094992a9e89e5341d43&authkey=AXJPl6GkD8vNPDZJwheb6uk).

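### Input file

As input, SNPmatch takes genotype information in two file formats (BED and VCF). Example input files are given in the folder [sample_files](https://github.com/Gregor-Mendel-Institute/SNPmatch/tree/master/sample_files). Briefly, BED files should have three tab-separated columns with chromosome, position and genotype, as shown below.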

```
1	125	0/0
1	284	0/0
1	336	0/0
1	346	1/1
1	353	0/0
1	363	0/0
1	465	0/0
1	471	0/1
1	540	0/0
1	564	0/0
1	597	0/0
1	612	1/1
1	617	0/1
```

VCF files should be in the default format described in this [link](http://gatkforums.broadinstitute.org/gatk/discussion/1268/what-is-a-vcf-and-how-should-i-interpret-it). The main fields SNPmatch requires are CHROM and POS in the header and GT in the INFO column. PL (normalized Phred-scaled likelihoods of the possible genotypes), if present, improves the efficiency of SNPmatch.
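The three-column BED layout described above could be read with a few lines of Python 3. This is a minimal sketch for checking an input file by hand, not the parser SNPmatch itself uses; `parse_bed_line` and `read_bed` are hypothetical names.

```python
def parse_bed_line(line):
    """Split one BED line into (chromosome, position, genotype).

    Raises ValueError if the line does not have exactly three
    tab-separated columns or the position is not an integer.
    """
    chrom, pos, genotype = line.rstrip("\n").split("\t")
    return chrom, int(pos), genotype


def read_bed(lines):
    """Parse an iterable of BED lines, skipping blank lines."""
    return [parse_bed_line(line) for line in lines if line.strip()]


if __name__ == "__main__":
    sample = ["1\t125\t0/0\n", "1\t346\t1/1\n", "1\t471\t0/1\n"]
    for record in read_bed(sample):
        print(record)
```

Because the unpacking fails loudly on a malformed line, running `read_bed` over an open file handle is a quick way to spot rows with missing columns or space-separated fields.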

### Usage

SNPmatch can be run with the bash commands given below. A detailed manual for each command is available with -h.

```bash
snpmatch inbred -i input_file -d db.hdf5 -e db.acc.hdf5 -o output_file
# or
snpmatch parser -i input_file -o input_npz
snpmatch inbred -i input_npz -d db.hdf5 -e db.acc.hdf5 -o output_file
```
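When many samples need to be processed, the `inbred` command above can be driven from a script. The sketch below only assembles the command line from the options shown above (`-i`, `-d`, `-e`, `-o`) and hands it to `subprocess`. The function names are hypothetical, the code is written for Python 3, and it assumes the `snpmatch` executable is on the PATH.

```python
import subprocess


def inbred_command(input_file, db_hdf5, db_acc_hdf5, output_file):
    """Build the argument list for `snpmatch inbred` as shown in the usage."""
    return [
        "snpmatch", "inbred",
        "-i", input_file,
        "-d", db_hdf5,
        "-e", db_acc_hdf5,
        "-o", output_file,
    ]


def run_inbred(input_file, db_hdf5, db_acc_hdf5, output_file):
    """Run the command and raise CalledProcessError on a non-zero exit."""
    cmd = inbred_command(input_file, db_hdf5, db_acc_hdf5, output_file)
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.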

### AraGeno

*A. thaliana* researchers can run SNPmatch directly as a web tool, [AraGeno](http://arageno.gmi.oeaw.ac.at).

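## Genotyping a hybrid

SNPmatch can be used to identify hybrid individuals when the parental strains are present in the database. For such individuals, SNPmatch can be run in windows across the genome. The commands used for this are given below.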

```bash
snpmatch cross -d db.hdf5 -e db.acc.hdf5 -i input_file -b window_size_in_bp -o output_file
#to get a genetic map for the hybrid
snpmatch genotype_cross -e db.acc.hdf5 -p parent1xparent2 -i input_file -o output_file
# or if parents have VCF files individually
snpmatch genotype_cross -p parent1.vcf -q parent2.vcf -i input_file -o output_file
```

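These scripts are implemented based on the *A. thaliana* genome size, but the global variable in the csmatch [script](https://github.com/Gregor-Mendel-Institute/SNPmatch/blob/master/snpmatch/core/csmatch.py#L19) can be modified to match the corresponding genome sizes.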
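To give a feel for what running in windows across the genome means, the sketch below groups SNP records into fixed-size windows by position. It is an illustration only and not SNPmatch's implementation: the function name `bin_by_window` is hypothetical, positions are assumed to be 1-based, and the real tool may define window boundaries differently.

```python
from collections import defaultdict


def bin_by_window(records, window_size):
    """Group (chrom, pos, genotype) records into windows of window_size bp.

    Keys are (chrom, window_start) with 1-based window starts, so with a
    window size of 100 the windows are 1-100, 101-200, and so on.
    """
    windows = defaultdict(list)
    for chrom, pos, genotype in records:
        start = ((pos - 1) // window_size) * window_size + 1
        windows[(chrom, start)].append((pos, genotype))
    return dict(windows)


if __name__ == "__main__":
    snps = [("1", 125, "0/0"), ("1", 284, "0/0"), ("1", 346, "1/1")]
    for key, members in sorted(bin_by_window(snps, 100).items()):
        print(key, members)
```

Each window can then be scored separately, which is what allows different genomic regions of a hybrid to point to different parental strains.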

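## Contributing

1. Fork it!
2. Create your feature branch: `git checkout -b my-new-feature`
3. Commit your changes: `git commit -am 'Add some feature'`
4. Push to the branch: `git push origin my-new-feature`
5. Submit a pull request :D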

## History

- 1.9.2: Stable version, 24-08-2017
- 2.0.0: Stable version, 26-01-2018
- 2.1.0: Stable version, 09-08-2018

## Credits

- Rahul Pisupati (rahul.pisupati[at]gmi.oeaw.ac.at)
- Ümit Seren (uemit.seren[at]gmi.oeaw.ac.at)

## Citation

Pisupati, R. et al. Verification of *Arabidopsis* stock collections using SNPmatch, a tool for genotyping high-plexed samples. Nature Scientific Data 4, 170184 (2017). [doi:10.1038/sdata.2017.184](https://www.nature.com/articles/sdata2017184)


SNPmatch 2.2.0
