cblaster
cblaster is a tool for finding clusters of co-located homologous sequences in BLAST searches.
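Outline
- Perform BLAST searches, remotely (via the BLAST API) or locally (via diamond)
- Parse the results, saving hits that meet user-defined thresholds for identity, coverage and e-value
- Query NCBI's Identical Protein Groups (IPG) resource to fetch the position of each hit on its respective genomic scaffold
- Search for clusters of hits that satisfy the intergenic distance threshold and the minimum number of conserved sequences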
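The filtering and clustering steps in this outline can be sketched roughly as follows. This is a minimal illustration, not cblaster's actual implementation: the `Hit` class, the function names and all default threshold values are hypothetical, and hits are assumed to already carry their scaffold coordinates.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Hit:
    """Hypothetical record for one BLAST hit with its genomic position."""
    query: str      # query sequence identifier
    subject: str    # subject (hit) protein identifier
    scaffold: str   # genomic scaffold the hit is located on
    identity: float
    coverage: float
    evalue: float
    start: int
    end: int


def filter_hits(hits, min_identity=30.0, min_coverage=50.0, max_evalue=0.01):
    """Keep hits meeting all three thresholds (default values are illustrative)."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def find_clusters(hits, max_gap=20000, min_queries=3):
    """Group hits per scaffold, splitting wherever the gap exceeds max_gap."""
    clusters = []
    ordered = sorted(hits, key=lambda h: (h.scaffold, h.start))
    for _, group in groupby(ordered, key=lambda h: h.scaffold):
        current = []
        for hit in group:
            # Start a new cluster when the distance to the previous hits is too large
            if current and hit.start - max(h.end for h in current) > max_gap:
                clusters.append(current)
                current = []
            current.append(hit)
        clusters.append(current)
    # Retain clusters that contain hits to at least min_queries distinct queries
    return [c for c in clusters if len({h.query for h in c}) >= min_queries]
```

Calling `find_clusters(filter_hits(hits))` returns a list of clusters, each a list of `Hit` objects ordered by start position on one scaffold.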
Installation
cblaster can be installed via pip:
$ pip3 install cblaster --user
or by cloning the repository and installing:
^{pr2}$
Dependencies
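cblaster is tested on Python 3.6, and its only external Python dependency is the requests module (used for interacting with NCBI APIs).
If you want to perform local searches, diamond should be installed and available on your system $PATH.
cblaster will throw an error if a local search is started but it cannot find diamond or {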
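A check of this kind can be written with the standard library's `shutil.which`. The sketch below is an assumption about how such a lookup might be done, not cblaster's own code; the function name `find_diamond` and its `candidates` parameter are hypothetical.

```python
import shutil


def find_diamond(candidates=("diamond",)):
    """Return the path of the first candidate executable found on $PATH."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    # Nothing was found, so fail early instead of failing mid-search
    raise RuntimeError(f"None of {candidates} were found on the system $PATH")
```

Running the lookup before a local search starts lets a script report a missing executable immediately.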
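Usage
cblaster accepts FASTA files and collections of valid NCBI sequence identifiers (GIs, accession numbers) as input.
A remote search can be performed as simply as: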
$ cblaster search --query_file query.fasta
For example, to remotely search the burnettramic acids gene cluster, bua, against the NCBI's nr database:
$ cblaster search -qf bua.fasta
[12:14:17] INFO - Starting cblaster in remote mode
[12:14:17] INFO - Launching new search
[12:14:19] INFO - Request Identifier (RID): WHS0UGYJ015
[12:14:19] INFO - Request Time Of Execution (RTOE): 25s
[12:14:44] INFO - Polling NCBI for completion status
[12:14:44] INFO - Checking search status...
[12:15:44] INFO - Checking search status...
[12:16:44] INFO - Checking search status...
[12:16:46] INFO - Search has completed successfully!
[12:16:46] INFO - Retrieving results for search WHS0UGYJ015
[12:16:51] INFO - Parsing results...
[12:16:51] INFO - Found 3944 hits meeting score thresholds
[12:16:51] INFO - Fetching genomic context of hits
[12:17:14] INFO - Searching for clustered hits across 705 organisms
[12:17:14] INFO - Writing summary to <stdout>
Aspergillus mulundensis DSM 5745
================================
NW_020797889.1
--------------
Query       Subject         Identity  Coverage  E-value    Bitscore  Start    End      Strand
QBE85641.1  XP_026607259.1  75.56     99.5918   0          742       1717881  1719409  -
QBE85642.1  XP_026607260.1  89.916    100       0          667       1719650  1720797  +
QBE85643.1  XP_026607261.1  89.532    83.1169   0          832       1721494  1722934  +
QBE85644.1  XP_026607262.1  64.829    98.9218   6.51e-157  455       1723252  1724467  -
QBE85645.1  XP_026607263.1  69.97     100       6.93e-157  449       1725113  1726277  -
QBE85646.1  XP_026607264.1  82.759    96.8447   0          670       1726892  1728302  +
QBE85647.1  XP_026607265.1  72.674    99.2048   0          764       1729735  1731338  +
QBE85648.1  XP_026607266.1  56.098    98.324    4.24e-64   205       1731701  1732402  -
QBE85649.1  XP_026607267.1  79.623    99.8746   0          6573      1732820  1745289  +
...
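The same search could also be launched from a Python script. The sketch below only wraps the command line shown in this section. The helper names `build_search_command` and `run_search` are hypothetical and not part of cblaster's API, and the code assumes the cblaster executable is available on $PATH.

```python
import subprocess


def build_search_command(query_file):
    """Assemble the command line used in the usage examples."""
    return ["cblaster", "search", "--query_file", query_file]


def run_search(query_file):
    """Run the search and return whatever cblaster writes to stdout."""
    result = subprocess.run(
        build_search_command(query_file),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

Passing the arguments as a list avoids shell quoting issues when the query file path contains spaces.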
A query sequence absence/presence matrix can be generated using the --binary argument:
Organism                             Scaffold        Start    End      QBE85641.1  QBE85642.1  QBE85643.1  QBE85644.1  QBE85645.1  QBE85646.1  QBE85647.1  QBE85648.1  QBE85649.1
Aspergillus mulundensis DSM 5745     NW_020797889.1  1717881  1745289  1           1           1           1           1           1           1           1           1
Aspergillus versicolor CBS 583.65    KV878126.1      3162095  3187090  1           1           1           0           1           1           1           1           1
Pseudomassariella vexata CBS 129021  MCFJ01000004.1  1606356  1628483  1           1           1           0           0           1           0           1           1
Hypoxylon sp. CO27-5                 KZ112517.1      92119    112957   1           1           1           0           0           0           1           0           1
Hypoxylon sp. EC38                   KZ111255.1      514739   535366   1           1           1           0           0           0           1           0           1
Epicoccum nigrum ICMP 19927          KZ107839.1      2116719  2142558  1           1           0           0           0           1           1           0           1
Aureobasidium subglaciale EXF-2481   NW_013566983.1  700476   718693   1           1           0           0           0           1           1           0           0
Aureobasidium pullulans EXF-6514     QZBF01000009.1  18721    34295    1           1           0           0           0           1           1           0           0
Aureobasidium pullulans EXF-5628     QZBI01000512.1  329      13401    1           0           0           0           0           1           1           0           0
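A table like this can be loaded for downstream analysis with a few lines of Python. The sketch below assumes the whitespace-delimited layout shown above, with four leading columns followed by one column per query. The function name `parse_binary_table` is hypothetical and not part of cblaster.

```python
def parse_binary_table(text):
    """Parse the whitespace-delimited table into a list of row dictionaries."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    queries = header[4:]  # columns after Organism, Scaffold, Start, End
    rows = []
    for line in lines[1:]:
        # Organism names contain spaces, so split from the right-hand side
        fields = line.rsplit(maxsplit=len(queries) + 3)
        organism, scaffold, start, end = fields[:4]
        flags = [int(value) for value in fields[4:]]
        rows.append({
            "organism": organism,
            "scaffold": scaffold,
            "start": int(start),
            "end": int(end),
            "present": dict(zip(queries, flags)),
        })
    return rows
```

Each row's `present` dictionary maps a query identifier to 1 or 0, so `sum(row["present"].values())` gives the number of query sequences found in that cluster.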
cblaster can also generate fully interactive visualisations of binary tables. To see an example, click here.
For more usage examples and API documentation, see the documentation.
Citations
If you found this tool useful, please cite:
1. <pending>
2. Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
3. Acland, A. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 42, 7–17 (2014).