Python pyfastx package - PyPI

pyfastx is a Python module for fast random access to sequences from plain and gzipped FASTA files

pyfastx project description

A robust Python module for fast random access to sequences from plain and gzipped FASTA files

About
Installation
Usage
Testing
Acknowledgements

About

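pyfastx is a lightweight Python C extension that lets users randomly access sequences from plain and gzipped FASTA files. The module aims to provide simple APIs for extracting sequences from a FASTA file by identifier or by index number. To support random access without consuming too much memory, pyfastx builds an index and stores it in a sqlite3 database file. pyfastx can also parse both standard FASTA (each sequence split across multiple lines of the same length) and non-standard FASTA (lines of differing lengths). The module uses kseq.h, written by @attractivechaos in the klib project, to parse plain FASTA files, and zran.c, written by @pauldmccarthy in the indexed_gzip project, to index gzip files for random access.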
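To make the indexing idea concrete, here is a minimal pure-Python sketch, not pyfastx's C implementation or its real index schema. It scans an uncompressed FASTA file once, records each sequence's name, byte offset and length in a sqlite3 table, and later seeks straight to a record. The function names `build_index` and `fetch_seq` and the `seq` table layout are hypothetical. Because retrieval reads until the next header line, it copes with both standard and non-standard line lengths.

```python
import sqlite3


def build_index(fasta_path: str, db_path: str = ":memory:") -> sqlite3.Connection:
    """Scan a plain FASTA file once and store name, byte offset and length per record.

    Hypothetical schema for illustration only; pyfastx uses its own layout.
    """
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE seq (name TEXT PRIMARY KEY, offset INTEGER, length INTEGER)")
    name, offset, length = None, 0, 0
    with open(fasta_path, "rb") as fh:
        while True:
            line = fh.readline()
            if not line or line.startswith(b">"):
                # A header or end of file closes the previous record.
                if name is not None:
                    conn.execute("INSERT INTO seq VALUES (?, ?, ?)", (name, offset, length))
                if not line:
                    break
                # Assumption: the name is the header text up to the first whitespace.
                name = line[1:].split()[0].decode()
                offset, length = fh.tell(), 0  # sequence bytes start after the header
            else:
                length += len(line.strip())
    conn.commit()
    return conn


def fetch_seq(conn: sqlite3.Connection, fasta_path: str, name: str) -> str:
    """Seek to the stored offset and join sequence lines until the next header."""
    row = conn.execute("SELECT offset FROM seq WHERE name = ?", (name,)).fetchone()
    if row is None:
        raise KeyError(name)
    parts = []
    with open(fasta_path, "rb") as fh:
        fh.seek(row[0])
        for line in fh:
            if line.startswith(b">"):
                break
            parts.append(line.strip().decode())
    return "".join(parts)
```

Only the small index lives in sqlite3; sequence data stays on disk and is read on demand, which is the memory trade-off described above. Gzipped input would additionally need zran-style checkpoints, which this sketch omits.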

This project was inspired by @mdshw5's project pyfaidx and @brentp's project pyfasta.

Installation

Before you start, make sure you have both pip and Python 3.5 or later.

You can install pyfastx from the Python Package Index (PyPI):

pip install pyfastx

Update the pyfastx module:

pip install -U pyfastx

Usage

Read FASTA file

The fastest way to parse a plain or gzipped FASTA file without building an index:

>>> import pyfastx
>>> for name, seq in pyfastx.Fasta('test/data/test.fa.gz', build_index=False):
...     print(name, seq)
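For readers curious what index-free parsing involves, the following pure-Python sketch mimics the `build_index=False` loop: it streams one record at a time, so memory use is bounded by the longest sequence. It is an illustration only, since pyfastx does this in C with kseq.h and zlib. The function name `iter_fasta` is hypothetical, and taking the name as the header text up to the first whitespace is an assumption.

```python
import gzip
from typing import Iterator, Tuple


def iter_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from a plain or gzipped FASTA file, no index built."""
    # Detect gzip by its magic bytes rather than trusting the file extension.
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    opener = gzip.open if is_gzip else open
    name, chunks = None, []
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:].split()[0], []
            else:
                chunks.append(line)  # works for any line width
    if name is not None:
        yield name, "".join(chunks)
```

Usage mirrors the loop above: `for name, seq in iter_fasta('test/data/test.fa.gz'): print(name, seq)`.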

Read a plain or gzipped FASTA file and build an index, which enables random access to the FASTA file:

>>> import pyfastx
>>> fa = pyfastx.Fasta('test/data/test.fa.gz')
>>> fa
<Fasta> test/data/test.fa.gz contains 211 seqs

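Note

Building the index may take some time; how long depends on the size of the FASTA file. Once the index has been built, any sequence in the FASTA file can be accessed randomly.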

Get FASTA information

>>> # get sequence counts in FASTA
>>> len(fa)
211
>>> # get total sequence length of FASTA
>>> fa.size
86262
>>> # get GC content of DNA sequence of FASTA
>>> fa.gc_content
43.529014587402344
>>> # get composition of nucleotides in FASTA
>>> fa.composition
{'A': 24534, 'C': 18694, 'G': 18855, 'T': 24179, 'N': 0}

Get sequence from FASTA

>>> # get sequence like a dictionary by identifier
>>> s1 = fa['JZ822577.1']
>>> s1
<Sequence> JZ822577.1 with length of 333
>>> # get sequence like a list by index
>>> s2 = fa[2]
>>> s2
<Sequence> JZ822579.1 with length of 176
>>> # get last sequence
>>> s3 = fa[-1]
>>> s3
<Sequence> JZ840318.1 with length of 134
>>> # check whether a sequence name is in the FASTA file
>>> 'JZ822577.1' in fa
True
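The mixed dictionary/list behaviour above can be pictured with the toy class below. It keeps every record in memory purely for illustration, whereas pyfastx resolves lookups through its sqlite3 index; `MiniFasta` is a hypothetical name, not part of pyfastx.

```python
from typing import Dict, List, Tuple, Union


class MiniFasta:
    """Hypothetical in-memory stand-in showing dict-like and list-like lookup.

    Records keep file order, so integer indexes (including negative ones)
    follow the order in which sequences appear in the FASTA file.
    """

    def __init__(self, records: List[Tuple[str, str]]) -> None:
        self._names: List[str] = [name for name, _ in records]
        self._seqs: Dict[str, str] = dict(records)

    def __len__(self) -> int:
        return len(self._names)

    def __contains__(self, name: object) -> bool:
        return name in self._seqs

    def __getitem__(self, key: Union[str, int]) -> str:
        if isinstance(key, int):
            key = self._names[key]  # list semantics: negative indexes, IndexError
        return self._seqs[key]      # dict semantics: KeyError for unknown names
```

For example, `MiniFasta([("a", "AC"), ("b", "GGT")])[-1]` returns `"GGT"`, and `"a" in` that object is `True`.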

Get sequence information

>>> s = fa[-1]
>>> s
<Sequence> JZ840318.1 with length of 134
>>> # get sequence name
>>> s.name
'JZ840318.1'
>>> # get sequence string
>>> s.seq
'ACTGGAGGTTCTTCTTCCTGTGGAAAGTAACTTGTTTTGCCTTCACCTGCCTGTTCTTCACATCAACCTTGTTCCCACACAAAACAATGGGAATGTTCTCACACACCCTGCAGAGATCACGATGCCATGTTGGT'
>>> # get sequence length
>>> len(s)
134
>>> # get GC content if DNA sequence
>>> s.gc_content
46.26865768432617
>>> # get nucleotide composition if DNA sequence
>>> s.composition
{'A': 31, 'C': 37, 'G': 25, 'T': 41, 'N': 0}
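Both the file-level and the per-sequence statistics can be reproduced in plain Python, which is a handy cross-check. This is a sketch, not pyfastx's implementation: the helper names are hypothetical, matching is case-insensitive, and leaving N out of the GC denominator is an assumption (it makes no difference for the examples above, where N is 0).

```python
from collections import Counter
from typing import Dict, Iterable


def composition(seq: str) -> Dict[str, int]:
    """Count A, C, G, T and N (case-insensitive), using the keys shown above."""
    counts = Counter(seq.upper())
    return {base: counts.get(base, 0) for base in "ACGTN"}


def gc_content(comp: Dict[str, int]) -> float:
    """GC percentage over A+C+G+T; excluding N from the denominator is an assumption."""
    acgt = comp["A"] + comp["C"] + comp["G"] + comp["T"]
    return 100.0 * (comp["G"] + comp["C"]) / acgt if acgt else 0.0


def file_composition(seqs: Iterable[str]) -> Dict[str, int]:
    """Sum per-sequence compositions to get whole-file figures."""
    total: Counter = Counter()
    for seq in seqs:
        total.update(composition(seq))
    return {base: total.get(base, 0) for base in "ACGTN"}
```

Applied to the 134-base JZ840318.1 sequence, `composition` gives the same counts as `s.composition`, and `gc_content` gives 100 * 62 / 134, about 46.2687. With the whole-file counts, 100 * (18694 + 18855) / 86262 is about 43.529, in line with `fa.gc_content`.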

Sequence slice

Sequence objects can be sliced like Python strings:

>>> # get a sub seq from sequence
>>> ss = s[10:30]
>>> ss
<Sequence> JZ840318.1 from 11 to 30
>>> ss.name
'JZ840318.1:11-30'
>>> ss.seq
'CTTCTTCCTGTGGAAAGTAA'
>>> ss = s[-10:]
>>> ss
<Sequence> JZ840318.1 from 125 to 134
>>> ss.name
'JZ840318.1:125-134'
>>> ss.seq
'CCATGTTGGT'

Note

Slice start and end coordinates are 0-based. Currently, pyfastx does not support an optional third step or stride argument, e.g. ss[::-1].
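The coordinate bookkeeping behind names such as `JZ840318.1:11-30` fits in a few lines. The helper below is hypothetical, not pyfastx code: Python's `slice.indices()` resolves negative and omitted bounds, and adding 1 to the start converts the 0-based, half-open slice into the 1-based, inclusive range used in the sliced sequence's name. Rejecting a step follows the note above; rejecting empty slices is an assumption.

```python
def slice_region(name: str, length: int, key: slice) -> str:
    """Map a 0-based, half-open Python slice to a 1-based, inclusive region label."""
    start, stop, step = key.indices(length)  # resolves negatives and None bounds
    if step != 1:
        raise ValueError("slice step/stride is not supported")
    if stop <= start:
        raise ValueError("empty slice")
    return f"{name}:{start + 1}-{stop}"
```

With the 134-base sequence, `slice_region("JZ840318.1", 134, slice(10, 30))` yields `"JZ840318.1:11-30"` and `slice(-10, None)` yields `"JZ840318.1:125-134"`, matching the names shown above.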

Reverse and complement sequence

>>> # get sliced sequence
>>> fa[0][10:20].seq
'GTCAATTTCC'
>>> # get reverse of sliced sequence
>>> fa[0][10:20].reverse
'CCTTTAACTG'
>>> # get complement of sliced sequence
>>> fa[0][10:20].complement
'CAGTTAAAGG'
>>> # get reversed complement sequence, corresponding to sequence in antisense strand
>>> fa[0][10:20].antisense
'GGAAATTGAC'
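The three properties are plain string transforms. This sketch reproduces the outputs above with `str.maketrans`; it is an illustration, not pyfastx's C code, and it only maps A, C, G, T and N (other IUPAC ambiguity codes pass through unchanged, which is an assumption of the sketch).

```python
# Complement table for A, C, G, T and N in either case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse(seq: str) -> str:
    """Reverse the sequence without complementing it."""
    return seq[::-1]


def complement(seq: str) -> str:
    """Complement each base without reversing."""
    return seq.translate(_COMPLEMENT)


def antisense(seq: str) -> str:
    """Reverse complement: the opposite strand read 5' to 3'."""
    return complement(seq)[::-1]
```

For `'GTCAATTTCC'` these return `'CCTTTAACTG'`, `'CAGTTAAAGG'` and `'GGAAATTGAC'` respectively.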

Get subsequences

Subsequences can be retrieved from a FASTA file using a list of [start, end] coordinates:

>>> # get subsequence with start and end position
>>> interval = (1, 10)
>>> fa.fetch('JZ822577.1', interval)
'CTCTAGAGAT'
>>> # get subsequences with a list of start and end position
>>> intervals = [(1, 10), (50, 60)]
>>> fa.fetch('JZ822577.1', intervals)
'CTCTAGAGATTTTAGTTTGAC'
>>> # get subsequences with reverse strand
>>> fa.fetch('JZ822577.1', (1, 10), strand='-')
'ATCTCTAGAG'
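The outputs above suggest that `fetch` takes 1-based, inclusive coordinates: (50, 60) contributes 11 bases to the 21-base result. Under that assumption, its behaviour can be sketched in pure Python. The standalone `fetch` below is hypothetical and works on an in-memory string; reverse-complementing the joined result for `strand='-'` matches the single-interval example but is an assumption for multiple intervals.

```python
from typing import List, Sequence, Tuple, Union

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

Interval = Tuple[int, int]


def fetch(seq: str, intervals: Union[Interval, Sequence[Interval]], strand: str = "+") -> str:
    """Concatenate 1-based, inclusive intervals of seq; reverse-complement for strand '-'."""
    if isinstance(intervals[0], int):  # a single (start, end) tuple
        intervals = [intervals]
    parts: List[str] = []
    for start, end in intervals:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"interval ({start}, {end}) out of range")
        parts.append(seq[start - 1:end])  # 1-based inclusive -> 0-based half-open
    joined = "".join(parts)
    if strand == "-":
        joined = joined.translate(_COMPLEMENT)[::-1]
    return joined
```

For a sequence starting with `CTCTAGAGAT`, `fetch(seq, (1, 10), strand='-')` returns `'ATCTCTAGAG'`, as in the example above.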

Get identifiers

Get the identifiers of all sequences as a list-like object:

>>> ids = fa.keys()
>>> ids
<Identifier> contains 211 identifiers
>>> # get count of sequence
>>> len(ids)
211
>>> # get identifier by index
>>> ids[0]
'JZ822577.1'
>>> # check whether an identifier is in FASTA
>>> 'JZ822577.1' in ids
True
>>> # iterate over identifiers
>>> for name in ids:
...     print(name)
>>> # convert to a list
>>> list(ids)
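One way to provide a list-like identifier object without loading every name into memory is to answer `len`, indexing, membership and iteration with sqlite3 queries. The sketch below is hypothetical: the class name `IdentifierView` and the `seq(name)` table are assumptions, not pyfastx's real class or schema. It relies on rowid order matching insertion (file) order.

```python
import sqlite3
from typing import Iterator


class IdentifierView:
    """Hypothetical list-like view over names stored in a sqlite3 table named seq."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def __len__(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM seq").fetchone()[0]

    def __getitem__(self, index: int) -> str:
        n = len(self)
        if index < 0:
            index += n  # support negative indexes like a list
        if not 0 <= index < n:
            raise IndexError("identifier index out of range")
        row = self._conn.execute(
            "SELECT name FROM seq ORDER BY rowid LIMIT 1 OFFSET ?", (index,)
        ).fetchone()
        return row[0]

    def __contains__(self, name: object) -> bool:
        if not isinstance(name, str):
            return False
        row = self._conn.execute("SELECT 1 FROM seq WHERE name = ?", (name,)).fetchone()
        return row is not None

    def __iter__(self) -> Iterator[str]:
        # Stream names from the cursor instead of materialising a list.
        for (name,) in self._conn.execute("SELECT name FROM seq ORDER BY rowid"):
            yield name
```

Each operation is a single indexed query, so the object behaves like a list while the names stay in the database; `list(view)` still materialises them when a real list is needed.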

Testing

The pyfaidx module is used to test pyfastx. To run the tests:

$ python setup.py test

Acknowledgements

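kseq.h and zlib are used to parse the FASTA format, and sqlite3 is used to store the built index. pyfastx's ability to randomly access sequences from gzipped FASTA files is mainly owed to indexed_gzip.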


pyfastx 0.2.10
