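cruzdb Python package - PyPI

An interface to the UCSC genome database. It also supports upstream/downstream/k-nearest-neighbor queries and mirroring tables to a local sqlite database.

cruzdb project description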

The rendered documentation is at: http://pythonhosted.org/cruzdb/

A paper describing cruzdb is published in Bioinformatics: http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btt534?ijkey=9I8rQeolKOhzFHv&keytype=ref

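cruzdb overview

The UCSC Genomes Database is a great resource for annotations, regulation, variation, and all kinds of other data for a growing number of taxa. This library aims to make that data simple to use, so that we can do sophisticated analyses without resorting to awk-ful, error-prone manipulations. As motivation, here is an example of some of its capabilities: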

>>> from cruzdb import Genome

>>> g = Genome(db="hg18")

>>> muc5b = g.refGene.filter_by(name2="MUC5B").first()
>>> muc5b
refGene(chr11:MUC5B:1200870-1239982)

>>> muc5b.strand
'+'

# the first 4 introns
>>> muc5b.introns[:4]
[(1200999L, 1203486L), (1203543L, 1204010L), (1204082L, 1204420L), (1204682L, 1204836L)]

# the first 4 exons.
>>> muc5b.exons[:4]
[(1200870L, 1200999L), (1203486L, 1203543L), (1204010L, 1204082L), (1204420L, 1204682L)]

# note that some of these are not coding because they are < cdsStart
>>> muc5b.cdsStart
1200929L

# the extent of the 5' utr.
>>> muc5b.utr5
(1200870L, 1200929L)

# we can get the (first 4) actual CDS's with:
>>> muc5b.cds[:4]
[(1200929L, 1200999L), (1203486L, 1203543L), (1204010L, 1204082L), (1204420L, 1204682L)]

# the cds sequence from the UCSC DAS server as a list with one entry per cds
>>> muc5b.cds_sequence #doctest: +ELLIPSIS
['atgggtgccccgagcgcgtgccggacgctggtgttggctctggcggccatgctcgtggtgccgcaggcag', ...]


>>> transcript = g.knownGene.filter_by(name="uc001aaa.2").first()
>>> transcript.is_coding
False

# convert a genome coordinate to a local coordinate.
>>> transcript.localize(transcript.txStart)
0L

# or localize to the CDNA position.
>>> print transcript.localize(transcript.cdsStart, cdna=True)
None
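The coordinates printed above are related in a simple way: introns are the gaps between consecutive exons, the 5' UTR of a '+' strand gene runs from the transcript start to cdsStart, and the CDS pieces are the exons clipped to the coding region. The following is a minimal sketch of that arithmetic using only the numbers shown above. The helper names introns_from_exons and clip_to_cds are made up for illustration and are not part of cruzdb, and cds_end is a placeholder set to the end of the last exon shown, because the real cdsEnd is not printed above.

```python
def introns_from_exons(exons):
    """Return the gaps between consecutive (start, end) exon intervals."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]


def clip_to_cds(exons, cds_start, cds_end):
    """Clip exon intervals to [cds_start, cds_end), dropping non-coding ones."""
    clipped = []
    for start, end in exons:
        s, e = max(start, cds_start), min(end, cds_end)
        if s < e:
            clipped.append((s, e))
    return clipped


# First four MUC5B exons and cdsStart, copied from the session above.
exons = [(1200870, 1200999), (1203486, 1203543),
         (1204010, 1204082), (1204420, 1204682)]
cds_start = 1200929

introns = introns_from_exons(exons)
# cds_end is a placeholder: the real cdsEnd is not printed above.
cds = clip_to_cds(exons, cds_start, cds_end=exons[-1][1])
utr5 = (exons[0][0], cds_start)  # '+' strand: transcript start up to cdsStart
```

Four exons give three introns, which agree with the first three entries of muc5b.introns[:4]; the clipped exons and the UTR agree with muc5b.cds[:4] and muc5b.utr5.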

Command-line interface

With cruzdb 0.5.4+ installed, given a file input.bed you can run:

python -m cruzdb hg18 input.bed refGene cpgIslandExt

to have the intervals annotated with the refGene and cpgIslandExt tables from version hg18.
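For experimenting with the command, a small input.bed can be written from Python. This is only a sketch: it assumes the tool accepts a plain tab-separated BED file with chrom, start and end columns, and the single interval is the chr1:9444-9555 query region used in the spatial examples of this README, not data of any particular interest. The write_bed helper is hypothetical.

```python
import os
import tempfile


def write_bed(path, intervals):
    """Write (chrom, start, end) tuples as tab-separated BED lines."""
    with open(path, "w") as handle:
        for chrom, start, end in intervals:
            handle.write("%s\t%d\t%d\n" % (chrom, start, end))


bed_path = os.path.join(tempfile.mkdtemp(), "input.bed")
write_bed(bed_path, [("chr1", 9444, 9555)])

# The annotation command from this section, pointed at the new file.
command = ["python", "-m", "cruzdb", "hg18", bed_path, "refGene", "cpgIslandExt"]
print(" ".join(command))
```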

DataFrames

Tables can also be loaded as pandas DataFrames. We can get one from a table with:

>>> df = g.dataframe('cpgIslandExt')
>>> df.columns #doctest: +ELLIPSIS
Index([chrom, chromStart, chromEnd, name, length, cpgNum, gcNum, perCpg, perGc, obsExp], dtype=object)

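The examples above can be repeated with knownGene annotations by changing 'refGene' to 'knownGene'. This is also easy to do for a set of genes.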

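Spatial

k-nearest-neighbor, upstream, and downstream searches are available. Upstream and downstream searches use the strand of the query feature to determine the direction: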

>>> nearest = g.knearest("refGene", "chr1", 9444, 9555, k=6)
>>> up_list = g.upstream("refGene", "chr1", 9444, 9555, k=6)
>>> down_list = g.downstream("refGene", "chr1", 9444, 9555, k=6)
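To make the role of the strand concrete, here is a toy, in-memory version of an upstream search. It is a sketch of the idea only, not cruzdb's implementation: the Interval tuple, the upstream function, and the gene names and coordinates are all made up, and it assumes that "upstream" means lower coordinates for a '+' strand query and higher coordinates for a '-' strand query.

```python
from collections import namedtuple

# Hypothetical feature record; cruzdb's own features carry more fields.
Interval = namedtuple("Interval", "name start end strand")


def upstream(features, query, k=1):
    """Return up to k features nearest to the query on its upstream side."""
    if query.strand == "-":
        # On the minus strand, upstream lies at higher coordinates.
        candidates = [f for f in features if f.start >= query.end]
        candidates.sort(key=lambda f: f.start - query.end)
    else:
        # On the plus strand, upstream lies at lower coordinates.
        candidates = [f for f in features if f.end <= query.start]
        candidates.sort(key=lambda f: query.start - f.end)
    return candidates[:k]


# Made-up features on a single chromosome.
features = [
    Interval("geneA", 100, 200, "+"),
    Interval("geneB", 300, 400, "+"),
    Interval("geneC", 900, 1000, "-"),
]

plus_query = Interval("q", 500, 600, "+")
minus_query = Interval("q", 500, 600, "-")

up_plus = [f.name for f in upstream(features, plus_query, k=2)]
up_minus = [f.name for f in upstream(features, minus_query, k=2)]
```

The same query region returns different neighbors depending on its strand. A downstream search would swap the two branches.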

Mirror

The examples above use the mysql interface from UCSC. It is now possible to mirror any table from UCSC to a local sqlite database via:

# cleanup

>>> import os
>>> if os.path.exists("/tmp/u.db"): os.unlink('/tmp/u.db')

>>> g = Genome('hg18')

>>> gs = g.mirror(['chromInfo'], 'sqlite:////tmp/u.db')

and then use it as:

>>> gs.chromInfo
<class 'cruzdb.sqlsoup.chromInfo'>
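Because the mirror is an ordinary sqlite file, it can also be inspected with Python's built-in sqlite3 module. The sketch below lists the tables in a sqlite file. So that it runs without contacting UCSC, it builds a small stand-in database instead of opening a real mirror; the two-column chromInfo layout and its single row are illustrative assumptions, and list_tables is a hypothetical helper.

```python
import os
import sqlite3
import tempfile


def list_tables(db_path):
    """Return the names of all tables in the sqlite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Stand-in for a mirrored database; the schema and row are illustrative only.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.db")
conn = sqlite3.connect(demo_path)
conn.execute("CREATE TABLE chromInfo (chrom TEXT, size INTEGER)")
conn.execute("INSERT INTO chromInfo VALUES ('chrDemo', 1000)")
conn.commit()
conn.close()

tables = list_tables(demo_path)
print(tables)
```

Pointing list_tables at /tmp/u.db after the mirror step should list the tables that were mirrored.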

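Code

Most of the per-row features are implemented in the Feature class. If you want to add something to a feature (like the existing feature.utr5), add it there.

The tables are reflected using sqlalchemy and mapped in the __getattr__ method of the Genome class.

So a call like: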

genome.knownGene

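calls the __getattr__ method with the table arg set to 'knownGene'. That table is then reflected, and an object whose parent classes are Feature and sqlalchemy's declarative base is returned.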
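The attribute-to-table dispatch can be illustrated with a toy class. This is a simplified sketch of the pattern, not cruzdb's code: ToyGenome, ToyFeature and ToyBase are hypothetical stand-ins, and where the real library reflects the table through sqlalchemy, the sketch only builds an empty class with the two parents and caches it.

```python
class ToyBase(object):
    """Stand-in for sqlalchemy's declarative base."""


class ToyFeature(object):
    """Stand-in for the Feature class that holds per-row helpers."""

    def describe(self):
        return "row of %s" % type(self).__name__


class ToyGenome(object):
    def __init__(self, db):
        self.db = db
        self._tables = {}

    def __getattr__(self, table):
        # Only called when normal attribute lookup fails, so `db` and
        # `_tables` never reach this method once __init__ has run.
        if table.startswith("_"):
            raise AttributeError(table)
        if table not in self._tables:
            # The real library would reflect the database table here.
            self._tables[table] = type(table, (ToyFeature, ToyBase), {})
        return self._tables[table]


genome = ToyGenome("hg18")
known_gene = genome.knownGene
```

Caching the generated class means repeated access to genome.knownGene returns the same object rather than building a new class each time.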

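Contributing

Yes, please!

To start coding, it is polite to grab your own copy of some UCSC tables so as not to overload the UCSC server. You can run something like: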

Genome('hg18').mirror(["refGene", "cpgIslandExt", "chromInfo", "knownGene", "kgXref"], "sqlite:////tmp/hg18.db")

Then the connection would be something like:

g = Genome("sqlite:////tmp/hg18.db")

If you have a feature you would like to use or implement, open a ticket on github for discussion. Below are some ideas.


cruzdb 0.5.6
