Python pysumstats包_程序模块 - PyPI

用于使用GWAS摘要统计信息的包

pysumstats的Python项目详细描述

修补程序注释

2020年11月30日（v0.4.6）

修正：使用SumStats列表合并现在可以正确地传播参数
修正：元分析现在正确地对所有汇总统计数据进行汇总
修复：修复了一个会阻止正确的外部合并的问题

旧的patchnode可以在PATCHNOTES.md中找到

说明

一个python包，用于在python中处理GWAS摘要统计数据。
此软件包旨在使阅读摘要统计信息、执行质量控制、合并汇总统计数据和执行元分析变得容易。
Meta分析可以用.meta()和逆方差加权或样本加权方法进行。
在Baselmans, et al. (2019)中描述的GWAMA可以使用合并摘要统计中的.gwama()函数来执行。
绘图包使用matplotlib.pyplot用于生成图形，因此函数通常与matplotlib.pyplot颜色、图形和轴对象。
警告：在启用低内存的情况下合并仍然是高度实验性的。

参考文献

在出版物中使用pysumstats包，或者类似的东西？这是awesome！
此程序包没有附加出版物，我不会强迫任何人引用我，也不会让我成为一个合著者或者其他什么，我希望这篇文章能保持容易理解。但是如果你能给这个github添加一个链接，或者在致谢信中引用它或者诸如此类的东西，我将不胜感激。
如果您有任何问题，想帮助添加方法或想让我知道您正计划发布此内容，您可以通过pypi website of this project联系。
如果使用.gwama()方法，请参考原始出版物：Baselmans, et al. (2019)。在

安装

这个包是为python3.7设计的。直接从此github克隆包，或使用

pip3 install --upgrade pysumstats

使用

import pysumstats as sumstats

读取文件

s1 = sumstats.SumStats("sumstats1.csv.gz", phenotype='GWASsummary1')

读取没有样本大小列的数据：您必须手动指定gwas sample size

s2 = sumstats.SumStats("sumstats2.txt.gz", phenotype='GWASsummary2', gwas_n=350492)

读取列名无法自动识别的数据：

s3 = sumstats.SumStats("sumstats3.csv", phenotype='GWASsummary3',
                              column_names={
                                    'rsid': 'weird_name_for_rsid',
                                    'chr': 'weird_name_for_chr',
                                    'bp': 'weird_name_for_bp',
                                    'ea': 'weird_name_for_ea',
                                    'oa': 'weird_name_for_oa',
                                    'maf': 'weird_name_for_maf',
                                    'b': 'weird_name_for_b',
                                    'se': 'weird_name_for_se',
                                    'p': 'weird_name_for_p',
                                    'hwe': 'weird_name_for_p_hwe',
                                    'info': 'weird_name_for_info',
                                    'n': 'weird_name_for_n',
                                    'eaf': 'weird_name_for_eaf',
                                    'oaf': 'weird_name_for_oaf'})

执行质量控制

^{pr2}$

合并sumstats，low\u memory选项仍然是实验性的，因此请小心使用它

merge1 = s1.merge(s2)

荟萃分析

n_weighted_meta = merge1.meta_analyze(name='meta1', method='samplesize')  # N-weighted meta analysis
ivw_meta = merge1.meta_analyze(name='meta1', method='ivw')  # Standard inverse-variance weighted meta analysis
gwama = merge1.gwama(name='meta1', method='ivw')  # GWAMA as described in Baselmans, et al. (2019)

另外还支持将SNP遗传率作为权重加入

exc_meta = exc.gwama(h2_snp={'ntr_exc': .01, 'ukb_ssoe': .02}, name='exc', method='ivw')

和您自己的协方差矩阵（在大多数R脚本中称为cov_Z）

# Either read it from a file:
import pandas as pd
cov_z = pd.read_csv('my_cov_z.csv') # Note it should be pandas dataframe with column names and index names equal to your phenotypes

# Or generate it from a phenotype file yourself:
phenotypes = pd.read_csv('my_phenotype_file.csv')
cov_z = sumstats.cov_matrix_from_phenotype_file(phenotypes, phenotypes=['GWASsummary1', 'GWASsummary2'])

gwama = exc.gwama(cov_matrix=cov_z, h2_snp={'GWASsummary1': .01, 'GWASsummary2': .02}, name='meta1', method='ivw')

见结果摘要

gwama.describe()

见数据标题
`gwama.head()`

见所有染色体的头部
`gwama.head(n_chromosomes=23)`

QQ和曼哈顿地块的结果

gwama.manhattan(filename='meta_manhattan.png')
gwama.qqplot(filename='meta_qq.png')

将结果另存为csv

exc.save('exc_sumstats.csv')

将结果保存为pickle文件（保存和加载回Python的速度更快）

exc.save('exc_sumstats.pickle')

将gwama结果与另一个文件合并：

merged = gwama.merge(s3)

将准备好的MR分析文件保存到R:

merged.prep_for_mr(exposure='GWASsummary3', outcome='meta1',
                   filename=['GWAS3-Meta.csv', 'Meta-GWAS3.csv'],
                   p_cutoff=5e-8, bidirectional=True, index=False)

根据R中MendelianRandomization包的规范，生成的文件将具有以下列名：

rsid chr bp exposure.A1 exposure.A2 outcome.A1 outcome.A2 exposure.se exposure.b outcome.se outcome.b

一些其他的东西：

# See column names of the file
gpc_neuro.columns

# SumStats support for standard indexing is growing:
exc[0]  # Get the full output of the first SNP
exc[:10]  # Get the full output of the first 10 SNPs
exc[:10, 'p']  # Get the p value of the first 10 SNPs
exc['p']  # Get the p values of all SNPs
exc['rs78948828']  # Get the full output of 1 specific rsid
exc[['rs78948828', 'rs6057089', 'rs55957973']]  # Get the full output of multiple specific rsids
exc[['rs78948828', 'rs6057089', 'rs55957973'], 'p']  # Get the p-value for specific rsids

# If for whatever reason you want to do stuff with each SNP individually you can also loop over the entire file
for snp_output in exc:
    if exc['p'] < 5e-8:
        print('Yay significant SNP!')
    # do something


# If you only want to loop over some specific columns, you can
for rsid, b, se, p in exc[['rsid', 'b', 'se', 'p']].values:
    if p < 5e-8:
        print('Yay significant SNP!')

欢迎加入QQ群-->： 979659372

pysumstats 0.4.6

pysumstats的Python项目详细描述

修补程序注释

2020年11月30日（v0.4.6）

上一页

说明

参考文献

安装

使用

读取文件

读取没有样本大小列的数据：您必须手动指定gwas sample size

读取列名无法自动识别的数据：

执行质量控制

合并sumstats，low\u memory选项仍然是实验性的，因此请小心使用它

另外还支持将SNP遗传率作为权重加入

和您自己的协方差矩阵（在大多数R脚本中称为cov_Z）

见结果摘要

见数据标题 gwama.head()

见所有染色体的头部 gwama.head(n_chromosomes=23)

QQ和曼哈顿地块的结果

将结果另存为csv

将结果保存为pickle文件（保存和加载回Python的速度更快）

将gwama结果与另一个文件合并：

将准备好的MR分析文件保存到R:

推荐PyPI第三方库

django-nomad-country-blogs

python-slimtimer

wildcard.recipe.insertinto

modernp

bananaplots

pyconcrete

odoo9-addon-stock-quant-manual-assign

tedana

inqbus.zopeftp

collective.splashdancing

genesis

opentargets-checkomatic

pyRen

example-pkg-llz-arithmetic

cellp

导 航 栏

项目 链接

标 签

维护者

最新PyPI项目

最新Python常见问题

见数据标题
`gwama.head()`

见所有染色体的头部
`gwama.head(n_chromosomes=23)`

导航栏

项目链接

标签