Read PLINK files into pandas data frames



pandas-plink


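pandas-plink is a Python package for reading the PLINK binary file format and (since version 2.0.0) the realized relationship matrices produced by PLINK and GCTA. Files are read through lazy loading: only the genotypes the user actually accesses are read from disk, which saves memory.

Notable changes can be found in CHANGELOG.md.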

Install

Using pip:

pip install pandas-plink

Or via conda:

conda install -c conda-forge pandas-plink

Usage

It is as simple as:

>>> from pandas_plink import read_plink1_bin
>>> G = read_plink1_bin("chr11.bed", "chr11.bim", "chr11.fam", verbose=False)
>>> print(G)
<xarray.DataArray 'genotype' (sample: 14, variant: 779)>
dask.array<shape=(14, 779), dtype=float64, chunksize=(14, 779)>
Coordinates:
  * sample   (sample) object 'B001' 'B002' 'B003' ... 'B012' 'B013' 'B014'
  * variant  (variant) object '11_316849996' '11_316874359' ... '11_345698259'
    father   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    fid      (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
    gender   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    i        (sample) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13
    iid      (sample) <U4 'B001' 'B002' 'B003' 'B004' ... 'B012' 'B013' 'B014'
    mother   (sample) <U1 '0' '0' '0' '0' '0' '0' ... '0' '0' '0' '0' '0' '0'
    trait    (sample) <U2 '-9' '-9' '-9' '-9' '-9' ... '-9' '-9' '-9' '-9' '-9'
    a0       (variant) <U1 'C' 'G' 'G' 'C' 'C' 'T' ... 'T' 'A' 'C' 'A' 'A' 'T'
    a1       (variant) <U1 'T' 'C' 'C' 'T' 'T' 'A' ... 'C' 'G' 'T' 'G' 'C' 'C'
    chrom    (variant) <U2 '11' '11' '11' '11' '11' ... '11' '11' '11' '11' '11'
    cm       (variant) float64 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
    pos      (variant) int64 157439 181802 248969 ... 28937375 28961091 29005702
    snp      (variant) <U9 '316849996' '316874359' ... '345653648' '345698259'
>>> print(G.sel(sample="B003", variant="11_316874359").values)
0.0
>>> print(G.a0.sel(variant="11_316874359").values)
G
>>> print(G.sel(sample="B003", variant="11_316941526").values)
2.0
>>> print(G.a1.sel(variant="11_316941526").values)
C

Portions of the genotype data are read from disk only as the user accesses them.
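To make the idea of reading only what is accessed concrete, here is a minimal standard-library sketch of how a single variant can be pulled out of a PLINK 1 .bed stream by seeking straight to its block. It is not pandas-plink's implementation. The function name `read_variant`, the in-memory demo file, and the choice to report counts of the second allele (with `None` for missing calls) are assumptions made for illustration; pandas-plink's own allele-counting convention may differ.

```python
import io
import math

MAGIC = bytes([0x6C, 0x1B, 0x01])  # PLINK 1 .bed header, variant-major mode

# 2-bit genotype code -> count of the second allele (illustrative convention).
# 0b01 is the code PLINK uses for a missing call.
CODE_TO_COUNT = {0b00: 0, 0b10: 1, 0b11: 2, 0b01: None}


def read_variant(bed, n_samples, variant_index):
    """Decode one variant by seeking to its block; other variants are not read."""
    block = math.ceil(n_samples / 4)  # four genotypes are packed into each byte
    bed.seek(0)
    if bed.read(3) != MAGIC:
        raise ValueError("not a variant-major PLINK 1 .bed stream")
    bed.seek(3 + variant_index * block)
    raw = bed.read(block)
    if len(raw) != block:
        raise IndexError("variant index out of range")
    calls = []
    for s in range(n_samples):
        # Genotypes are stored starting from the low-order bits of each byte.
        code = (raw[s // 4] >> (2 * (s % 4))) & 0b11
        calls.append(CODE_TO_COUNT[code])
    return calls


# Hypothetical in-memory file: 5 samples, 2 variants (2 bytes per variant).
demo = io.BytesIO(MAGIC + bytes([0b11100100, 0b00000000, 0b00000000, 0b00000011]))
print(read_variant(demo, n_samples=5, variant_index=0))  # [0, None, 1, 2, 0]
print(read_variant(demo, n_samples=5, variant_index=1))  # [0, 0, 0, 0, 2]
```

Because each variant occupies a fixed-size block, the byte offset of any variant can be computed directly, which is what makes on-demand access cheap for this format.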

Covariance matrices can also be read very easily. Example:

>>> from pandas_plink import read_rel
>>> K = read_rel("plink2.rel.bin")
>>> print(K)
<xarray.DataArray (sample_0: 10, sample_1: 10)>
array([[ 0.885782,  0.233846, -0.186339, -0.009789, -0.138897,  0.287779,
         0.269977, -0.231279, -0.095472, -0.213979],
       [ 0.233846,  1.077493, -0.452858,  0.192877, -0.186027,  0.171027,
         0.406056, -0.013149, -0.131477, -0.134314],
       [-0.186339, -0.452858,  1.183312, -0.040948, -0.146034, -0.204510,
        -0.314808, -0.042503,  0.296828, -0.011661],
       [-0.009789,  0.192877, -0.040948,  0.895360, -0.068605,  0.012023,
         0.057827, -0.192152, -0.089094,  0.174269],
       [-0.138897, -0.186027, -0.146034, -0.068605,  1.183237,  0.085104,
        -0.032974,  0.103608,  0.215769,  0.166648],
       [ 0.287779,  0.171027, -0.204510,  0.012023,  0.085104,  0.956921,
         0.065427, -0.043752, -0.091492, -0.227673],
       [ 0.269977,  0.406056, -0.314808,  0.057827, -0.032974,  0.065427,
         0.714746, -0.101254, -0.088171, -0.063964],
       [-0.231279, -0.013149, -0.042503, -0.192152,  0.103608, -0.043752,
        -0.101254,  1.423033, -0.298255, -0.074334],
       [-0.095472, -0.131477,  0.296828, -0.089094,  0.215769, -0.091492,
        -0.088171, -0.298255,  0.910274, -0.024663],
       [-0.213979, -0.134314, -0.011661,  0.174269,  0.166648, -0.227673,
        -0.063964, -0.074334, -0.024663,  0.914586]])
Coordinates:
  * sample_0  (sample_0) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
  * sample_1  (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    fid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
    iid       (sample_1) object 'HG00419' 'HG00650' ... 'NA20508' 'NA20753'
>>> print(K.values)
[[ 0.89  0.23 -0.19 -0.01 -0.14  0.29  0.27 -0.23 -0.10 -0.21]
 [ 0.23  1.08 -0.45  0.19 -0.19  0.17  0.41 -0.01 -0.13 -0.13]
 [-0.19 -0.45  1.18 -0.04 -0.15 -0.20 -0.31 -0.04  0.30 -0.01]
 [-0.01  0.19 -0.04  0.90 -0.07  0.01  0.06 -0.19 -0.09  0.17]
 [-0.14 -0.19 -0.15 -0.07  1.18  0.09 -0.03  0.10  0.22  0.17]
 [ 0.29  0.17 -0.20  0.01  0.09  0.96  0.07 -0.04 -0.09 -0.23]
 [ 0.27  0.41 -0.31  0.06 -0.03  0.07  0.71 -0.10 -0.09 -0.06]
 [-0.23 -0.01 -0.04 -0.19  0.10 -0.04 -0.10  1.42 -0.30 -0.07]
 [-0.10 -0.13  0.30 -0.09  0.22 -0.09 -0.09 -0.30  0.91 -0.02]
 [-0.21 -0.13 -0.01  0.17  0.17 -0.23 -0.06 -0.07 -0.02  0.91]]
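For readers curious what a binary relationship matrix file might contain, the following standard-library sketch interprets a byte string as a square matrix and checks its symmetry. It assumes the file holds a full n-by-n matrix of little-endian 8-byte floats with no header; that layout is an assumption for illustration, not something this README states, and `read_square_f64` and `is_symmetric` are hypothetical helpers, not part of pandas-plink.

```python
import math
import struct


def read_square_f64(raw):
    """Interpret raw bytes as an n-by-n matrix of little-endian float64 values."""
    n_values, remainder = divmod(len(raw), 8)
    n = math.isqrt(n_values)
    if remainder or n * n != n_values:
        raise ValueError("byte count is not a square number of 8-byte floats")
    flat = struct.unpack(f"<{n_values}d", raw)
    return [list(flat[i * n:(i + 1) * n]) for i in range(n)]


def is_symmetric(matrix, tol=1e-12):
    """Return True when matrix[i][j] and matrix[j][i] agree within tol."""
    n = len(matrix)
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i)
    )


# Round-trip the top-left 2x2 corner of the matrix printed above.
corner = [[0.885782, 0.233846], [0.233846, 1.077493]]
raw = struct.pack("<4d", *(v for row in corner for v in row))
K = read_square_f64(raw)
print(K)                # [[0.885782, 0.233846], [0.233846, 1.077493]]
print(is_symmetric(K))  # True
```

In practice `read_rel` is the simpler route, since it also attaches the sample identifiers as coordinates.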

See the pandas-plink documentation for more information.

Authors

License

This project is licensed under the MIT License.
