Representing population structure in Python

Posted 2024-06-07 00:21:59


I am using EggLib in Python. I am trying to compute Jost's D for each SNP found in a VCF file.

Data

The data are in VCF format (the original post links to the file). The data set is small: 2 populations of 100 individuals each, and 6 SNPs, all on chromosome 1.

Each individual is named Pp.Ii, where p is the index of the population it belongs to and i is the index of the individual.
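As a small sketch of that naming convention, a sample name could be split into its population and individual indexes as shown here. The helper name `parse_sample_name` and the example name are illustrative assumptions; they are not part of EggLib or of the original data.

```python
import re

# Hypothetical helper (not an EggLib function): names look like "Pp.Ii",
# where p is the population index and i is the individual index.
_NAME_RE = re.compile(r"^P(\d+)\.I(\d+)$")


def parse_sample_name(name):
    """Return (population_index, individual_index) for a name like 'P1.I42'."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError("unexpected sample name: %r" % name)
    return int(match.group(1)), int(match.group(2))


print(parse_sample_name("P1.I42"))  # prints (1, 42)
```

Names that do not follow the pattern raise a `ValueError` instead of being silently assigned to a population.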

Code

My difficulty lies in specifying the population structure. Here is my attempt:

### Read the vcf file ###
vcf = egglib.io.VcfParser("MyData.vcf") 

### Create the `Structure` object ###
# Dictionary for a given cluster. There is only one cluster.
dcluster = {}            
# Loop through each population 
for popIndex in [0,1]:  
    # Dictionary for a given population. There are two populations.
    dpop = {}            
    # Loop through each individual
    for IndIndex in range(popIndex * 100,(popIndex + 1) * 100):     
        # A single list to define an individual
        dpop[IndIndex] = [IndIndex*2, IndIndex*2 + 1]
    dcluster[popIndex] = dpop

struct = {0: dcluster}

### Define the population structure ###
Structure = egglib.stats.make_structure(struct, None) 

### Configure the 'ComputeStats' object ###
cs = egglib.stats.ComputeStats()
cs.configure(only_diallelic=False)
cs.add_stats('Dj') # Jost's D

### Isolate a SNP ###
vcf.next()
site = egglib.stats.site_from_vcf(vcf)

### Calculate Jost's D ###
cs.process_site(site, struct=Structure)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/egglib/stats/_cstats.py", line 431, in process_site
    self._frq.process_site(site, struct=struct)
  File "/Library/Python/2.7/site-packages/egglib/stats/_freq.py", line 159, in process_site
    if sum(struct) != site._obj.get_ning(): raise ValueError, 'invalid structure (sample size is required to match)'
ValueError: invalid structure (sample size is required to match)

The documentation (linked in the original post) says:

[The Structure object] is a tuple containing two items, each being a dict. The first one represents the ingroup and the second represents the outgroup.

The ingroup dictionary is itself a dictionary holding more dictionaries, one for each cluster of populations. Each cluster dictionary is a dictionary of populations, populations being themselves represented by a dictionary. A population dictionary is, again, a dictionary of individuals. Fortunately, individuals are represented by lists.

An individual list contains the index of all samples belonging to this individual. For haploid data, individuals will be one-item lists. In other cases, all individual lists are required to have the same number of items (consistent ploidy). Note that, if the ploidy is more than one, nothing enforces that samples of a given individual are grouped within the original data.

The keys of the ingroup dictionary are the labels identifying each cluster. Within a cluster dictionary, the keys are population labels. Finally, within a population dictionary, the keys are individual labels.

The second dictionary represents the outgroup. Its structure is simpler: it has individual labels as keys, and lists of corresponding sample indexes as values. The outgroup dictionary is similar to any ingroup population dictionary. The ploidy is required to match over all ingroup and outgroup individuals.
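The nesting described in this excerpt can be sketched with plain Python dictionaries. The example below does not call EggLib; it only builds a toy ingroup and outgroup (one cluster, two populations, two diploid individuals each, all labels and indexes made up for illustration) and checks the "consistent ploidy" requirement with a hypothetical helper `ploidy_of`.

```python
# Toy layout, for illustration only: cluster -> population -> individual -> sample indexes.
ingroup = {
    0: {                                # cluster label
        0: {0: [0, 1], 1: [2, 3]},      # population 0: two diploid individuals
        1: {2: [4, 5], 3: [6, 7]},      # population 1: two diploid individuals
    }
}
outgroup = {4: [8, 9]}                  # individual label -> sample indexes


def individual_lists(ingroup, outgroup):
    """Collect the sample-index list of every individual, ingroup first."""
    lists = []
    for cluster in ingroup.values():
        for population in cluster.values():
            lists.extend(population.values())
    lists.extend(outgroup.values())
    return lists


def ploidy_of(ingroup, outgroup):
    """Return the common ploidy, or raise if individuals differ in list length."""
    sizes = set(len(samples) for samples in individual_lists(ingroup, outgroup))
    if len(sizes) > 1:
        raise ValueError("inconsistent ploidy: %s" % sorted(sizes))
    return sizes.pop() if sizes else None


print(ploidy_of(ingroup, outgroup))  # prints 2
```

A haploid layout would use one-item lists such as `{0: [0], 1: [1]}` for each population.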

But I cannot make sense of it. The example provided is for FASTA format, and I do not understand how to extend the logic to VCF format.


1 answer
Anonymous user
#1 · Posted 2024-06-07 00:21:59

There are two errors.

First error

The function make_structure returns the Structure object but does not store it anywhere in stats. You therefore have to keep the return value and pass it to process_site.

Structure = egglib.stats.make_structure(struct, None) 

Second error

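The Structure object must describe haploid individuals, that is, one sample index per individual. So build the dictionaries as follows: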

dcluster = {}            
for popIndex in [0,1]:  
    dpop = {}            
    for IndIndex in range(popIndex * 100,(popIndex + 1) * 100):     
        dpop[IndIndex] = [IndIndex]
    dcluster[popIndex] = dpop

struct = {0: dcluster}
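To see how the two layouts differ in size, the sketch below rebuilds both dictionaries in plain Python (no EggLib calls) and counts the sample indexes each one declares. The helper names `build_struct` and `count_samples` are made up for illustration. The error message in the question says the sample size is required to match, so the count declared by the structure presumably has to equal the number of samples in the site; that the site holds 200 samples here is an assumption, not something the post states.

```python
def build_struct(samples_per_individual, n_pops=2, n_per_pop=100):
    """Hypothetical helper: one cluster, n_pops populations of n_per_pop individuals."""
    cluster = {}
    for pop_index in range(n_pops):
        pop = {}
        for ind_index in range(pop_index * n_per_pop, (pop_index + 1) * n_per_pop):
            first = ind_index * samples_per_individual
            # Each individual owns a run of consecutive sample indexes.
            pop[ind_index] = list(range(first, first + samples_per_individual))
        cluster[pop_index] = pop
    return {0: cluster}


def count_samples(struct):
    """Total number of sample indexes declared across all individuals."""
    return sum(
        len(samples)
        for cluster in struct.values()
        for pop in cluster.values()
        for samples in pop.values()
    )


print(count_samples(build_struct(2)))  # layout from the question: prints 400
print(count_samples(build_struct(1)))  # layout from this answer: prints 200
```

With `samples_per_individual=1` the helper produces the same dictionary as the loop above, and with `2` it reproduces the one from the question.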
