Matching characters

Published 2024-04-19 07:30:34


I'm new to Python and need help with this error. I have two dictionaries like these:

OtherSeqDict:

{'Protein1':'AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC', 
 'Protein2':'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC',
 'Protein3':'AGCGGGTTTTTACCCCCCGTGTTGGGACCCCCACTGCGTC'}

MouseSeqDict:

{'Protein4':'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'}

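I am trying to print (i) the characters in the Protein4 value that match the characters at the same positions in the Protein1, Protein2 and Protein3 values, and (ii) the characters in Protein4 that do not match Protein1, Protein2 and Protein3, together with the positions of those mismatches in Protein4.

I am currently on the first part. I edited a script I found online, but I get an error when I run it.

The error is as follows: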

p = _cache.get(cachekey)

TypeError: unhashable type: 'list'

Here is my script:

otherseq=OtherSeqDict.values()
mouseseq=MouseSeqDict.values()

for match in re.finditer(mouseseq,otherseq):

        start=match.start()

        end=match.end()

        print 'Found "%s" at %d:%d' %(text[start:end],start,end)

Can someone tell me how to do this?

Thanks!!


2 answers

Here is one way:

otherseq = {'Protein1':'AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC', 
 'Protein2':'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC',
 'Protein3':'AGCGGGTTTTTACCCCCCGTGTTGGGACCCCCACTGCGTC'}

mouseseq = {'Protein4':'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'}

def compareSeqs(seq1, seq2):
    matches = [k for k, v in enumerate(zip(seq1, seq2)) if v[0] == v[1]]
    mismatches = [k for k, v in enumerate(zip(seq1, seq2)) if v[0] != v[1]]
    return (matches, mismatches)

def compareGroups(group1, group2):
    for name1 in group1:
        for name2 in group2:
            seq1 = group1[name1]
            seq2 = group2[name2]
            matches, mismatches = compareSeqs(seq1, seq2)
            print "Comparing "+name1+" vs "+name2+":"
            print "\tMatches:    ", matches
            print "\tMismatches: ", mismatches

compareGroups(mouseseq, otherseq)

Output:

Comparing Protein4 vs Protein3:
    Matches:     [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
    Mismatches:  [5, 29]
Comparing Protein4 vs Protein2:
    Matches:     [0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 18, 19, 21, 22, 23, 24, 27, 28, 30]
    Mismatches:  [5, 10, 11, 14, 15, 16, 17, 20, 25, 26, 29, 31, 32, 33, 34, 35, 36, 37, 38]
Comparing Protein4 vs Protein1:
    Matches:     [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
    Mismatches:  [5, 20, 29]
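
As for the original error: in Python 2, `dict.values()` returns a list, so `re.finditer(mouseseq, otherseq)` receives a list where it expects a pattern string. The `re` module looks compiled patterns up in an internal cache, and a list cannot be hashed as a cache key, which is where `unhashable type: 'list'` comes from. Below is a minimal sketch of that failure, written for Python 3 with the lists spelled out by hand; the helper name `try_finditer` is made up for this illustration.

```python
import re

# Stand-ins for what MouseSeqDict.values() and OtherSeqDict.values()
# hand to re.finditer in the question: lists, not strings.
mouse_values = ['AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC']
other_values = ['AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC']


def try_finditer(pattern, text):
    """Return the name of the exception re.finditer raises, or None."""
    try:
        list(re.finditer(pattern, text))
    except TypeError as exc:
        return type(exc).__name__
    return None


# Passing lists fails before any matching is attempted.
error_with_lists = try_finditer(mouse_values, other_values)

# Passing the strings themselves is accepted by re.finditer.
error_with_strings = try_finditer(mouse_values[0], other_values[0])

print(error_with_lists, error_with_strings)
```

Even with plain strings, a regular expression only reports places where the whole pattern occurs as a substring, so it does not give a position-by-position comparison. That is why the answers here use `zip` instead.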

OtherSeqDict = {'Protein1':'AGCGGGTTTTTACCCCCCGTTTTGGGACCCCCACTGCGTC',
                'Protein2':'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC',
                'Protein3':'AGCGGGTTTTTACCCCCCGTGTTGGGACCCCCACTGCGTC'}

MouseSeqDict = {'Protein4':'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'}

ms_v = MouseSeqDict['Protein4']

# get common indexes
for k, v in OtherSeqDict.items():
    # zip  MouseSeqDict value string and current value string
    # use enumerate to get the index, adding it if we find common elements at the same index from each string
    print("matched: {} {}".format(k,[i for i, tup in enumerate(zip(ms_v,  v)) if tup[0] == tup[1]]))
    print("unmatched : {} {}".format(k,[i for i, tup in enumerate(zip(ms_v,  v)) if tup[0] != tup[1]]))

matched: Protein3 [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
unmatched : Protein3 [5, 29]
matched: Protein2 [0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 18, 19, 21, 22, 23, 24, 27, 28, 30]
unmatched : Protein2 [5, 10, 11, 14, 15, 16, 17, 20, 25, 26, 29, 31, 32, 33, 34, 35, 36, 37, 38]
matched: Protein1 [0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27, 28, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
unmatched : Protein1 [5, 20, 29]

To get the elements the strings have in common, assuming order does not matter, you can use set.intersection:

for k, v in OtherSeqDict.items():
    print(k, set(v).intersection(ms_v))

{'C', 'T', 'G', 'A'}
{'C', 'T', 'G', 'A'}
{'C', 'T', 'G', 'A'}
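
To make concrete what the set approach does and does not report, here is a small sketch using Protein4 and Protein2 from the question. The variable names are chosen only for this example. The intersection holds the distinct letters both strings share, with no positions, and the gap character `-` drops out because it appears only in Protein2.

```python
mouse = 'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'    # Protein4
protein2 = 'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC'  # Protein2, with gaps

# Distinct characters present in both strings (position is ignored).
common = set(protein2).intersection(mouse)

# Distinct characters present in Protein2 but not in Protein4.
only_in_protein2 = set(protein2) - set(mouse)

print(sorted(common))            # ['A', 'C', 'G', 'T']
print(sorted(only_in_protein2))  # ['-']
```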

You can also include the index and the letters in each tuple inside the loop, which may be more useful:

for k, v in OtherSeqDict.items():
    print("matched: {} {}".format(k,[(i,tup[0]) for i, tup in enumerate(zip(ms_v,  v)) if tup[0] == tup[1]]))
    print("unmatched : {} {}".format(k,[(i,)+tup for i, tup in enumerate(zip(ms_v,  v)) if tup[0] != tup[1]]))



matched: Protein2 [(0, 'A'), (1, 'G'), (2, 'C'), (3, 'G'), (4, 'G'), (6, 'T'), (7, 'T'), (8, 'T'), (9, 'T'), (12, 'C'), (13, 'C'), (18, 'G'), (19, 'T'), (21, 'T'), (22, 'T'), (23, 'G'), (24, 'G'), (27, 'C'), (28, 'C'), (30, 'C')]
unmatched : Protein2 [(5, 'C', 'G'), (10, 'T', 'A'), (11, 'A', 'C'), (14, 'C', '-'), (15, 'C', '-'), (16, 'C', '-'), (17, 'C', 'G'), (20, 'G', 'T'), (25, 'G', 'A'), (26, 'A', 'C'), (29, 'G', 'C'), (31, 'C', 'A'), (32, 'A', 'C'), (33, 'C', 'T'), (34, 'T', 'G'), (35, 'G', 'C'), (36, 'C', 'G'), (37, 'G', 'T'), (38, 'T', 'C')]
matched: Protein1 [(0, 'A'), (1, 'G'), (2, 'C'), (3, 'G'), (4, 'G'), (6, 'T'), (7, 'T'), (8, 'T'), (9, 'T'), (10, 'T'), (11, 'A'), (12, 'C'), (13, 'C'), (14, 'C'), (15, 'C'), (16, 'C'), (17, 'C'), (18, 'G'), (19, 'T'), (21, 'T'), (22, 'T'), (23, 'G'), (24, 'G'), (25, 'G'), (26, 'A'), (27, 'C'), (28, 'C'), (30, 'C'), (31, 'C'), (32, 'A'), (33, 'C'), (34, 'T'), (35, 'G'), (36, 'C'), (37, 'G'), (38, 'T'), (39, 'C')]
unmatched : Protein1 [(5, 'C', 'G'), (20, 'G', 'T'), (29, 'G', 'C')]
matched: Protein3 [(0, 'A'), (1, 'G'), (2, 'C'), (3, 'G'), (4, 'G'), (6, 'T'), (7, 'T'), (8, 'T'), (9, 'T'), (10, 'T'), (11, 'A'), (12, 'C'), (13, 'C'), (14, 'C'), (15, 'C'), (16, 'C'), (17, 'C'), (18, 'G'), (19, 'T'), (20, 'G'), (21, 'T'), (22, 'T'), (23, 'G'), (24, 'G'), (25, 'G'), (26, 'A'), (27, 'C'), (28, 'C'), (30, 'C'), (31, 'C'), (32, 'A'), (33, 'C'), (34, 'T'), (35, 'G'), (36, 'C'), (37, 'G'), (38, 'T'), (39, 'C')]
unmatched : Protein3 [(5, 'C', 'G'), (29, 'G', 'C')]
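
One caveat with `zip`: it stops at the end of the shorter string. Protein2 has 39 characters while Protein4 has 40, so index 39 of Protein4 never appears in the Protein2 results above. If the trailing positions should be reported as mismatches, one option is `itertools.zip_longest`. The following is only a sketch under that assumption; the function name `compare_positions` and the choice of `None` as the fill value are not from the answers above.

```python
from itertools import zip_longest


def compare_positions(reference, other, fill=None):
    """Compare two sequences position by position.

    Returns (matches, mismatches). matches holds (index, char) pairs and
    mismatches holds (index, reference_char, other_char) triples.
    Positions past the end of the shorter sequence are paired with
    `fill`, so they are reported as mismatches.
    """
    matches, mismatches = [], []
    pairs = zip_longest(reference, other, fillvalue=fill)
    for i, (a, b) in enumerate(pairs):
        if a == b:
            matches.append((i, a))
        else:
            mismatches.append((i, a, b))
    return matches, mismatches


mouse = 'AGCGGCTTTTTACCCCCCGTGTTGGGACCGCCACTGCGTC'    # Protein4
protein2 = 'AGCGGGTTTTACCC---GGTTTTGGACCCCCACTGCGTC'  # Protein2

matches, mismatches = compare_positions(mouse, protein2)
print(len(matches), len(mismatches))  # 20 20
print(mismatches[-1])                 # (39, 'C', None)
```

A single loop also avoids walking each pair of strings twice, once for the matches and once for the mismatches.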
