Regex match fails on a simple string when using the Pyteomics parser

import re
import pyteomics
from pyteomics import fasta, parser

def ButcherShop(df, target, rule,min_length=7,exception=None,max_legnth=100, pH=2.0):
    raw = df[target]
    unique_peptides = set()
    for peptide in raw:
        new_peptides = parser.cleave(peptide, rule=rule,min_length=min_length,exception=exception)
        unique_peptides.update(new_peptides)
    print(f'Done,{len(unique_peptides)} sequences of >= 7 amino acids!')
    pep_dic = [{'sequence': i} for i in unique_peptides]
    for peptides in pep_dic:
        pep_dic['parsed_sequence'] = parser.parse(peptides,show_unmodified_termini=False)
        pep_dic['xlength'] = len(peptides)
        pep_dic['charge'] = int(round(electrochem.charge(peptides, pH=pH)))
        pep_dic['mass']=int(round(Peptide_mass(peptides)))
    pep_dic = [peptide for peptide in pep_dic if peptide['length'] <= int(max_length)]
    pep_df = pd.DataFrame.from_dict(pep_dic)
    return unique_peptides,pep_dic,pep_df

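3 answers

User

Answer 1 · edited 2024-05-16 22:05:34

Pyteomics maintainer here.

The error message actually tells you the root of the problem: PyteomicsError: Pyteomics error, message: "Not a valid modX sequence: {'sequence': 'AKDEVQKN'}"

This means that what was passed is not the string 'AKDEVQKN' but the dict {'sequence': 'AKDEVQKN'}. It happens here: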

pep_dic = [{'sequence': i} for i in unique_peptides]
for peptides in pep_dic:
    pep_dic['parsed_sequence'] = parser.parse(peptides,show_unmodified_termini=False)
    ...

You should pass the sequence itself to parse, not the dict:

pep_dic['parsed_sequence'] = parser.parse(peptides['sequence'], show_unmodified_termini=False)
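
Since pep_dic is a list, indexing it with a string key would raise a TypeError even with the right argument, so the result probably belongs on the per-peptide dict instead. A minimal sketch of the loop under that assumption follows. The helper name `annotate` is hypothetical, and `list` is only a stand-in for the parsing function so that the example runs without Pyteomics installed.

```python
def annotate(records, parse_fn):
    """Add parsed-sequence and length fields to each record dict in place."""
    for record in records:
        sequence = record['sequence']  # the string itself, not the dict
        record['parsed_sequence'] = parse_fn(sequence)  # store on the item, not the list
        record['length'] = len(sequence)
    return records


# `list` is a stand-in parser that just splits the string into letters.
records = annotate([{'sequence': 'AKDEVQKN'}], parse_fn=list)
print(records[0]['parsed_sequence'])
```

With Pyteomics available, parse_fn could be something like `lambda s: parser.parse(s, show_unmodified_termini=False)`.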

User

Answer 2 · edited 2024-05-16 22:05:34

Not a solution, but some analysis.

In the simple example below, 'AKDEVQKN' is matched using the regular expression from the post:

import re

line = 'AKDEVQKN'

pat = re.compile(r'^([^-]+-)?((?:[^A-Z-]*[A-Z])+)(-[^-]+)?$')

x = re.match(pat, line)

if x:
    print(x)
    print(x.group())
    print(x.groups())

Output:

<re.Match object; span=(0, 8), match='AKDEVQKN'>
AKDEVQKN
(None, 'AKDEVQKN', None)

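This suggests the problem lies elsewhere in the code.

Is 'AKDEVQKN' the complete sequence, or is there more to it?
Could _modX_sequence have been changed by the time re.match is called with the sequence 'AKDEVQKN'? To check, temporarily edit
~\Anaconda\envs\SciFly\lib\site-packages\pyteomics\parser.py
at line 312, from: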

try:
  n, body, c = re.match(_modX_sequence, sequence).groups()
except AttributeError:

to:

try:
  if sequence == 'AKDEVQKN':
    print("DEBUG: ", sequence, _modX_sequence)
    # or drop into a debugger, pdb or iPython's 
    # import pdb; pdb.set_trace()
    # dir() 
  n, body, c = re.match(_modX_sequence, sequence).groups()
except AttributeError:
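
If editing the installed package is undesirable, a similar check can be done from the calling side. The sketch below is only an illustration: `call_with_debug` is a hypothetical helper, and `str.upper` stands in for the parser call. It prints the type and repr of the argument whenever the wrapped function raises, which would show a dict being passed where a string was expected.

```python
def call_with_debug(fn, value):
    """Call fn(value); on failure, print what was actually passed, then re-raise."""
    try:
        return fn(value)
    except Exception:
        name = getattr(fn, '__name__', repr(fn))
        print(f"DEBUG: {name} failed on {type(value).__name__}: {value!r}")
        raise


# A plain string goes through untouched; a dict would trigger the DEBUG line.
print(call_with_debug(str.upper, 'akdevqkn'))
```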

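User

Answer 3 · edited 2024-05-16 22:05:34

Before running the parser, I tried testing all the peptides with their valid function. I couldn't find any False result among the strings. I'm now looking into their function and my own.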

for peptide in menu["Peptide"]:
    x=parser.valid(peptide)
    if x == False:
        print(peptide)
        break
    else:
        print(x)
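
One possible reason this check finds nothing: it runs over menu["Peptide"], which holds strings, while the failing call received a dict. A rough sketch of a check applied to the values actually handed to the parser is below. `find_bad_inputs` and the sample list are hypothetical, and `str.isupper` is just a stand-in for a validity function such as parser.valid.

```python
def find_bad_inputs(items, is_valid):
    """Return (index, item, reason) for entries that are not strings or fail is_valid."""
    bad = []
    for index, item in enumerate(items):
        if not isinstance(item, str):
            # Catches dicts, None, numbers, etc. before the validator sees them.
            bad.append((index, item, f'not a string: {type(item).__name__}'))
        elif not is_valid(item):
            bad.append((index, item, 'failed validation'))
    return bad


# Made-up inputs: one good string, one dict, one string the stand-in rejects.
items = ['AKDEVQKN', {'sequence': 'AKDEVQKN'}, 'akde']
bad_inputs = find_bad_inputs(items, is_valid=str.isupper)
for entry in bad_inputs:
    print(entry)
```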
