XML: tracing back to a parent element

Asked 2025-04-17 03:03

I am looking for a way to handle the following XML in Python. spectrum is not actually the root element, but for this example let's assume that it is.

<spectrum index="2" id="controller=0 scan=3" defaultArrayLength="485">
          <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
          <cvParam cvRef="MS" accession="MS:1000580" name="MSn spectrum" value=""/>
          <cvParam cvRef="MS" accession="MS:1000127" name="centroid mass spectrum" value=""/>
          <precursorList count="1">
            <precursor spectrumRef="controller=0 scan=2">
              <isolationWindow>
                <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="810.78999999999996"/>
                <cvParam cvRef="MS" accession="MS:1000023" name="isolation width" value="2"/>
              </isolationWindow>
              <selectedIonList count="1">
                <selectedIon>
                  <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="810.78999999999996"/>
                </selectedIon>
              </selectedIonList>
              <activation>
                <cvParam cvRef="MS" accession="MS:1000133" name="collision-induced dissociation" value=""/>
                <cvParam cvRef="MS" accession="MS:1000045" name="collision energy" value="35"/>
              </activation>
            </precursor>
          </precursorList>
          <binaryDataArrayList count="2">
            <binaryDataArray encodedLength="5176">
              <cvParam cvRef="MS" accession="MS:1000523" name="64-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000576" name="no compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000514" name="m/z array" value="" unitCvRef="MS" unitAccession="MS:1000040" unitName="m/z"/>
              <binary>AAAAYHHsbEAAAADg3yptQAAAAECt7G1AAAAAAN8JbkAAAAAA.......hLJ==</binary>
            </binaryDataArray>
            <binaryDataArray encodedLength="2588">
              <cvParam cvRef="MS" accession="MS:1000521" name="32-bit float" value=""/>
              <cvParam cvRef="MS" accession="MS:1000576" name="no compression" value=""/>
              <cvParam cvRef="MS" accession="MS:1000515" name="intensity array" value=""/>
              <binary>ZFzUQWmVo0FH/o9BRfUyQg+xjUOzkZdC5k66QWk6HUSpqyZCsV1NQ......uH=</binary>
            </binaryDataArray>
          </binaryDataArrayList>
</spectrum>

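What I want to do is find every selectedIon element in this tree and trace each one back to the spectrum element that contains it. Whenever a selectedIon element is found, I would like to print the following: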

SelectedIon Info:


Mass: 810.78999999999996

Spectra Info:
-------------
index=2
id=controller=0
scan=3
length=485

General Info
------------
ms level=2
MSn spectrum=-
centroid mass spectrum=-
.....................
(and so on for every cvParam name and value shown above)

Binary
------
m/z array = AAAAYHHsbEAAAADg3yptQAAAAECt7G1AAAA.....== 

intensity array = ZFzUQWmVo0FH/o9BRfUyQg+xjUOzkZdC5k66Q....5C77=

What I have tried so far:

import xml.etree.ElementTree as ET
tree=ET.parse('file.mzml')
NS="{http://psi.hupo.org/ms/mzml}"
filesource=tree.findall('.//'+NS+'selectedIon') # Will get all selectedIon element from the tree

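How can I get from each selectedIon back to its spectrum element, and then to the spectrum's child elements, so that I can extract the information listed above?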

2 Answers

0

If this is still an issue, you could try pymzML, a Python tool for working with mzML files.

Printing the information for every MS2 spectrum is very simple:

import pymzml
msrun = pymzml.run.Reader("your-file.mzML")
for spectrum in msrun:
    if spectrum['ms level'] == 2:
        # spectrum is a dict, so you can just print it        
        print(spectrum)

(Disclaimer: I am one of the authors of this tool.)

1

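XPath lets you reach an ancestor element: starting from a selectedIon, the expression "ancestor::spectrum" selects the spectrum element that contains it. The standard library's ElementTree only supports a limited XPath subset that has no ancestor axis, but with the lxml library you can use full XPath syntax to find the elements you want.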

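from lxml import etree
tree = etree.parse('file.mzml')  # etree.XML() expects the document text, not a file name
NS = "{http://psi.hupo.org/ms/mzml}"
# findall() returns a list, so xpath() has to be called on each element in turn
for ion in tree.findall('.//' + NS + 'selectedIon'):
    # XPath needs a prefix mapping for namespaced names; the "ms" prefix is arbitrary
    spectrum = ion.xpath('ancestor::ms:spectrum', namespaces={'ms': NS[1:-1]})[0]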

(I think that is how it goes, but I have not tested it...)

Update: code that actually works:

from lxml import etree

tree = etree.parse('foo.xml')
# Matches un-namespaced elements, as in the sample from the question
for el in tree.findall(".//selectedIon"):
    for top in el.xpath("ancestor::spectrum"):
        print(top)
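
If installing lxml is not an option, the same walk back to spectrum can be sketched with the standard library alone, which is what the question started from. ElementTree elements do not keep a reference to their parent, so the usual workaround is to build a child-to-parent map once and climb it. This is only a sketch under some assumptions: the helper names `find_ancestor` and `describe_selected_ions` are made up for illustration, the `<run>` wrapper is just a stand-in parent, and the sample is a trimmed, un-namespaced copy of the spectrum from the question.

```python
import xml.etree.ElementTree as ET

# Trimmed, un-namespaced copy of the spectrum from the question.
# The <run> wrapper is only a stand-in parent for this example.
SAMPLE = """
<run>
  <spectrum index="2" id="controller=0 scan=3" defaultArrayLength="485">
    <cvParam cvRef="MS" accession="MS:1000511" name="ms level" value="2"/>
    <precursorList count="1">
      <precursor spectrumRef="controller=0 scan=2">
        <selectedIonList count="1">
          <selectedIon>
            <cvParam cvRef="MS" accession="MS:1000040" name="m/z" value="810.78999999999996"/>
          </selectedIon>
        </selectedIonList>
      </precursor>
    </precursorList>
  </spectrum>
</run>
"""


def find_ancestor(element, tag, parent_of):
    """Climb the parent links until an element with the given tag is found (hypothetical helper)."""
    node = parent_of.get(element)
    while node is not None and node.tag != tag:
        node = parent_of.get(node)
    return node


def describe_selected_ions(root, ns=""):
    """Return one dict per selectedIon: its m/z plus data from the enclosing spectrum."""
    # Build the reverse (child -> parent) map once for the whole tree.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    results = []
    for ion in root.iter(ns + "selectedIon"):
        spectrum = find_ancestor(ion, ns + "spectrum", parent_of)
        if spectrum is None:
            continue
        # The selectedIon's own m/z cvParam, if it has one.
        mass = next(
            (p.get("value") for p in ion.findall(ns + "cvParam") if p.get("name") == "m/z"),
            None,
        )
        results.append({
            "mass": mass,
            "spectrum": dict(spectrum.attrib),
            # Direct cvParam children of spectrum only (the "General Info" block).
            "general": {p.get("name"): p.get("value") for p in spectrum.findall(ns + "cvParam")},
        })
    return results


report = describe_selected_ions(ET.fromstring(SAMPLE))
for entry in report:
    print(entry)
```

For a real mzML file, the elements live in the namespace used in the question, so the call would presumably be `describe_selected_ions(ET.parse('file.mzml').getroot(), ns="{http://psi.hupo.org/ms/mzml}")`. The binary arrays could be collected the same way by iterating over the spectrum's binaryDataArray descendants.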
