How do I get values from XML tags in Python?

<?xml version="1.0" encoding="UTF-8"?>
<searching>
    <query>query01</query>
    <document id="0">
        <title>lord of the rings.</title>
        <snippet> this is a snippet of a document. </snippet>
        <url>http://www.google.com/</url>
    </document>
    <document id="1">
        <title>harry potter.</title>
        <snippet> this is a snippet of a document. </snippet>
        <url>http://www.google.com/</url>
    </document>
    ........ #and other documents .....
    <group id="0" size="298" score="145">
        <title>
            <phrase>GROUP A</phrase>
        </title>
        <document refid="0"/>
        <document refid="1"/>
        <document refid="84"/>
    </group>
    <group id="0" size="298" score="55">
        <title>
            <phrase>GROUP B</phrase>
        </title>
        <document refid="2"/>
        <document refid="13"/>
        <document refid="3"/>
    </group>
</searching>

import codecs

documentID = {}
group = {}
myfile = codecs.open("file.xml", mode='r', encoding="utf8")
for line in myfile:
    line = line.strip()
    # get id from tags
    # get title from tag
    # store in documentID
    # get group name and document reference

from bs4 import BeautifulSoup

def outputCluster(rFile):
    documentInReadFile = {}  # dictionary to store all documents in readFile
    myfile = codecs.open(rFile, mode='r', encoding="utf8")
    soup = BeautifulSoup(myfile)
    # print all text in readFile:
    # print soup.prettify()
    # print soup.find_all('title')

outputCluster("file.xml")

3 Answers

User

Answer 1 · edited 2024-05-15 11:53:33

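ElementTree is very good at searching XML. Its documentation shows several ways to work with XML, including how to get the content of a tag. One example from the documentation:
XML: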

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>

Code:

>>> for country in root.findall('country'):
...   rank = country.find('rank').text
...   name = country.get('name')
...   print(name, rank)
...
Liechtenstein 1
Singapore 4
Panama 68

You can easily adapt this to do what you need.
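The interactive session above assumes that `root` already exists. A minimal self-contained sketch is shown here, assuming the XML is available as a string; the names `XML_TEXT` and `country_ranks` are hypothetical and chosen only for illustration, and the sample is a shortened copy of the XML above.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the sample XML, inlined so the snippet runs on its own.
XML_TEXT = """<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
    </country>
</data>"""


def country_ranks(xml_text):
    """Return (name, rank) pairs for every <country> under the root (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # for a file: ET.parse(path).getroot()
    pairs = []
    for country in root.findall('country'):
        name = country.get('name')         # attribute value
        rank = country.find('rank').text   # text inside the <rank> tag
        pairs.append((name, rank))
    return pairs


for name, rank in country_ranks(XML_TEXT):
    print(name, rank)
```

Note that `.text` and `.get()` both return strings, so the rank would need `int(rank)` if a number is wanted.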

User

Answer 2 · edited 2024-05-15 11:53:33

The previous poster is right. The etree documentation can be found at:

https://docs.python.org/2/library/xml.etree.elementtree.html#module-xml.etree.ElementTree

It may help you. Here is a code sample that might work (partly taken from the link above):

import xml.etree.ElementTree as ET
tree = ET.parse('your_file.xml')
root = tree.getroot()

for group in root.findall('group'):
  title = group.find('title')
  titlephrase = title.find('phrase').text
  for doc in group.findall('document'):
    refid = doc.get('refid')

Alternatively, if you want the ID stored on the group tag itself, you can use id = group.get('id') instead of looking up every refid.
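As a rough sketch of how this could fill the two dictionaries the question asks for, the snippet below runs against a trimmed, well-formed copy of the question's XML. The names `SAMPLE` and `parse_search_results` are hypothetical. Because both groups in the question carry id="0", this sketch keys groups by their phrase text rather than by id; that is an assumption about what is wanted, not something the question states.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's XML (assumed structure).
SAMPLE = """<searching>
  <query>query01</query>
  <document id="0">
    <title>lord of the rings.</title>
    <url>http://www.google.com/</url>
  </document>
  <document id="1">
    <title>harry potter.</title>
    <url>http://www.google.com/</url>
  </document>
  <group id="0" size="298" score="145">
    <title><phrase>GROUP A</phrase></title>
    <document refid="0"/>
    <document refid="1"/>
  </group>
  <group id="0" size="298" score="55">
    <title><phrase>GROUP B</phrase></title>
    <document refid="2"/>
  </group>
</searching>"""


def parse_search_results(xml_text):
    """Return (document id -> title, group phrase -> list of refids)."""
    root = ET.fromstring(xml_text)

    # findall('document') matches direct children of the root only,
    # so the <document refid=.../> entries inside groups are skipped here.
    document_titles = {
        doc.get('id'): doc.find('title').text
        for doc in root.findall('document')
    }

    group_members = {}
    for group in root.findall('group'):
        phrase = group.find('title/phrase').text
        group_members[phrase] = [
            doc.get('refid') for doc in group.findall('document')
        ]
    return document_titles, group_members


titles, groups = parse_search_results(SAMPLE)
print(titles)
print(groups)
```

To read from a file instead of a string, `ET.parse('your_file.xml').getroot()` gives the same kind of root element.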

User

Answer 3 · edited 2024-05-15 11:53:33

Have you looked at Python's XML parser? There are plenty of examples online.
