How to get values from XML tags in Python?

I have the following XML file:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<searching>
  <query>query01</query>
  <document id="0">
    <title>lord of the rings.</title>
    <snippet> this is a snippet of a document. </snippet>
    <url>http://www.google.com/</url>
  </document>
  <document id="1">
    <title>harry potter.</title>
    <snippet> this is a snippet of a document. </snippet>
    <url>http://www.google.com/</url>
  </document>
  <!-- ........ and other documents ..... -->
  <group id="0" size="298" score="145">
    <title>
      <phrase>GROUP A</phrase>
    </title>
    <document refid="0"/>
    <document refid="1"/>
    <document refid="84"/>
  </group>
  <group id="0" size="298" score="55">
    <title>
      <phrase>GROUP B</phrase>
    </title>
    <document refid="2"/>
    <document refid="13"/>
    <document refid="3"/>
  </group>
</searching>
```

I want to get the group names, and for each group, the ids of the documents it contains (along with their titles). My idea is to store the document ids and titles in a dictionary, like this:

```python
import codecs

documentID = {}
group = {}
myfile = codecs.open("file.xml", mode='r', encoding="utf8")
for line in myfile:
    line = line.strip()
    # get id from tags
    # get title from tag
    # store in documentID
    # get group name and document reference
```

I also tried BeautifulSoup, but I'm new to it and don't know how to proceed. Here is the code I have so far:

```python
import codecs
from bs4 import BeautifulSoup

def outputCluster(rFile):
    documentInReadFile = {}  # dictionary to store all documents in readFile
    myfile = codecs.open(rFile, mode='r', encoding="utf8")
    soup = BeautifulSoup(myfile)
    # print all text in readFile:
    # print(soup.prettify())
    # print(soup.find_all('title'))

outputCluster("file.xml")
```

Any advice would be appreciated. Thank you.
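A minimal sketch of one possible approach, using the standard library's `xml.etree.ElementTree` instead of reading the file line by line. It assumes the real file is well-formed XML (for example, closed with `</searching>` and with no literal filler text between elements). The names `SAMPLE_XML`, `load_documents` and `load_groups` are hypothetical, chosen only for illustration. Groups are keyed by their `<phrase>` text rather than their `id` attribute, because both groups in the sample share `id="0"`.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: a trimmed copy of the XML from the question. The elided
# "other documents" are left out, so some refids have no matching <document>.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<searching>
  <query>query01</query>
  <document id="0">
    <title>lord of the rings.</title>
    <snippet> this is a snippet of a document. </snippet>
    <url>http://www.google.com/</url>
  </document>
  <document id="1">
    <title>harry potter.</title>
    <snippet> this is a snippet of a document. </snippet>
    <url>http://www.google.com/</url>
  </document>
  <group id="0" size="298" score="145">
    <title> <phrase>GROUP A</phrase> </title>
    <document refid="0"/>
    <document refid="1"/>
    <document refid="84"/>
  </group>
  <group id="0" size="298" score="55">
    <title> <phrase>GROUP B</phrase> </title>
    <document refid="2"/>
    <document refid="13"/>
    <document refid="3"/>
  </group>
</searching>"""


def load_documents(root):
    """Map each top-level <document id="..."> to its stripped title text."""
    titles = {}
    # findall("document") only matches direct children of <searching>,
    # so the <document refid="..."/> references inside groups are skipped.
    for doc in root.findall("document"):
        titles[doc.get("id")] = (doc.findtext("title") or "").strip()
    return titles


def load_groups(root, titles):
    """Map each group's phrase (e.g. 'GROUP A') to a list of (refid, title).

    A refid with no matching top-level <document> gets the title None.
    """
    groups = {}
    for group in root.findall("group"):
        name = (group.findtext("title/phrase") or "").strip()
        members = []
        for ref in group.findall("document"):
            refid = ref.get("refid")
            members.append((refid, titles.get(refid)))
        groups[name] = members
    return groups


root = ET.fromstring(SAMPLE_XML.encode("utf-8"))
document_titles = load_documents(root)
groups = load_groups(root, document_titles)
for name, members in groups.items():
    print(name)
    for refid, title in members:
        print("  ", refid, title)
```

For the actual file, the `fromstring` call could be replaced with `root = ET.parse("file.xml").getroot()`; the two functions stay the same.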

1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java