Reading an XML file with ElementTree

<root>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>1</Line>
    <Content>zfsdfsdf</Content>
    <Synonyms>fdgd</Synonyms>
    <Translation>assdfsdfsdf</Translation>
  </Group>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>2</Line>
    <Content>ertreter</Content>
    <Synonyms>retreter</Synonyms>
    <Translation>erterte</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>1</Line>
    <Content>sadsafs</Content>
    <Synonyms>sdfsdfsd</Synonyms>
    <Translation>sdfsdfsd</Translation>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>2</Line>
    <Content>retete</Content>
    <Synonyms>retertret</Synonyms>
    <Translation>retertert</Translation>
  </Group>
</root>

from xml.etree import ElementTree

root = ElementTree.parse('data.xml').getroot()
ChapterNo = root.find('ChapterNo').text
ChapterName = root.find('ChapterName').text
GitaLine = root.find('Line').text
Content = root.find('Content').text
Synonyms = root.find('Synonyms').text
Translation = root.find('Translation').text

3 Answers

User

#1 · Edited 2024-05-19 00:03:42

To parse a simple two-level data structure and build a dict for each group, this is all you need:

>>> # what you did to get `root`
>>> from pprint import pprint as pp
>>> for group in root:
...     d = {}
...     for elem in group:
...         d[elem.tag] = elem.text
...     pp(d) # or whack it into a database
...
{'ChapterName': 'A',
 'ChapterNo': '1',
 'Content': 'zfsdfsdf',
 'Line': '1',
 'Synonyms': 'fdgd',
 'Translation': 'assdfsdfsdf'}
{'ChapterName': 'A',
 'ChapterNo': '1',
 'Content': 'ertreter',
 'Line': '2',
 'Synonyms': 'retreter',
 'Translation': 'erterte'}
{'ChapterName': 'B',
 'ChapterNo': '2',
 'Content': 'sadsafs',
 'Line': '1',
 'Synonyms': 'sdfsdfsd',
 'Translation': 'sdfsdfsd'}
{'ChapterName': 'B',
 'ChapterNo': '2',
 'Content': 'retete',
 'Line': '2',
 'Synonyms': 'retertret',
 'Translation': 'retertert'}
>>>

Look, Ma, no XPath!
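The same loop can also be written as a standalone script that collects the dicts into a list instead of printing them from the REPL. This is only a minimal sketch: the inline `XML` string (a shortened copy of the question's data) and the helper name `groups_to_dicts` are assumptions made for illustration and are not part of the answer above.

```python
from xml.etree import ElementTree

# Shortened copy of the question's data, inlined so the sketch runs on its own.
XML = """
<root>
  <Group>
    <ChapterNo>1</ChapterNo>
    <ChapterName>A</ChapterName>
    <Line>1</Line>
    <Content>zfsdfsdf</Content>
  </Group>
  <Group>
    <ChapterNo>2</ChapterNo>
    <ChapterName>B</ChapterName>
    <Line>1</Line>
    <Content>sadsafs</Content>
  </Group>
</root>
"""


def groups_to_dicts(root):
    """Return one dict per child of root, mapping each child tag to its text."""
    return [{elem.tag: elem.text for elem in group} for group in root]


root = ElementTree.fromstring(XML)
records = groups_to_dicts(root)
for record in records:
    print(record)
```

Every value stays a string here, so fields such as `ChapterNo` would still need an `int()` call if numbers are wanted.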

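User

#2 · Edited 2024-05-19 00:03:42

ChapterNo is not a direct child of root, so root.find('ChapterNo') finds nothing and returns None. You need to use XPath syntax to locate the data.

Also, ChapterNo, ChapterName, and the other tags each occur more than once, so you should use findall and iterate over the results to get the text of each match.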

chapter_nos = [e.text for e in root.findall('.//ChapterNo')]

And so on for the other tags.
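To make the difference concrete, the sketch below shows `find` returning `None` for a tag that is not a direct child, the `.//` descendant search from this answer, and a per-group variant using `findtext` that keeps related fields together. The inline `XML` string is a shortened copy of the question's data and is an assumption for illustration.

```python
from xml.etree import ElementTree

# Shortened copy of the question's data, inlined so the sketch runs on its own.
XML = """
<root>
  <Group><ChapterNo>1</ChapterNo><Line>1</Line></Group>
  <Group><ChapterNo>1</ChapterNo><Line>2</Line></Group>
  <Group><ChapterNo>2</ChapterNo><Line>1</Line></Group>
</root>
"""

root = ElementTree.fromstring(XML)

# ChapterNo sits inside Group, so a direct-child lookup on root finds nothing.
direct = root.find('ChapterNo')
print(direct)

# './/' searches all descendants of root, not just its direct children.
chapter_nos = [e.text for e in root.findall('.//ChapterNo')]
print(chapter_nos)

# Walking the Group elements keeps each chapter number paired with its line.
pairs = [(g.findtext('ChapterNo'), g.findtext('Line'))
         for g in root.findall('Group')]
print(pairs)
```

Calling `.text` on the `None` returned by the first lookup is what raises the `AttributeError` in the question's code.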

User

#3 · Edited 2024-05-19 00:03:42

Here is a small example that uses SQLAlchemy to define an object which extracts the data and stores it in a sqlite database.

from sqlalchemy import create_engine, Unicode, Integer, Column, UnicodeText
from sqlalchemy.orm import sessionmaker
from sqlalchemy.ext.declarative import declarative_base

engine = create_engine('sqlite:///chapters.sqlite', echo=True)
Base = declarative_base()

class ChapterLine(Base):
    __tablename__ = 'chapterlines'
    # chapter_no and line together form a composite primary key
    chapter_no = Column(Integer, primary_key=True)
    chapter_name = Column(Unicode(200))
    line = Column(Integer, primary_key=True)
    content = Column(UnicodeText)
    synonyms = Column(UnicodeText)
    translation = Column(UnicodeText)

    @classmethod
    def from_xmlgroup(cls, element):
        """Build a ChapterLine from one <Group> element."""
        l = cls()
        l.chapter_no = int(element.find('ChapterNo').text)
        l.chapter_name = element.find('ChapterName').text
        l.line = int(element.find('Line').text)
        l.content = element.find('Content').text
        l.synonyms = element.find('Synonyms').text
        l.translation = element.find('Translation').text
        return l

# creates the table; the engine is passed explicitly instead of relying on bound metadata
Base.metadata.create_all(engine)

Use it like this:

from xml.etree import ElementTree as etree

Session = sessionmaker(bind=engine)
session = Session()
doc = etree.parse('myfile.xml').getroot()
for group in doc.findall('Group'):
    l = ChapterLine.from_xmlgroup(group)
    session.add(l)

session.commit()

I have tested this code on your XML data and it works fine, inserting everything into the database.
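For comparison, the same load step can be sketched with only the standard library's `sqlite3` module in place of SQLAlchemy. This is a swapped-in technique, not this answer's method. The in-memory database, the inline `XML` sample (a shortened copy of the question's data), and the `FIELDS` tuple are all assumptions made for illustration.

```python
import sqlite3
from xml.etree import ElementTree

# Shortened copy of the question's data, inlined so the sketch runs on its own.
XML = """<root>
  <Group>
    <ChapterNo>1</ChapterNo><ChapterName>A</ChapterName><Line>1</Line>
    <Content>zfsdfsdf</Content><Synonyms>fdgd</Synonyms>
    <Translation>assdfsdfsdf</Translation>
  </Group>
  <Group>
    <ChapterNo>1</ChapterNo><ChapterName>A</ChapterName><Line>2</Line>
    <Content>ertreter</Content><Synonyms>retreter</Synonyms>
    <Translation>erterte</Translation>
  </Group>
</root>"""

# XML tag names in the column order used below (illustrative helper).
FIELDS = ('ChapterNo', 'ChapterName', 'Line',
          'Content', 'Synonyms', 'Translation')

conn = sqlite3.connect(':memory:')  # in-memory database, assumed for the sketch
conn.execute(
    """CREATE TABLE chapterlines (
           chapter_no INTEGER,
           chapter_name TEXT,
           line INTEGER,
           content TEXT,
           synonyms TEXT,
           translation TEXT,
           PRIMARY KEY (chapter_no, line)
       )"""
)

root = ElementTree.fromstring(XML)
rows = []
for group in root.findall('Group'):
    values = {tag: group.findtext(tag) for tag in FIELDS}
    # Convert the two key fields to integers, as the answer's int() calls do.
    rows.append((int(values['ChapterNo']), values['ChapterName'],
                 int(values['Line']), values['Content'],
                 values['Synonyms'], values['Translation']))

conn.executemany('INSERT INTO chapterlines VALUES (?, ?, ?, ?, ?, ?)', rows)
conn.commit()

stored = conn.execute(
    'SELECT chapter_no, line, content FROM chapterlines '
    'ORDER BY chapter_no, line'
).fetchall()
print(stored)
```

The SQLAlchemy version gives you mapped objects and easier querying later on; the plain `sqlite3` version has no third-party dependency, which may be enough for a one-off import.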
