Parsing an XML CDATA section and converting it to CSV with ElementTree (Python)

My problem is that I don't know how to access the CDATA content, because the TEXT element in some DOCs has an IMAGE child.

import os
import re
import xml.etree.ElementTree as ET

def make_csv(folderpath, xmlfilename, csvwriter, csv_file):
    # Parse the XML file
    tree = ET.parse(os.path.join(folderpath, xmlfilename))
    root = tree.getroot()
    for elem in root.findall("DOC"):
        rows = []
        sentence = elem.find("TEXT")
        if sentence is not None:
            # Remove newlines from the TEXT element's text
            sentence = re.sub('\n', '', sentence.text)
            rows.append(sentence)
        csvwriter.writerow(rows)
    csv_file.close()

1 answer

User

Answer #1 · posted 2024-04-25 20:31:28

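The following seems to work. It handles both cases: TEXT with an IMAGE child and TEXT without one. When an IMAGE comes first, ElementTree does not put the CDATA in TEXT's .text (that only holds the whitespace before <IMAGE>). Instead it stores it in the IMAGE element's .tail, which is the character data that follows the child's closing tag.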
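A quick way to confirm where ElementTree puts the CDATA is to parse a tiny document and inspect .text and .tail directly. The sample string below is made up for illustration (a compact version of the XML in this answer, with no whitespace between tags so the values are easy to read):

```python
import xml.etree.ElementTree as ET

# Hypothetical compact sample: one TEXT with an IMAGE child, one without
sample = (
    "<DOC>"
    "<TEXT><IMAGE>img.jpg</IMAGE><![CDATA[after image]]></TEXT>"
    "<TEXT><![CDATA[plain]]></TEXT>"
    "</DOC>"
)

doc = ET.fromstring(sample)
first, second = doc.findall("TEXT")

# Nothing precedes <IMAGE>, so the first TEXT has no text of its own
print(repr(first.text))     # None
# The CDATA after </IMAGE> is attached to IMAGE as its tail
print(repr(first[0].tail))  # 'after image'
# With no child element, the CDATA is simply the TEXT element's text
print(repr(second.text))    # 'plain'
```

So a loop that only reads TEXT.text sees nothing (or just indentation whitespace, in the original file) whenever an IMAGE comes first.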

import xml.etree.ElementTree as ET

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<root>
   <DOC>
      <TEXT>
         <IMAGE>/1379/791012/p18-1.jpg</IMAGE>
         <![CDATA[The section I want to access to]]>
      </TEXT>
      <TEXT>
         <![CDATA[more text]]>
      </TEXT>
   </DOC></root>'''

root = ET.fromstring(xml)
texts = root.findall('.//TEXT')
for idx, text in enumerate(texts, start=1):
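    # With a child (IMAGE), the CDATA after it is stored in the child's .tail;
    # without children, it is the TEXT element's own .text
    data = list(text)[0].tail.strip() if list(text) else text.text.strip()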
    print(f'{idx}) {data}')

Output

1) The section I want to access to
2) more text
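The answer prints the strings, but the question's goal is a CSV. Here is a minimal sketch that feeds the same text/tail idea into csv.writer. It assumes one CSV row per DOC with one column per TEXT element (the question's make_csv also writes one row per DOC), and the helper names text_content and write_docs_csv are hypothetical. Unlike list(text)[0].tail, text_content also keeps any text before the IMAGE and does not fail when a child has no tail. Joining the pieces with a space is an assumption about the desired output.

```python
import csv
import io
import xml.etree.ElementTree as ET

def text_content(text_elem):
    """Hypothetical helper: the character data of one TEXT element.

    Keeps TEXT's own .text plus the .tail of each child, so CDATA after an
    IMAGE is included while the IMAGE's own text (the image path) is skipped.
    """
    parts = [text_elem.text or ""]
    for child in text_elem:
        parts.append(child.tail or "")
    # Strip surrounding whitespace/newlines and drop empty pieces
    return " ".join(p.strip() for p in parts if p.strip())

def write_docs_csv(root, out):
    """Hypothetical helper: write one CSV row per DOC, one column per TEXT."""
    writer = csv.writer(out)
    for doc in root.findall("DOC"):
        writer.writerow([text_content(t) for t in doc.findall("TEXT")])

# Usage with an in-memory document and an in-memory output buffer
sample = """<root>
  <DOC>
    <TEXT>
      <IMAGE>/1379/791012/p18-1.jpg</IMAGE>
      <![CDATA[The section I want to access to]]>
    </TEXT>
    <TEXT><![CDATA[more text]]></TEXT>
  </DOC>
</root>"""

buf = io.StringIO()
write_docs_csv(ET.fromstring(sample), buf)
print(buf.getvalue())
```

For the file-based version in the question, ET.parse(os.path.join(folderpath, xmlfilename)).getroot() would take the place of ET.fromstring(sample), and the output file would be opened with open(csv_path, "w", newline="", encoding="utf-8") (csv_path being a placeholder), since the csv module documentation recommends newline="" for files passed to csv.writer.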
