Reporting the starting line/column of an XML node in Python

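2 answers

User

Answer 1 · edited 2024-05-23 09:06:17

By monkeypatching the minidom content handler, I was able to record the line and column number of each node (as a "parse_position" attribute). It's a bit dirty, but I can't see any "officially sanctioned" way of doing it :) Here is my test script: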

from xml.dom import minidom
import xml.sax

doc = """\
<File>
  <name>Name</name>
  <pos>./</pos>
</File>
"""


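def set_content_handler(dom_handler):
    # Wrap the DOM builder's startElementNS so each new element records
    # where the parser currently is.
    def startElementNS(name, tagName, attrs):
        orig_start_cb(name, tagName, attrs)
        # The element that was just created sits on top of the builder's stack.
        cur_elem = dom_handler.elementStack[-1]
        cur_elem.parse_position = (
            parser._parser.CurrentLineNumber,
            parser._parser.CurrentColumnNumber
        )

    orig_start_cb = dom_handler.startElementNS
    dom_handler.startElementNS = startElementNS
    # Hand the patched handler on to the real setContentHandler.
    orig_set_content_handler(dom_handler)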

parser = xml.sax.make_parser()
orig_set_content_handler = parser.setContentHandler
parser.setContentHandler = set_content_handler

dom = minidom.parseString(doc, parser)
pos = dom.firstChild.parse_position
print("Parent: '{0}' at {1}:{2}".format(
    dom.firstChild.localName, pos[0], pos[1]))
for child in dom.firstChild.childNodes:
    if child.localName is None:
        continue
    pos = child.parse_position
    print("Child: '{0}' at {1}:{2}".format(child.localName, pos[0], pos[1]))

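It prints the following (lines are 1-based, columns are 0-based):

Parent: 'File' at 1:0
Child: 'name' at 2:2
Child: 'pos' at 3:2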
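The same trick can be wrapped in a function so that the patched parser is not left lying around at module level. This is only a sketch: parse_with_positions and element_positions are hypothetical names, and it still relies on the undocumented elementStack and _parser attributes, so it may break on other Python implementations or versions.

```python
from xml.dom import minidom
import xml.sax


def parse_with_positions(xml_text):
    """Parse xml_text with minidom and tag every element with parse_position."""
    parser = xml.sax.make_parser()
    orig_set_content_handler = parser.setContentHandler

    def set_content_handler(dom_handler):
        orig_start_cb = dom_handler.startElementNS

        def start_element_ns(name, tag_name, attrs):
            orig_start_cb(name, tag_name, attrs)
            # elementStack and _parser are internals, not public API.
            elem = dom_handler.elementStack[-1]
            elem.parse_position = (
                parser._parser.CurrentLineNumber,
                parser._parser.CurrentColumnNumber,
            )

        dom_handler.startElementNS = start_element_ns
        orig_set_content_handler(dom_handler)

    # Intercept the call minidom makes when it installs its DOM builder.
    parser.setContentHandler = set_content_handler
    return minidom.parseString(xml_text, parser)


def element_positions(node):
    """Yield (tag name, line, column) for every element below node."""
    for child in node.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            line, col = child.parse_position
            yield child.tagName, line, col
            yield from element_positions(child)
```

Unlike the loop in the test script, element_positions walks all descendants rather than only the direct children.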

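User

Answer 2 · edited 2024-05-23 09:06:17

Another way to solve this is to patch the line number information into the document before parsing it. The idea is as follows: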

import re
from xml.dom import minidom

LINE_DUMMY_ATTR = '_DUMMY_LINE'  # Make sure this string is unique!

def parseXml(filename):
    content = []
    # Add a dummy attribute holding the 1-based line number to every start tag.
    with open(filename, 'r') as f:
        for lineno, line in enumerate(f, 1):
            content.append(re.sub(
                r'<(\w+)', r'<\1 ' + LINE_DUMMY_ATTR + '="' + str(lineno) + '"', line))

    return minidom.parseString("".join(content))

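The line number of an element can then be read back from the dummy attribute, for example with getAttribute(LINE_DUMMY_ATTR), and converted to an integer.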
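A minimal sketch of reading the line numbers back might look like this. parse_xml_text is a hypothetical string-based variant of parseXml (so the example needs no file on disk), and line_of is an assumed helper name; neither appears in the answer itself.

```python
import re
from xml.dom import minidom

LINE_DUMMY_ATTR = "_DUMMY_LINE"  # assumed not to clash with a real attribute


def parse_xml_text(text):
    """Patch a line-number attribute into every start tag, then parse."""
    patched = [
        re.sub(r"<(\w+)", r'<\1 %s="%d"' % (LINE_DUMMY_ATTR, lineno), line)
        for lineno, line in enumerate(text.splitlines(keepends=True), 1)
    ]
    return minidom.parseString("".join(patched))


def line_of(element):
    """Return the source line recorded on element as an int."""
    return int(element.getAttribute(LINE_DUMMY_ATTR))


if __name__ == "__main__":
    dom = parse_xml_text("<File>\n  <name>Name</name>\n  <pos>./</pos>\n</File>\n")
    for child in dom.documentElement.childNodes:
        if child.nodeType == child.ELEMENT_NODE:
            print(child.tagName, line_of(child))
```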

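Clearly this approach has its own drawbacks: if you really need column numbers as well, patching them in is considerably more involved. Also, if you want to extract text nodes or comments, or use Node.toxml(), you must make sure to strip LINE_DUMMY_ATTR from any accidental matches.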
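One way to clean up before serializing is to walk the tree and drop the dummy attribute from every element. The sketch below assumes the same LINE_DUMMY_ATTR name as above; strip_line_attrs is a hypothetical helper. It only handles attributes on real elements, so text that the regex patched inside comments or CDATA sections would still need separate treatment.

```python
from xml.dom import minidom

LINE_DUMMY_ATTR = "_DUMMY_LINE"  # must match the name used when patching


def strip_line_attrs(node):
    """Remove the dummy attribute from node and all of its descendants."""
    if node.nodeType == node.ELEMENT_NODE and node.hasAttribute(LINE_DUMMY_ATTR):
        node.removeAttribute(LINE_DUMMY_ATTR)
    for child in node.childNodes:
        strip_line_attrs(child)


if __name__ == "__main__":
    dom = minidom.parseString(
        '<File _DUMMY_LINE="1"><name _DUMMY_LINE="2">Name</name></File>')
    strip_line_attrs(dom)
    print(dom.documentElement.toxml())
```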

Compared with aknuds1's answer, one advantage of this solution is that it doesn't require messing with minidom internals.
