I am trying to validate the following XML with lxml:
```xml
<?xml version='1.0' encoding='UTF-8'?>
<mets:mets xmlns:mets="http://www.loc.gov/METS/"
           xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xmlns:xlink="http://www.w3.org/1999/xlink"
           xmlns:csip="https://DILCIS.eu/XML/METS/CSIPExtensionMETS"
           xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd https://DILCIS.eu/XML/METS/CSIPExtensionMETS https://earkcsip.dilcis.eu/schema/DILCISExtensionMETS.xsd">
  <mets:metsHdr>
    <mets:agent ROLE="ARCHIVIST" TYPE="ORGANIZATION">
      <mets:name>foo</mets:name>
      <mets:note csip:NOTETYPE="this is incorrect">bar</mets:note>
    </mets:agent>
  </mets:metsHdr>
  <mets:structMap>
    <mets:div/>
  </mets:structMap>
</mets:mets>
```
I adapted the script from here (and added some small CLI improvements and Python 3 fixes):
```python
import sys

from lxml import etree

XSI = "http://www.w3.org/2001/XMLSchema-instance"
XS = '{http://www.w3.org/2001/XMLSchema}'

SCHEMA_TEMPLATE = b"""<?xml version = "1.0" encoding = "UTF-8"?>
<xs:schema xmlns="http://dummy.libxml2.validator"
    targetNamespace="http://dummy.libxml2.validator"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    version="1.0"
    elementFormDefault="qualified"
    attributeFormDefault="unqualified">
</xs:schema>"""


def validate_XML(xml):
    """Validate an XML file against every schema named in its xsi:schemaLocation attributes.

    :param xml: path to the XML file.
    :type xml: str
    """
    tree = etree.parse(xml)
    schema_tree = etree.XML(SCHEMA_TEMPLATE)
    # Find all unique instances of 'xsi:schemaLocation="<namespace> <path-to-schema.xsd> ..."'
    schema_locations = set(tree.xpath("//*/@xsi:schemaLocation", namespaces={'xsi': XSI}))
    for schema_location in schema_locations:
        # Split into namespaces and schema locations; strip() removes leading
        # and trailing whitespace.
        namespaces_locations = schema_location.strip().split()
        # Import all namespace/schema location pairs found
        for namespace, location in zip(*[iter(namespaces_locations)] * 2):
            xs_import = etree.Element(XS + "import")
            xs_import.attrib['namespace'] = namespace
            xs_import.attrib['schemaLocation'] = location
            schema_tree.append(xs_import)
    # Construct the schema
    schema = etree.XMLSchema(schema_tree)
    # Validate!
    schema.assertValid(tree)
    print('Success!')


if __name__ == '__main__':
    validate_XML(sys.argv[1])
```
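The core of the script is the `zip(*[iter(...)] * 2)` idiom, which walks the whitespace-separated `xsi:schemaLocation` value two tokens at a time and yields (namespace, location) pairs. A minimal standalone sketch of just that step follows; the name `schema_location_pairs` is hypothetical and not part of lxml. Unlike the bare `zip`, which silently drops a trailing unpaired token, it raises on an odd token count.

```python
from typing import List, Tuple


def schema_location_pairs(schema_location: str) -> List[Tuple[str, str]]:
    """Split an xsi:schemaLocation value into (namespace, location) pairs.

    Hypothetical helper mirroring the loop in validate_XML: the value is a
    whitespace-separated list alternating namespace URI and schema URL.
    """
    # split() with no arguments also discards leading/trailing whitespace and newlines
    tokens = schema_location.split()
    if len(tokens) % 2:
        # zip() would silently drop the last token, so fail loudly instead
        raise ValueError(f"odd number of tokens in schemaLocation: {tokens!r}")
    it = iter(tokens)
    # zip(it, it) pulls two consecutive tokens from the same iterator per step,
    # which is the same trick as zip(*[iter(tokens)] * 2)
    return list(zip(it, it))
```

Applied to the `xsi:schemaLocation` value in the XML above, this yields two pairs: the METS namespace with `mets.xsd`, and the CSIP namespace with `DILCISExtensionMETS.xsd`.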
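I would expect validation to report that `NOTETYPE` contains an invalid value (only the value `SOFTWARE VERSION` is valid), but it completes without any errors. Any idea why?

Validating the same file with a tool such as an XML editor produces the expected error: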
```
Value 'this is incorrect' is not facet-valid with respect to enumeration '[SOFTWARE VERSION]'. It must be a value from the enumeration.
```
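To confirm, independently of how lxml loads the schemas, that the instance really contains the value the editor rejects, the attribute can be checked directly. This is a stopgap sketch using only the standard library, not a substitute for schema validation: it assumes the single allowed value `SOFTWARE VERSION` quoted in the error message is the complete enumeration, and the name `invalid_notetypes` is hypothetical.

```python
import xml.etree.ElementTree as ET

CSIP_NS = "https://DILCIS.eu/XML/METS/CSIPExtensionMETS"
# ElementTree keys namespaced attributes in Clark notation: {namespace}localname
NOTETYPE = f"{{{CSIP_NS}}}NOTETYPE"
# Assumption: taken from the editor's error message, not read from the XSD
ALLOWED_NOTETYPES = {"SOFTWARE VERSION"}


def invalid_notetypes(xml_text: str) -> list:
    """Return (tag, value) for every csip:NOTETYPE value outside the enumeration."""
    root = ET.fromstring(xml_text)
    bad = []
    # root.iter() visits the root element and all of its descendants
    for element in root.iter():
        value = element.get(NOTETYPE)
        if value is not None and value not in ALLOWED_NOTETYPES:
            bad.append((element.tag, value))
    return bad
```

Run on the document above, it should flag the `mets:note` element carrying `this is incorrect`, which shows the value is there and it is the schema check that is not firing.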
The generated schema:
```xml
<xs:schema xmlns="http://dummy.libxml2.validator" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" targetNamespace="http://dummy.libxml2.validator" version="1.0" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xs:import namespace="http://www.loc.gov/METS/" schemaLocation="http://www.loc.gov/standards/mets/mets.xsd"/>
  <xs:import namespace="https://DILCIS.eu/XML/METS/CSIPExtensionMETS" schemaLocation="https://earkcsip.dilcis.eu/schema/DILCISExtensionMETS.xsd"/>
</xs:schema>
```
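One thing worth ruling out (an assumption, not something confirmed in this question): libxml2's built-in network loader, which lxml uses when `etree.XMLSchema` resolves an `xs:import`, is generally understood to fetch `http://` URLs but not `https://` ones, and libxml2 tends to report an import it cannot load as a warning ("Skipping the import") rather than a fatal error. If that applies here, the CSIP extension schema is never loaded, lxml has no declaration of `csip:NOTETYPE` to check the value against, and, since mets.xsd presumably tolerates foreign-namespace attributes on `mets:note` (otherwise the attribute would be rejected outright), validation passes. Printing `schema.error_log` right after `etree.XMLSchema(schema_tree)`, before validating, may show such a warning. The sketch below flags locations a loader without TLS support could not fetch; `locations_needing_local_copy` and the `FETCHABLE_SCHEMES` set are hypothetical.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlparse

# Assumption: the loader can read plain-HTTP URLs and local paths/files,
# but not HTTPS (no TLS support in libxml2's built-in fetcher).
FETCHABLE_SCHEMES = {"http", "file", ""}


def locations_needing_local_copy(pairs: Iterable[Tuple[str, str]]) -> List[str]:
    """Return schema locations whose URL scheme is outside FETCHABLE_SCHEMES.

    `pairs` holds (namespace, location) tuples, i.e. the pairs that
    validate_XML turns into <xs:import> elements.
    """
    flagged = []
    for _namespace, location in pairs:
        # Relative or absolute file paths have an empty scheme
        scheme = urlparse(location).scheme.lower()
        if scheme not in FETCHABLE_SCHEMES:
            flagged.append(location)
    return flagged
```

If the CSIP location is flagged, one way to test the hypothesis is to download that XSD and set the `schemaLocation` of the corresponding `xs:import` to the local file path instead of the `https://` URL, then rerun the script.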