Parser for GIS metadata standards including FGDC and ISO-19115
Detailed description of the gis-metadata-parser Python project
GIS metadata parser
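XML parsers for GIS metadata, designed to read in, validate, update and output a core set of properties that have been mapped between the most common standards, currently:
- FGDC
- ISO-19139 (and ISO-19115)
- ArcGIS (tested with ArcGIS format 1.0)
This library is compatible with Python versions 2.7 and 3.4 through 3.6.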
Installation
Install with pip install gis-metadata-parser
Usage
Parsers can be instantiated from files, XML strings or URLs. They can also be converted from one standard to another.
from gis_metadata.arcgis_metadata_parser import ArcGISParser
from gis_metadata.fgdc_metadata_parser import FgdcParser
from gis_metadata.iso_metadata_parser import IsoParser
from gis_metadata.metadata_parser import get_metadata_parser

# From file objects
with open(r'/path/to/metadata.xml') as metadata:
    fgdc_from_file = FgdcParser(metadata)

with open(r'/path/to/metadata.xml') as metadata:
    iso_from_file = IsoParser(metadata)

# Detect standard based on root element, metadata
fgdc_from_string = get_metadata_parser(
    """
    <?xml version='1.0' encoding='UTF-8'?>
    <metadata>
        <idinfo>
        </idinfo>
    </metadata>
    """
)

# Detect ArcGIS standard based on root element and its nodes
arcgis_from_string = get_metadata_parser(
    """
    <?xml version='1.0' encoding='UTF-8'?>
    <metadata>
        <dataIdInfo></dataIdInfo>
        <distInfo></distInfo>
        <dqInfo></dqInfo>
    </metadata>
    """
)

# Detect ISO standard based on root element, MD_Metadata or MI_Metadata
iso_from_string = get_metadata_parser(
    """
    <?xml version='1.0' encoding='UTF-8'?>
    <MD_Metadata>
        <identificationInfo>
        </identificationInfo>
    </MD_Metadata>
    """
)

# Convert from one standard to another
fgdc_converted = iso_from_file.convert_to(FgdcParser)
iso_converted = fgdc_from_file.convert_to(IsoParser)
arcgis_converted = iso_converted.convert_to(ArcGISParser)
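To make the detection idea concrete, here is a minimal sketch of how a standard could be guessed from the root element and its child nodes using only the standard library. It is not the library's implementation: the function name detect_standard and the returned labels are hypothetical, and the rules are simply the ones stated in the comments of the usage example.

```python
import xml.etree.ElementTree as ET


def detect_standard(xml_text):
    """Guess the metadata standard from the root element (illustrative only)."""
    root = ET.fromstring(xml_text.strip())

    # Drop a namespace prefix of the form '{uri}' if the document uses one
    root_name = root.tag.rsplit('}', 1)[-1]

    if root_name in ('MD_Metadata', 'MI_Metadata'):
        return 'iso'

    if root_name == 'metadata':
        # ArcGIS and FGDC share the root name, so inspect the child nodes
        child_names = {child.tag.rsplit('}', 1)[-1] for child in root}
        if child_names & {'dataIdInfo', 'distInfo', 'dqInfo'}:
            return 'arcgis'
        return 'fgdc'

    raise ValueError('Unrecognized root element: ' + root_name)
```

The choice to fall back to FGDC for any other metadata root is an assumption of this sketch; a stricter version could also check for FGDC nodes such as idinfo before deciding.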
Finally, the properties of the parser can be updated, validated, applied and output:
with open(r'/path/to/metadata.xml') as metadata:
    fgdc_from_file = FgdcParser(metadata)

# Example simple properties
fgdc_from_file.title
fgdc_from_file.abstract
fgdc_from_file.place_keywords
fgdc_from_file.thematic_keywords

# :see: gis_metadata.utils.get_supported_props for list of all supported properties

# Complex properties
fgdc_from_file.attributes
fgdc_from_file.bounding_box
fgdc_from_file.contacts
fgdc_from_file.dates
fgdc_from_file.digital_forms
fgdc_from_file.larger_works
fgdc_from_file.process_steps
fgdc_from_file.raster_info

# :see: gis_metadata.utils.get_complex_definitions for structure of all complex properties

# Update properties
fgdc_from_file.title = 'New Title'
fgdc_from_file.dates = {'type': 'single', 'values': '1/1/2016'}

# Apply updates
fgdc_from_file.validate()                                      # Ensure updated properties are valid
fgdc_from_file.serialize()                                     # Output updated XML as a string
fgdc_from_file.write()                                         # Output updated XML to existing file
fgdc_from_file.write(out_file_or_path='/path/to/updated.xml')  # Output updated XML to new file
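As a rough illustration of what an update followed by serialization amounts to, the sketch below changes one element's text and returns the XML as a string, using only the standard library. The helper update_title is hypothetical and is not how the parsers work internally; the path idinfo/citation/citeinfo/title is the usual FGDC location for a title and is assumed here rather than taken from the library's data map.

```python
import xml.etree.ElementTree as ET

# Assumed FGDC location of the title element (not read from the library)
FGDC_TITLE_XPATH = 'idinfo/citation/citeinfo/title'


def update_title(xml_text, new_title):
    """Set the title text and return the updated document as a string."""
    root = ET.fromstring(xml_text)
    element = root.find(FGDC_TITLE_XPATH)
    if element is None:
        raise ValueError('No title element found at ' + FGDC_TITLE_XPATH)

    element.text = new_title

    # encoding='unicode' returns str rather than bytes (Python 3)
    return ET.tostring(root, encoding='unicode')
```

The parsers wrap this kind of bookkeeping behind plain attribute assignment, which is why the example above only needs to set title and then call serialize or write.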
Extending and Customizing
Tips
There are a few unwritten (so far) rules about the way the metadata parsers are wired to work:
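- Properties are generally defined by XPath in each parser._data_map
- Simple parser properties only accept values of type string and list
- XPaths configured in the data map support references to element attributes: 'path/to/element/@attr'
- Complex parser properties are defined by custom parser/updater functions instead of by XPath
- Complex parser properties accept values of type dict containing simple properties, or a list of such dicts
- Properties with leading underscores are parsed, but not validated or written out
- Properties that "shadow" other properties but with a leading underscore serve as backup values
- For backup properties, additional underscores indicate further backup options, i.e. title, _title, __title
Some examples of existing backup properties are as follows: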
# In the ArcGIS parser for distribution contact phone:

_agis_tag_formats = {
    ...
    'dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/cntPhone/voiceNum',
    '_dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/voiceNum',  # If not in cntPhone
    ...
}

# In the FGDC parser for sub-properties in the contacts definition:

_fgdc_definitions = get_complex_definitions()
_fgdc_definitions[CONTACTS].update({
    '_name': '{_name}',
    '_organization': '{_organization}'
})
...
class FgdcParser(MetadataParser):
    ...
    def _init_data_map(self):
        ...
        ct_format = _fgdc_tag_formats[CONTACTS]
        fgdc_data_structures[CONTACTS] = format_xpaths(
            ...
            name=ct_format.format(ct_path='cntperp/cntper'),
            _name=ct_format.format(ct_path='cntorgp/cntper'),  # If not in cntperp
            organization=ct_format.format(ct_path='cntperp/cntorg'),
            _organization=ct_format.format(ct_path='cntorgp/cntorg'),  # If not in cntperp
        )

# Also see the ISO parser for backup sub-properties in the attributes definition:

_iso_definitions = get_complex_definitions()
_iso_definitions[ATTRIBUTES].update({
    '_definition_source': '{_definition_src}',
    '__definition_source': '{__definition_src}',
    '___definition_source': '{___definition_src}'
})
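The backup rule can be sketched independently of the library. The snippet below walks prop, _prop, __prop and so on through a plain dict of XPaths until one location yields a value, and treats a trailing /@attr segment as an attribute reference. The helpers get_text and get_with_backups are hypothetical names for this illustration and do not exist in gis_metadata; only the two dist_phone paths come from the excerpt above.

```python
import xml.etree.ElementTree as ET


def get_text(root, xpath):
    """Return the text at xpath; a trailing '/@attr' segment reads an attribute."""
    path, sep, attr = xpath.rpartition('/@')
    if not sep:
        element = root.find(xpath)
        return (element.text or '').strip() if element is not None else ''

    element = root.find(path)
    return element.get(attr, '').strip() if element is not None else ''


def get_with_backups(root, data_map, prop):
    """Try prop, then _prop, then __prop, ... until a value is found."""
    key = prop
    while key in data_map:
        value = get_text(root, data_map[key])
        if value:
            return value
        key = '_' + key  # Move on to the next backup location
    return ''


data_map = {
    'dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/cntPhone/voiceNum',
    '_dist_phone': 'distInfo/distributor/distorCont/rpCntInfo/voiceNum',
}
root = ET.fromstring(
    '<metadata><distInfo><distributor><distorCont><rpCntInfo>'
    '<voiceNum>555-0100</voiceNum>'
    '</rpCntInfo></distorCont></distributor></distInfo></metadata>'
)
print(get_with_backups(root, data_map, 'dist_phone'))  # 555-0100
```

Here the primary location under cntPhone is absent, so the value is read from the backup path instead.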
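Examples
Any of the supported parsers can be extended to include more of the data that a standard supports. In this example we'll add two new properties to the IsoParser:
- metadata_language: a simple string field describing the language of the metadata file itself (not the dataset)
- metadata_contacts: a complex structure with contact info, leveraging and enhancing the existing contact structure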
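This example will cover:
- Adding a new simple property
- Configuring a backup location for a property
- Referencing an element attribute in an XPath
- Adding a new complex property
- Customizing the complex property to include a new sub-property
Also, this example is specifically covered by unit tests.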
from gis_metadata.iso_metadata_parser import IsoParser
from gis_metadata.utils import CONTACTS, format_xpaths, get_complex_definitions, ParserProperty


class CustomIsoParser(IsoParser):

    def _init_data_map(self):
        super(CustomIsoParser, self)._init_data_map()

        # Basic property: text or list (with backup location referencing codeListValue attribute)

        lang_prop = 'metadata_language'
        self._data_map[lang_prop] = 'language/CharacterString'                    # Parse from here if present
        self._data_map['_' + lang_prop] = 'language/LanguageCode/@codeListValue'  # Otherwise, try from here

        # Complex structure (reuse of contacts structure plus phone)

        # Define some basic variables
        ct_prop = 'metadata_contacts'
        ct_xpath = 'contact/CI_ResponsibleParty/{ct_path}'
        ct_definition = get_complex_definitions()[CONTACTS]
        ct_definition['phone'] = '{phone}'

        # Reuse CONTACT structure to specify locations per prop (adapted only slightly from parent)
        self._data_structures[ct_prop] = format_xpaths(
            ct_definition,
            name=ct_xpath.format(ct_path='individualName/CharacterString'),
            organization=ct_xpath.format(ct_path='organisationName/CharacterString'),
            position=ct_xpath.format(ct_path='positionName/CharacterString'),
            phone=ct_xpath.format(
                ct_path='contactInfo/CI_Contact/phone/CI_Telephone/voice/CharacterString'
            ),
            email=ct_xpath.format(
                ct_path='contactInfo/CI_Contact/address/CI_Address/electronicMailAddress/CharacterString'
            )
        )

        # Set the contact root to insert new elements at "contact" level given the defined path:
        #   'contact/CI_ResponsibleParty/...'
        # By default we would get multiple "CI_ResponsibleParty" elements under a single "contact"
        # This way we get multiple "contact" elements, each with its own single "CI_ResponsibleParty"
        self._data_map['_{prop}_root'.format(prop=ct_prop)] = 'contact'

        # Use the built-in support for parsing complex properties (or write your own parser/updater)
        self._data_map[ct_prop] = ParserProperty(self._parse_complex_list, self._update_complex_list)

        # And finally, let the parent validation logic know about the two new custom properties

        self._metadata_props.add(lang_prop)
        self._metadata_props.add(ct_prop)


with open(r'/path/to/metadata.xml') as metadata:
    iso_from_file = CustomIsoParser(metadata)

iso_from_file.metadata_language
iso_from_file.metadata_contacts
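To show the shape of data a complex list property such as metadata_contacts describes, here is a standalone sketch that reads each contact element into a dict of sub-properties using the standard library. It only approximates the idea and is not the library's _parse_complex_list; the function parse_contacts and the constants are hypothetical, and the XML is assumed to be un-namespaced like the paths in the example above.

```python
import xml.etree.ElementTree as ET

# One dict is produced per element found at this root
CONTACT_ROOT = 'contact'

# Sub-property locations relative to each contact element
CONTACT_XPATHS = {
    'name': 'CI_ResponsibleParty/individualName/CharacterString',
    'organization': 'CI_ResponsibleParty/organisationName/CharacterString',
    'phone': 'CI_ResponsibleParty/contactInfo/CI_Contact/phone/CI_Telephone/voice/CharacterString',
}


def parse_contacts(xml_text):
    """Return a list of dicts, one per contact element (illustrative only)."""
    root = ET.fromstring(xml_text)
    contacts = []
    for contact in root.findall(CONTACT_ROOT):
        contacts.append({
            # findtext returns None when the element is missing
            prop: (contact.findtext(xpath) or '').strip()
            for prop, xpath in CONTACT_XPATHS.items()
        })
    return contacts
```

Iterating per contact element mirrors the reason the example sets the contact root: each contact then carries its own single CI_ResponsibleParty, so sub-properties from different contacts are never mixed.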