How do I find the value between XML tags with Python?

3 votes

4 answers

3848 views

Asked on 2025-04-16 00:06

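I'm using a Google site to get weather information, and I want to find the values between XML tags. The code below gets me the weather condition for a city, but I can't get other parameters, such as the temperature. If possible, please also explain how the split function used in the code works: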

import urllib

def getWeather(city):

    #create google weather api url
    url = "http://www.google.com/ig/api?weather=" + urllib.quote(city)

    try:
        # open google weather api url
        f = urllib.urlopen(url)
    except:
        # if there was an error opening the url, return
        return "Error opening url"

    # read contents to a string
    s = f.read()

    # extract weather condition data from xml string
    weather = s.split("<current_conditions><condition data=\"")[-1].split("\"")[0]

    # if there was an error getting the condition, the city is invalid


    if weather == "<?xml version=":
        return "Invalid city"

    #return the weather condition
    return weather

def main():
    while True:
        city = raw_input("Give me a city: ")
        weather = getWeather(city)
        print(weather)

if __name__ == "__main__":
    main()

Thank you.

data-extraction weather-information xml-parsing split-function

4 Answers

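I suggest using an XML parser, as Hank Gay mentioned. Personally I recommend lxml, since I'm currently using it on a project, and it extends the very usable ElementTree interface that already exists in the standard library (xml.etree).

lxml also adds support for XPath, XSLT, and several other features that are missing from the standard ElementTree module.

Whichever you choose, an XML parser is the best option, because it lets you work with the XML document as Python objects. That means your code could look something like this: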

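# existing code up to...
s = f.read()
import lxml.etree as ET
# fromstring() parses XML held in a string; parse() expects a file name or file object
root = ET.fromstring(s)
# the element is named current_conditions; ".//" searches at any depth below the root
current = root.find(".//current_conditions/condition")
condition_data = current.get("data")
weather = condition_data
return weather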
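
Since the question also asks about the temperature, here is a minimal sketch of the same idea that collects every child of current_conditions at once. It uses the standard library's xml.etree.ElementTree and Python 3, unlike the Python 2 code elsewhere in this thread. SAMPLE_XML is a made-up stand-in for the reply, modelled on the values shown elsewhere on this page, and current_conditions() is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumed sample reply; the real service may wrap or order things differently.
SAMPLE_XML = (
    '<xml_api_reply version="1"><weather>'
    '<current_conditions>'
    '<condition data="Sunny"/>'
    '<temp_f data="67"/>'
    '<temp_c data="19"/>'
    '<humidity data="Humidity: 61%"/>'
    '</current_conditions>'
    '</weather></xml_api_reply>'
)


def current_conditions(xml_text):
    """Return {tag: data attribute} for every child of <current_conditions>."""
    root = ET.fromstring(xml_text)
    current = root.find(".//current_conditions")
    if current is None:
        # No such element, e.g. an error reply for an unknown city.
        return {}
    return {child.tag: child.get("data") for child in current}


conditions = current_conditions(SAMPLE_XML)
print(conditions["condition"])
print(conditions["temp_c"])
```

Returning a dict keeps the caller free to pick whichever readings it needs without another pass over the document.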

Answered on 2025-04-16 by Python大师


Use a parser.

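You can't parse XML with regular expressions, so don't try. Here is a starting link for finding an XML parser in Python. There is also a good site that teaches how to parse XML in Python.

Update: given the new information about PyS60, here is the documentation on Nokia's site about working with XML.

Update 2: @Nas Banov asked for sample code, so here it is: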

import urllib

from xml.parsers import expat

def start_element_handler(name, attrs):
    """
    My handler for the event that fires when the parser sees an
    opening tag in the XML.
    """
    # If we care about more than just the temp data, we can extend this
    # logic with ``elif``. If the XML gets really hairy, we can create a
    # ``dict`` of handler functions and index it by tag name, e.g.,
    # { 'humidity': humidity_handler }
    if 'temp_c' == name:
        print "The current temperature is %(data)s degrees Celsius." % attrs

def process_weather_conditions():
    """
    Main logic of the POC; set up the parser and handle resource
    cleanup.
    """
    my_parser = expat.ParserCreate()
    my_parser.StartElementHandler = start_element_handler

    # I don't know if the S60 supports try/finally, but that's not
    # the point of the POC.
    # Open the URL before entering ``try`` so ``f`` is always bound
    # when ``finally`` runs.
    f = urllib.urlopen("http://www.google.com/ig/api?weather=30096")
    try:
        my_parser.ParseFile(f)
    finally:
        f.close()

if __name__ == '__main__':
    process_weather_conditions()
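
For more than one reading, the same handler idea can be sketched as a small collector. This is only an illustration under stated assumptions: it is written for Python 3, it parses an inline SAMPLE_XML string (made up, modelled on the values shown elsewhere on this page) rather than the live URL, and collect_data_attributes() is a hypothetical helper name.

```python
from xml.parsers import expat

# Assumed sample reply; not fetched from the real service.
SAMPLE_XML = (
    '<xml_api_reply version="1"><weather>'
    '<current_conditions>'
    '<condition data="Sunny"/>'
    '<temp_c data="19"/>'
    '<humidity data="Humidity: 61%"/>'
    '</current_conditions>'
    '</weather></xml_api_reply>'
)


def collect_data_attributes(xml_text, wanted):
    """Return {tag: data} for each opening tag whose name is in `wanted`."""
    found = {}

    def start_element_handler(name, attrs):
        # expat calls this once per opening tag; attrs is a dict.
        if name in wanted and "data" in attrs:
            found[name] = attrs["data"]

    parser = expat.ParserCreate()
    parser.StartElementHandler = start_element_handler
    parser.Parse(xml_text, True)  # True marks this as the final chunk
    return found


readings = collect_data_attributes(SAMPLE_XML, {"temp_c", "humidity"})
print("The current temperature is %(temp_c)s degrees Celsius." % readings)
```

Note that if a tag name occurs more than once in a reply, this sketch keeps only the last value seen.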

Answered on 2025-04-16 by Python大师


OK, here is a solution that doesn't need full parsing and fits your specific case:

import urllib

def getWeather(city):
    ''' given city name or postal code,
        return dictionary with current weather conditions
    '''
    url = 'http://www.google.com/ig/api?weather='
    try:
        f = urllib.urlopen(url + urllib.quote(city))
    except:
        return "Error opening url"
    s = f.read().replace('\r','').replace('\n','')
    if '<problem' in s:
        return "Problem retrieving weather (invalid city?)"

    weather = s.split('</current_conditions>')[0]  \
               .split('<current_conditions>')[-1]  \
               .strip('</>')                       
    wdict = dict(i.split(' data="') for i in weather.split('"/><'))
    return wdict

Here is a usage example:

>>> weather = getWeather('94043')
>>> weather
{'temp_f': '67', 'temp_c': '19', 'humidity': 'Humidity: 61%', 'wind_condition': 'Wind: N at 21 mph', 'condition': 'Sunny', 'icon': '/ig/images/weather/sunny.gif'}
>>> weather['humidity']
'Humidity: 61%'
>>> print '%(condition)s\nTemperature %(temp_c)s C (%(temp_f)s F)\n%(humidity)s\n%(wind_condition)s' % weather
Sunny
Temperature 19 C (67 F)
Humidity: 61%
Wind: N at 21 mph

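By the way, a small change in Google's output format could break this approach, for example if they added extra spaces or tabs between tags or attributes. In practice they usually avoid that to keep the HTTP response small. But if they did, we would have to brush up on regular expressions and re.split().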
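
If that ever happened, a whitespace-tolerant variant might look like the sketch below. It is written for Python 3 and swaps the plain split on '"/><' for re.findall; SPACED is a made-up reply fragment with extra spaces and tabs, and PAIR and parse_conditions() are hypothetical names. It is still not a real XML parser: it does not decode entities or handle single-quoted attributes.

```python
import re

# Hypothetical fragment with extra whitespace between tags and attributes.
SPACED = (
    '<current_conditions>\n'
    '  <condition  data = "Sunny" />\n'
    '\t<temp_c data="19"/>\n'
    '</current_conditions>'
)

# One self-closing tag with a data attribute, whitespace allowed around "=".
PAIR = re.compile(r'<(\w+)\s+data\s*=\s*"([^"]*)"\s*/>')


def parse_conditions(s):
    """Return {tag: data} for the tags inside <current_conditions>."""
    block = s.split('</current_conditions>')[0].split('<current_conditions>')[-1]
    return dict(PAIR.findall(block))


print(parse_conditions(SPACED))
```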

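Also, how str.split(sep) works is described in the documentation; here is an excerpt: "Return a list of the words in the string, using sep as the delimiter string. ... The sep argument may consist of multiple characters (for example, '1<>2<>3'.split('<>') returns ['1', '2', '3'])." So 'text1<tag>text2</tag>text3'.split('</tag>') gives ['text1<tag>text2', 'text3'], then [0] picks the first element, 'text1<tag>text2', and splitting that on '<tag>' and taking the last element gives 'text2', which is the data we are interested in. It really is that simple.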
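
The steps can be traced one at a time, as in this short sketch (Python 3). The second half applies the same reasoning to the one-liner from the question; the strings s and bad are made-up stand-ins for the reply, not real output from the service.

```python
# Step-by-step version of the chained split described above.
text = 'text1<tag>text2</tag>text3'
parts = text.split('</tag>')             # ['text1<tag>text2', 'text3']
before_close = parts[0]                  # 'text1<tag>text2'
value = before_close.split('<tag>')[-1]  # 'text2'
print(value)

# The one-liner from the question, on an assumed sample string.
marker = '<current_conditions><condition data="'
s = '<current_conditions><condition data="Sunny"/><temp_f data="67"/>'
after_marker = s.split(marker)[-1]       # everything after the marker
weather = after_marker.split('"')[0]     # up to the closing quote
print(weather)

# When the marker is absent, split() returns the whole string in a
# one-element list, so the result is the text before the first quote.
# That is why the question compares against '<?xml version='.
bad = '<?xml version="1.0"?><problem/>'
fallback = bad.split(marker)[-1].split('"')[0]
print(fallback)
```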

Answered on 2025-04-16 by Python大师

