Getting a value from XML in Python

Posted 2024-04-27 23:27:17

I'm trying to get a value out of the following XML with Python, but I'm not sure how to go about it:

<feed xmlns="http://www.w3.org/2005/Atom" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:apple-wallpapers="http://www.apple.com/ilife/wallpapers" xmlns:g-custom="http://base.google.com/cns/1.0" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:cc="http://web.resource.org/cc/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:g-core="http://base.google.com/ns/1.0">
  <title>Feed from some link here</title>
  <link rel="self" href="https://somelinkhere/folder/?parameter=abc" />
  <link rel="first" href="https://somelinkhere/folder/?parameter=abc" />
  <id>https://somelinkhere/folder/?parameter=abc</id>
  <updated>2018-03-06T17:48:09Z</updated>
  <dc:creator>company.com</dc:creator>
  <dc:date>2018-03-06T17:48:09Z</dc:date>
  <opensearch:totalResults>4</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>

I'm looking for the second-to-last line. How can I get the number out of <opensearch:totalResults>4</opensearch:totalResults>?

Thanks for your help.

Edit:

This is the code I'm currently trying:

r= requests.get("https://somelinkhere/folder/?parameter=abc", auth=HTTPBasicAuth('username', 'password'))
print r.text
print r.status_code

root = lxml.etree.fromstring(r)
textelem = root.find("opensearch:totalResults")
print textelem.text

It then fails with the following error:

Traceback (most recent call last):
  File "tickets.py", line 14, in <module>
    root = lxml.etree.fromstring(r)
  File "src/lxml/lxml.etree.pyx", line 3213, in lxml.etree.fromstring (src/lxml/lxml.etree.c:82934)
  File "src/lxml/parser.pxi", line 1818, in lxml.etree._parseMemoryDocument (src/lxml/lxml.etree.c:124513)
ValueError: can only parse strings

2 answers
    from bs4 import BeautifulSoup
    import re

    xml_data = u"""<feed xmlns="http://www.w3.org/2005/Atom" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:apple-wallpapers="http://www.apple.com/ilife/wallpapers" xmlns:g-custom="http://base.google.com/cns/1.0" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/" xmlns:media="http://search.yahoo.com/mrss/" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:georss="http://www.georss.org/georss/" xmlns:creativeCommons="http://backend.userland.com/creativeCommonsRssModule" xmlns:cc="http://web.resource.org/cc/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:g-core="http://base.google.com/ns/1.0">
      <title>Feed from some link here</title>
      <link rel="self" href="https://somelinkhere/folder/?parameter=abc" />
      <link rel="first" href="https://somelinkhere/folder/?parameter=abc" />
      <id>https://somelinkhere/folder/?parameter=abc</id>
      <updated>2018-03-06T17:48:09Z</updated>
      <dc:creator>company.com</dc:creator>
      <dc:date>2018-03-06T17:48:09Z</dc:date>
      <opensearch:totalResults>4</opensearch:totalResults>
      <opensearch:startIndex>1</opensearch:startIndex>"""

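    # The 'lxml' parser treats the input as HTML and lowercases tag names,
    # which is why the pattern below is all lowercase.
    soup = BeautifulSoup(xml_data, 'lxml')
    results = soup.find_all(re.compile("opensearch:totalresults"))  # a list of matching tags
    values = [int(s.string) for s in results]
    print(values)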

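You have to use r.text in your code: lxml.etree.fromstring(r) receives the Response object rather than a string, hence the "can only parse strings" error.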
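Once the parser receives a string, the find call in the question would still likely return None, because a prefixed name such as opensearch:totalResults has to be resolved through a namespace map. Below is a minimal sketch of that idea, not the asker's or answerers' exact method. It assumes the standard-library xml.etree.ElementTree instead of lxml (lxml.etree's find also accepts a namespaces mapping), and it uses a shortened copy of the feed with a closing </feed> added so that it parses. The names xml_text, NS and total_results are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened stand-in for the feed in the question; the closing </feed>
# tag is an assumption added here so the snippet parses on its own.
xml_text = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <title>Feed from some link here</title>
  <opensearch:totalResults>4</opensearch:totalResults>
  <opensearch:startIndex>1</opensearch:startIndex>
</feed>"""

# Map the prefix used in the search path to the namespace URI from the feed.
NS = {"opensearch": "http://a9.com/-/spec/opensearch/1.1/"}


def total_results(xml_string):
    """Return the integer inside <opensearch:totalResults>."""
    root = ET.fromstring(xml_string)
    elem = root.find("opensearch:totalResults", NS)
    if elem is None:
        raise ValueError("opensearch:totalResults not found")
    return int(elem.text)


print(total_results(xml_text))
```

With requests, the argument would be the response body (r.text, or r.content for bytes) rather than r itself. If the feed starts with an XML declaration that names an encoding, lxml refuses Unicode strings, so r.content is probably the safer choice there.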
