Parsing RSS with Python's feedparser

Question

I would like to know how to use Python's feedparser to parse the XML data below.

<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>

Answer 1

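This doesn't look like RSS or Atom content. I wouldn't use feedparser for this at all; I would use lxml instead. In fact, feedparser can't make sense of this document, and it drops the "Jason" contributor from your example.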

from lxml import etree

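# data is a filename or file-like object; how you obtain it is up to you
data = <fetch the data somehow>
tree = etree.parse(data)  # parse() returns an ElementTree, not the root element
root = tree.getroot()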

Now you have a tree of XML objects. Exactly how to work with it in lxml will have to wait until you have provided valid XML data. ;)
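As a rough illustration of what "a tree of XML objects" means here, the sketch below parses the XML from the question and lists the children of the root. It assumes the data is already in memory as bytes (the `raw` variable and the `io.BytesIO` wrapper are stand-ins for however the data is actually fetched), and it falls back to the standard library's ElementTree when lxml is not installed, since the calls used exist in both.

```python
import io

try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Stand-in for the fetched data (assumption: the question's XML, as bytes).
raw = b"""<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>"""

# parse() accepts a filename or a file-like object and returns an ElementTree.
tree = etree.parse(io.BytesIO(raw))
root = tree.getroot()  # the <Book_API> element

# Iterating over an element yields its direct children.
child_tags = [child.tag for child in root]
print(root.tag, child_tags)
```

Both contributors show up as children of the root, so nothing is dropped the way it reportedly is with feedparser.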

Answer 2

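As Lennart Regebro mentioned, this does not appear to be an RSS or Atom feed, just an XML document. The Python standard library includes several XML parsing facilities (including SAX and DOM). I recommend ElementTree. Among third-party libraries, lxml is the best choice (it is a drop-in replacement for ElementTree).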

try:
    from lxml import etree
except ImportError:
    try:
        # cElementTree was removed in Python 3.9; kept for older interpreters
        import xml.etree.cElementTree as etree
    except ImportError:
        import xml.etree.ElementTree as etree

doc = """<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>"""
xml_doc = etree.fromstring(doc)
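Once the document is parsed, pulling out the contributor names could look something like the sketch below. It uses only the standard library's ElementTree so that it runs without extra dependencies; the `names` variable is introduced here purely for illustration.

```python
import xml.etree.ElementTree as etree

doc = """<Book_API>
<Contributor_List>
<Display_Name>Jason</Display_Name>
</Contributor_List>
<Contributor_List>
<Display_Name>John Smith</Display_Name>
</Contributor_List>
</Book_API>"""

xml_doc = etree.fromstring(doc)  # returns the root element directly

# For each <Contributor_List>, read the text of its <Display_Name> child.
names = [
    contributor.findtext("Display_Name")
    for contributor in xml_doc.findall("Contributor_List")
]
print(names)
```

The same `findall` and `findtext` calls should also work if `etree` comes from lxml, which is what makes the import fallback above practical.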
