Parsing XML raises an AttributeError

Posted 2024-04-26 08:14:14


I am trying to parse XML. I need the title, description, and publication date of each item. I get this error:

  for item in doc.findAll('rss/channel/item'):
AttributeError: 'str' object has no attribute 'findAll'

Here is my code:

from bs4 import BeautifulSoup
import csv, sys
import urllib2
from xml.dom.minidom import parse, parseString

toursxml = 'http://www.tradingeconomics.com/rss/news.aspx'
toursurl= urllib2.urlopen(toursxml)
doc= toursurl.read()
#parseString( doc )
#print doc
data = []
cols = set()
for item in doc.findAll('rss/channel/item'):
    d = {}
    for sub in item:
        if hasattr(sub, 'name'):
            d[sub.name] = sub.text
    data.append(d)
    cols = cols.union(d.keys())

cw = csv.writer(sys.stdout)
cw.writerow(cols)
for row in data:
    cw.writerow([row.get(k, 'N/A') for k in cols])

1 answer
User
#1 · Posted 2024-04-26 08:14:14

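You are trying to parse an RSS feed with the wrong tools. Your code calls a BeautifulSoup method without ever creating a BeautifulSoup object (doc is just the string returned by read(), which is why you get the AttributeError), passes an XPath expression to an API that does not support XPath, and relies on a library meant for HTML rather than XML.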

Use feedparser to handle feeds like this:

import feedparser

feed = feedparser.parse('http://www.tradingeconomics.com/rss/news.aspx')

for item in feed.entries:
    print item.title, item.author

This produces:

>>> import feedparser
>>> feed = feedparser.parse('http://www.tradingeconomics.com/rss/news.aspx')
>>> for item in feed.entries:
...     print item.title, item.author
... 
Latvia Retail Sales MoM Central Statistical Bureau of Latvia
China Foreign Exchange Reserves People's Bank of China
Latvia Retail Sales YoY Central Statistical Bureau of Latvia
Spain Business Confidence Ministry of Industry, Tourism and Trade, Spain
Italy Consumer Price Index (CPI) National Institute of Statistics (ISTAT)
Italy Inflation Rate National Institute of Statistics (ISTAT)
Cyprus Inflation Rate Statistical Service of the Republic of Cyprus
# .... and many more lines
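
If installing feedparser is not an option, the loop from the question could be approximated with the standard library instead. The following is a minimal Python 3 sketch, not the approach recommended above: it swaps in xml.etree.ElementTree, whose findall() accepts a simple path such as channel/item relative to the root element. SAMPLE_RSS, rss_items, and rows_to_csv are hypothetical names, and the sample feed (including its descriptions and date) is made up so the example runs without network access.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample feed standing in for the downloaded RSS document.
# In real use this text would come from the HTTP response body.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Latvia Retail Sales MoM</title>
      <description>Sample description one</description>
      <pubDate>Mon, 06 Jan 2014 10:00:00 GMT</pubDate>
    </item>
    <item>
      <title>China Foreign Exchange Reserves</title>
      <description>Sample description two</description>
    </item>
  </channel>
</rss>
"""

# The three fields the question asks for.
FIELDS = ["title", "description", "pubDate"]


def rss_items(xml_text):
    """Return one dict per <item>, using 'N/A' for missing fields."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    rows = []
    # The path is relative to the root, so it starts at <channel>.
    for item in root.findall("channel/item"):
        rows.append(
            {name: item.findtext(name, default="N/A") for name in FIELDS}
        )
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = rss_items(SAMPLE_RSS)
csv_text = rows_to_csv(rows)
print(csv_text)
```

Using a fixed list of columns keeps the CSV header order stable, unlike the set of keys in the question, and 'N/A' fills in for any item that lacks one of the fields. Real-world feeds are often malformed or use namespaced elements, which is where a dedicated library such as feedparser tends to be more forgiving.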
