BeautifulSoup: printing multiple tags/attributes

Asked 2025-04-16 17:44

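First off, this is my first attempt at using Python. It has felt fairly simple so far, but I have still run into a problem.

I want to convert an XML file into RSS-format XML. The original XML file looks like this: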

<news title="Random Title" date="Date and Time" subtitle="The article txt"></news>

In the end it should become this:

<item>
<pubDate>Date and Time</pubDate>
<title>Random Title</title>
<content:encoded>The article txt</content:encoded>
</item>
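
The mapping from the three attributes to the three child elements can be sketched as a small function. This is only a minimal illustration in Python 3 using the standard library; `render_item` is a hypothetical helper name, not something from the script in this post, and the escaping step is an added assumption so that characters such as `&` do not break the output XML.

```python
from xml.sax.saxutils import escape


def render_item(title, date, content):
    """Return one RSS <item> as a string (hypothetical helper for illustration)."""
    return (
        "<item>\n"
        "<pubDate>{}</pubDate>\n"
        "<title>{}</title>\n"
        "<content:encoded>{}</content:encoded>\n"
        "</item>"
    ).format(escape(date), escape(title), escape(content))


# Values taken from the sample <news> element above.
print(render_item("Random Title", "Date and Time", "The article txt"))
```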

I am trying to do this conversion with Python and BeautifulSoup, using the following script:

from BeautifulSoup import BeautifulSoup
import re

doc = [
    '<news post_title="Random Title" post_date="Date and Time" post_content="The article txt"></news>'
]
soup = BeautifulSoup(''.join(doc))

print soup.prettify()

posttitle = soup.news['post_title']
postdate = soup.news['post_date']
postcontent = soup.news['post_content']

print "<item>"
print "<pubDate>"
print postdate
print "</pubDate>"
print "<title>"
print posttitle
print "</title>"
print "<content:encoded>"
print postcontent
print "</content:encoded>"
print "</item>"

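The problem is that it only picks up the first XML string at the top and none of the others. Can anyone give me some suggestions to help me solve this?

Thanks, everyone :)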


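2 Answers

Your sample doc variable contains only one <news> element. Also, soup.news returns only the first matching tag, so even with more elements in the document you would still get just one.

In general, you need to iterate over all of the news elements.

You can do it like this: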

for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
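
For anyone on Python 3, the same loop might look roughly like the sketch below. It assumes the `bs4` package (Beautiful Soup 4) is installed, where `findAll` is spelled `find_all`, and it uses the built-in `html.parser` backend. The second `<news>` element is made up purely to show that more than one item comes out.

```python
from bs4 import BeautifulSoup  # assumes Beautiful Soup 4 (bs4) is installed

# Sample input with two <news> elements; the second one is invented for illustration.
doc = (
    '<news post_title="Random Title" post_date="Date and Time" '
    'post_content="The article txt"></news>'
    '<news post_title="Second Title" post_date="Another Date" '
    'post_content="More txt"></news>'
)

soup = BeautifulSoup(doc, "html.parser")

items = []
for news in soup.find_all("news"):  # find_all is the bs4 name for findAll
    items.append(
        "<item>\n"
        "<pubDate>%s</pubDate>\n"
        "<title>%s</title>\n"
        "<content:encoded>%s</content:encoded>\n"
        "</item>" % (news["post_date"], news["post_title"], news["post_content"])
    )

print("\n".join(items))
```

Note that this sketch writes the attribute values out unescaped, so real data containing `&` or `<` would need escaping first.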

Answered 2025-04-16 by Python大师

Stealing the code and correcting it:

for news in soup.findAll('news'):
    posttitle = news['post_title']
    postdate = news['post_date']
    postcontent = news['post_content']
    print "<item>"
    print "<pubDate>"
    print postdate
    print "</pubDate>"
    print "<title>"
    print posttitle
    print "</title>"
    print "<content:encoded>"
    print postcontent
    print "</content:encoded>"
    print "</item>"
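
As an alternative to printing tags by hand, the feed could also be built as a tree, which takes care of escaping and of declaring the `content:` prefix. The sketch below swaps BeautifulSoup for Python 3's standard `xml.etree.ElementTree`, so it is a different technique from the answers above. It assumes the input is well-formed XML with a single root element; the `<newslist>` wrapper and the second entry are hypothetical. The namespace URI is the one used by the RSS content module.

```python
import xml.etree.ElementTree as ET

CONTENT_NS = "http://purl.org/rss/1.0/modules/content/"
ET.register_namespace("content", CONTENT_NS)  # serialize the namespace as "content:"

# Assumed input: all <news> elements sit under one root element.
# The <newslist> wrapper and the second entry are hypothetical.
source = """<newslist>
<news post_title="Random Title" post_date="Date and Time" post_content="The article txt"></news>
<news post_title="Second Title" post_date="Another Date" post_content="More txt"></news>
</newslist>"""

rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")

# One <item> per <news> element, copying attributes into child elements.
for news in ET.fromstring(source).iter("news"):
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "pubDate").text = news.get("post_date")
    ET.SubElement(item, "title").text = news.get("post_title")
    ET.SubElement(item, "{%s}encoded" % CONTENT_NS).text = news.get("post_content")

feed = ET.tostring(rss, encoding="unicode")
print(feed)
```

A complete RSS channel would also need its own title, link and description elements, which are left out here to keep the sketch short.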

Answered 2025-04-16 by Python大师
