RSS/Python - Parsing a single image URL

1 vote

4 answers

2410 views

Asked 2025-04-17 11:54

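I'm learning how to parse XML and RSS feeds properly, and I've hit a small snag. I'm using Python's feedparser to parse a specific entry in an RSS feed, but I can't work out how to extract just a single image URL (the img src) from the content section.

Here is my code so far.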

import dirFeedparser.feedparser as feedparser

feedurl = feedparser.parse('http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2')
statusupdate = feedurl.entries[0].content

print statusupdate

When I print the content, this is what I get:

[{'base': u'http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2', 'type': u'text/html', 'value': u'<p><a href="http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-154945.jpg"><img alt="20120129-154945.jpg" class="alignnone size-full" src="http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-154945.jpg" /></a></p>', 'language': None}]

Is there a good way to get the image URL out of this? Any help is much appreciated!

4 Answers

If you want a capable HTML parsing tool, try BeautifulSoup.

Parsing with it is straightforward:

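from BeautifulSoup import BeautifulSoup

# entries[0].content is a list of dicts, so index the first item before reading 'value'
soup = BeautifulSoup(statusupdate[0]['value'])
# tag attributes are read with dictionary-style access, not as Python attributes
url = soup.find('img')['src']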

Answered 2025-04-17 by Python大师

Next, you need a separate HTML parser to parse the HTML content and read the tag's attributes. Beautiful Soup is a good option for this.

For example:

from BeautifulSoup import BeautifulSoup
import feedparser

feedurl = feedparser.parse('http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2')
statusupdate = feedurl.entries[0].content[0]

soup = BeautifulSoup(statusupdate["value"])
print(soup.find("img")["src"])

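Note that this only finds the first img tag. If you need to be more selective, take a look at the findAll method (named find_all in Beautiful Soup 4).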
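
To illustrate collecting every image rather than only the first, here is a minimal sketch. It deliberately swaps Beautiful Soup for the standard library's html.parser so that it runs on Python 3 without extra installs. The names ImgSrcCollector and all_img_srcs and the example.com sample markup are hypothetical, made up for this illustration rather than taken from the feed above.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lowercased; attrs is a list of
        # (name, value) pairs. Self-closing tags such as <img ... /> are
        # routed here too by the default handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def all_img_srcs(html):
    """Return the src of every <img> in the given HTML string, in order."""
    collector = ImgSrcCollector()
    collector.feed(html)
    collector.close()
    return collector.sources


# Hypothetical sample markup standing in for content[0]['value']
sample = (
    '<p><a href="http://example.com/a.jpg">'
    '<img alt="a" src="http://example.com/a.jpg" /></a>'
    '<img src="http://example.com/b.jpg"></p>'
)
print(all_img_srcs(sample))
```

With a real feed you would pass feedurl.entries[0].content[0]["value"] in place of sample, and take the first list element if you only want one image.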

Answered 2025-04-17 by Python大师

@Lattyware, there were a few problems with how you set up soup.

@user1130601, take a look at the code below:

#!/usr/bin/python

from BeautifulSoup import BeautifulSoup
import feedparser

feedurl = feedparser.parse('http://dustinheroin.chompblog.com/index.php?cat=22&feed=rss2')
statusupdate = feedurl.entries[0].content


soup = BeautifulSoup(statusupdate[0]['value'])
print(soup.find("img")["src"])

Output:

http://dustinheroin.chompblog.com/wp-content/uploads/2012/01/20120129-171134.jpg

Answered 2025-04-17 by Python大师
