Python - Get all images from an HTML file

Question

Can someone help me parse an HTML file in Python and extract the links of all the images in it?

Preferably without third-party modules...

Thanks!

Answer 1

lxml is generally considered faster than Beautiful Soup (reference). You can find a tutorial for it here: (link). You may also want to look at this older StackOverflow thread.
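For reference, a minimal lxml sketch of the same task might look like the following. It assumes the third-party `lxml` package is installed, and the sample markup is hypothetical. It parses a string and uses the XPath `.//img/@src` to pull out every `src` attribute of an `<img>` element.

```python
from lxml import html

# Hypothetical sample markup; in practice this would be the contents of your file.
page = '<div><img src="a.png"><img alt="no src"><img src="b.jpg"></div>'

tree = html.fromstring(page)
# ".//img/@src" selects the src attribute of every <img> below the root element;
# <img> tags without a src attribute are simply not matched.
links = [str(src) for src in tree.xpath(".//img/@src")]
print(links)  # ['a.png', 'b.jpg']
```

To read from a file instead, `html.parse("index.html")` returns a tree that also supports `.xpath(...)`.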

Answer 2

Using only the PSL (Python Standard Library):

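from html.parser import HTMLParser


class MyParse(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "img":
            src = dict(attrs).get("src")
            if src:  # skip <img> tags that have no src attribute
                print(src)


h = MyParse()
with open("index.html", encoding="utf-8") as f:
    page = f.read()
h.feed(page)
h.close()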
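A variation on the standard-library approach, sketched under a few assumptions: it collects the links into a list instead of printing them, and it resolves relative paths against an optional base URL with `urllib.parse.urljoin`. The class name `ImgSrcCollector`, the helper `get_image_links`, and the `example.com` base URL are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag into a list."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <IMG SRC=...> matches too.
        if tag != "img":
            return
        src = dict(attrs).get("src")
        if src:  # skip <img> tags with a missing or empty src
            # Resolve relative paths such as "a.png" against base_url.
            self.sources.append(urljoin(self.base_url, src))


def get_image_links(html_text, base_url=""):
    parser = ImgSrcCollector(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.sources


# Hypothetical sample markup, including a self-closing tag and an <img> without src.
sample = '<p><img src="a.png"><IMG SRC="/b.jpg" alt="b"/><img alt="no src"></p>'
print(get_image_links(sample, "http://example.com/page/"))
# ['http://example.com/page/a.png', 'http://example.com/b.jpg']
```

Self-closing tags like `<img ... />` still reach `handle_starttag`, because `HTMLParser`'s default `handle_startendtag` calls it. With no base URL, relative links are returned unchanged.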

Answer 3

You can use Beautiful Soup. I know you said you would rather not use third-party modules, but it is very well suited to parsing HTML.

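from urllib.request import urlopen
from bs4 import BeautifulSoup  # pip install beautifulsoup4

page = BeautifulSoup(urlopen("http://www.url.com"), "html.parser")
images = page.find_all("img")  # list of every <img> tag
print([img.get("src") for img in images])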
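If you only want the image links rather than the tag objects, a short sketch with Beautiful Soup 4 could filter for tags that actually carry a `src` attribute. This assumes the third-party `beautifulsoup4` package is installed; the sample markup is hypothetical and stands in for the contents of your HTML file.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; replace with the contents of your file.
page = '<p><img src="a.png"><img alt="no src"><img src="b.jpg"></p>'

soup = BeautifulSoup(page, "html.parser")  # uses the stdlib parser backend
# src=True keeps only <img> tags that have a src attribute.
links = [img["src"] for img in soup.find_all("img", src=True)]
print(links)  # ['a.png', 'b.jpg']
```

Choosing `"html.parser"` avoids needing lxml as an extra dependency; passing `"lxml"` instead is an option if it is installed.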
