PythonHTMLPars

User

Reply #1 · edited 2024-05-13 20:50:30

I extended the example from the docs:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):

    def handle_starttag(self, tag, attrs):
        print "Encountered the beginning of a %s tag" % tag

    def handle_endtag(self, tag):
        print "Encountered the end of a %s tag" % tag

    def handle_data(self, data):
        print "Encountered data %s" % data

p = MyHTMLParser()
p.feed('<p>test</p>')

Output:

Encountered the beginning of a p tag
Encountered data test
Encountered the end of a p tag
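
For anyone on Python 3: the module is named html.parser there and print is a function. Below is a minimal sketch of the same example under those assumptions. The class name `EventLogger`, the `events` list and the `_log` helper are my own additions for illustration (not part of the library), so the messages can be inspected afterwards as well as printed.

```python
from html.parser import HTMLParser  # Python 3 name of the Python 2 HTMLParser module


class EventLogger(HTMLParser):
    """Same three callbacks as above; each message is also kept in self.events."""

    def __init__(self):
        super().__init__()
        self.events = []  # hypothetical attribute, only here so results can be checked

    def _log(self, message):
        # Record the message and print it, mirroring the original print statements
        self.events.append(message)
        print(message)

    def handle_starttag(self, tag, attrs):
        self._log("Encountered the beginning of a %s tag" % tag)

    def handle_endtag(self, tag):
        self._log("Encountered the end of a %s tag" % tag)

    def handle_data(self, data):
        self._log("Encountered data %s" % data)


logger = EventLogger()
logger.feed('<p>test</p>')
logger.close()
```

This should print the same three lines as the output shown above.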

User

Reply #2 · edited 2024-05-13 20:50:30

Building on what @tauran posted, you might want to do something like this:

from HTMLParser import HTMLParser

class MyHTMLParser(HTMLParser):
    def print_p_contents(self, html):
        self.tag_stack = []
        self.feed(html)

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag.lower())

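    def handle_endtag(self, tag):
        # Guard against a stray end tag arriving when nothing is open
        if self.tag_stack:
            self.tag_stack.pop()

    def handle_data(self, data):
        # Print only text whose innermost open tag is <p>; skip text outside any tag
        if self.tag_stack and self.tag_stack[-1] == 'p':
            print data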

p = MyHTMLParser()
p.print_p_contents('<p>test</p>')

Now you might want to push all the <p> contents into a list and return that list as the result, or something along those lines.
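
A rough sketch of that list-returning variant, written for Python 3 (html.parser). The names `ParagraphCollector`, `paragraphs` and `get_p_contents` are made up for this example. Like the code above, it only picks up text whose innermost open tag is <p>, so text inside a nested tag such as <b> within a paragraph is not collected.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collects the text found directly inside each <p> element."""

    def __init__(self):
        super().__init__()
        self.tag_stack = []    # currently open tags, innermost last
        self.paragraphs = []   # one string per <p> encountered

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag)
        if tag == 'p':
            self.paragraphs.append('')  # start a new, empty paragraph entry

    def handle_endtag(self, tag):
        # Pop back to the matching start tag; ignore end tags that were never opened.
        # This also discards unclosed tags (for example <br>) left on the stack.
        if tag in self.tag_stack:
            while self.tag_stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Keep text only when the innermost open tag is <p>
        if self.tag_stack and self.tag_stack[-1] == 'p':
            self.paragraphs[-1] += data


def get_p_contents(html):
    """Parse html and return the list of <p> contents."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(get_p_contents('<p>test</p><div>skip</div><p>more</p>'))  # ['test', 'more']
```

Creating a fresh parser per call keeps the stack from leaking between documents.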

TIL: when working with a library like this, you need to think in terms of a stack!

User

Reply #3 · edited 2024-05-13 20:50:30

That didn't seem to work with my code, so I defined tag_stack = [] outside the class, as a kind of global variable.

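from html.parser import HTMLParser

# Module-level stack shared by every parser instance
tag_stack = []

class MONanalyseur(HTMLParser):

    def handle_starttag(self, tag, attrs):
        tag_stack.append(tag.lower())

    def handle_endtag(self, tag):
        # Guard against a stray end tag arriving when nothing is open
        if tag_stack:
            tag_stack.pop()

    def handle_data(self, data):
        # Print only text whose innermost open tag is <head>
        if tag_stack and tag_stack[-1] == 'head':
            print(data)

parser = MONanalyseur()
parser.feed()  # feed() requires the HTML string to parse as its argument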
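
One possible reason the version in reply #2 fails: there, self.tag_stack is only created inside print_p_contents(), so calling feed() directly hits an AttributeError in handle_starttag. A sketch of an alternative to the global, assuming Python 3, is to create the stack in __init__ so it exists before any feed() call. The class name `HeadTextParser` and the `head_text` list are hypothetical names for this example, and the HTML string is made-up sample input.

```python
from html.parser import HTMLParser


class HeadTextParser(HTMLParser):
    """Keeps the tag stack on the instance instead of in a module-level global."""

    def __init__(self):
        super().__init__()
        self.tag_stack = []   # exists as soon as the parser is constructed
        self.head_text = []   # text found directly inside <head>

    def handle_starttag(self, tag, attrs):
        self.tag_stack.append(tag)

    def handle_endtag(self, tag):
        # Guard against a stray end tag arriving when nothing is open
        if self.tag_stack:
            self.tag_stack.pop()

    def handle_data(self, data):
        if self.tag_stack and self.tag_stack[-1] == 'head':
            self.head_text.append(data)


parser = HeadTextParser()
parser.feed('<html><head>hello</head><body>world</body></html>')
parser.close()
print(parser.head_text)  # ['hello']
```

With the stack on the instance, two parsers can run side by side without sharing state.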
