将html字符串拆分为lis

2条回答

网友

1楼 · 编辑于 2024-04-19 21:11:52

是的，使用解析器。你知道吗

但是，这会做到：

def f(x):
    lst = []
    rec = ""
    for i in x:
        if i == "<":
            if rec != "":
                lst.append(rec)
            rec = ""
        rec += i
        if i == ">":
            lst.append(rec)
            rec = ""
    return lst

它基本上记录了“<；”和“>；”之间的所有内容，并将其添加到列表中。它还记录所有“>；”和“<；”之间的间隙，以便记录诸如“xxxx”之类的内容

网友

2楼 · 编辑于 2024-04-19 21:11:52

为此，请使用^{}

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result

In [1]: a='<.tag> xxxxx<./tag> <.tag>'

In [2]: import re
In [4]: re.findall(r'<[^>]+>|\w+',a)
Out[4]: ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']

In [5]: re.findall(r'<[^>]+>|[^<]+',a)
Out[5]: ['<.tag>', ' xxxxx', '<./tag>', ' ', '<.tag>']

In [17]: [i.strip() for i in re.findall(r'<[^>]+>|[^<]+',a) if not i.isspace()]
Out[17]: ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']

相关问题更多 >

编程相关推荐

热门问题

热门文章

将html字符串拆分为lis

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >