将html字符串拆分为lis

2024-04-19 21:11:52 发布

您现在位置:Python中文网/ 问答频道 /正文

这是我的字符串:

'<.tag> xxxxx<./tag> <.tag>'

我想把它附加到一个列表中:

x=['<.tag>','xxxx','<./tag>','<.tag>']

Tags: 字符串列表tagxxxxxxxxx
2条回答

是的,使用解析器。你知道吗

但是,这会做到:

def f(x):
    lst = []
    rec = ""
    for i in x:
        if i == "<":
            if rec != "":
                lst.append(rec)
            rec = ""
        rec += i
        if i == ">":
            lst.append(rec)
            rec = ""
    return lst

它基本上记录了“<;”和“>;”之间的所有内容,并将其添加到列表中。它还记录所有“>;”和“<;”之间的间隙,以便记录诸如“xxxx”之类的内容

为此,请使用^{}

Return all non-overlapping matches of pattern in string, as a list of strings. The string is scanned left-to-right, and matches are returned in the order found. If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group. Empty matches are included in the result

In [1]: a='<.tag> xxxxx<./tag> <.tag>'

In [2]: import re
In [4]: re.findall(r'<[^>]+>|\w+',a)
Out[4]: ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']

In [5]: re.findall(r'<[^>]+>|[^<]+',a)
Out[5]: ['<.tag>', ' xxxxx', '<./tag>', ' ', '<.tag>']

In [17]: [i.strip() for i in re.findall(r'<[^>]+>|[^<]+',a) if not i.isspace()]
Out[17]: ['<.tag>', 'xxxxx', '<./tag>', '<.tag>']

相关问题 更多 >