Read all values to the left and right of <= from a large string in Python

Published 2024-06-17 15:16:06


I have a large string and want to read all the values to the left and right of <=, for example {\nnode [shape=box] ;\n0 [label="X[2] <= 17055.5\\ngini = 0.0454\\nsamples = 43\\nvalue = [42, 1]"] ;\n1 [label="gini = 0.0\\nsamples = 1\\nvalue = [0, 1]"] ;\n0 -> 1 [labeldistance=2.5, labelangle=45, headlabel="True"] ;\n2 [label="gini = 0.0\\nsamples = 42\\nvalue = [42, 0]"] ;\n0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] [label]="X[0] <= 5.41" ;\n} Here I should get two sets of output, because <= appears twice.

python regex python-3.x


2 answers

If I understand correctly, you want the text around <= that is delimited by " or \. If so:

re.findall(r'["\\]([^"\\]+<=[^"\\]+)(?=["\\])', str_)
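  • ["\\] matches either " or \

  • The captured group ([^"\\]+<=[^"\\]+) matches one or more characters that are neither " nor \, followed by <=, followed by one or more characters that are again neither " nor \

  • The zero-width positive lookahead (?=["\\]) ensures that the captured group is immediately followed by " or \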
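The three pieces listed above can also be written out one per line with re.VERBOSE, which may make the pattern easier to read. This is only a sketch: the names PATTERN and sample are made up for illustration, and sample is a shortened fragment of the string from the question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE.
# The comments spell out "backslash" in words so that no escape sequence
# ends up inside a verbose-mode comment.
PATTERN = re.compile(r'''
    ["\\]          # opening delimiter: a double quote or a backslash
    (              # start of the captured group
      [^"\\]+      # one or more characters that are not a delimiter
      <=           # the literal comparison operator
      [^"\\]+      # one or more characters that are not a delimiter
    )              # end of the captured group
    (?=["\\])      # lookahead: the next character must be a delimiter
''', re.VERBOSE)

# Shortened fragment of the string from the question (illustrative only).
sample = '[label="X[2] <= 17055.5\\ngini = 0.0454"]'

print(PATTERN.findall(sample))  # ['X[2] <= 17055.5']
```

Because the lookahead does not consume the closing delimiter, that character remains available as the opening delimiter of a later match.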

Example:

In [171]: str_ = '{\nnode [shape=box] ;\n0 [label="X[2] <= 17055.5\\ngini = 0.0454\\nsamples = 43\\nvalue = [42, 1]"] ;\n1 [label="gini = 0.0\\nsamples = 1\\nvalue = [0, 1]"] ;\n0 -> 1 [labeldistance=2.5,
     ...:  labelangle=45, headlabel="True"] ;\n2 [label="gini = 0.0\\nsamples = 42\\nvalue = [42, 0]"] ;\n0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] [label]="X[0] <= 5.41" ;\n}'

In [172]: re.findall(r'["\\]([^"\\]+<=[^"\\]+)(?=["\\])', str_)
Out[172]: ['X[2] <= 17055.5', 'X[0] <= 5.41']
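The question asks for the values on each side of <=, so the matches above could be split one step further. The sketch below is one possible way to do that, not part of the original answer: it keeps the same delimiters but uses one group per side. The names dot_text, PATTERN and split_conditions are hypothetical, and dot_text is a shortened version of the string from the question.

```python
import re

# Shortened version of the string from the question (illustrative only).
dot_text = (
    '{\nnode [shape=box] ;\n'
    '0 [label="X[2] <= 17055.5\\ngini = 0.0454\\nsamples = 43\\nvalue = [42, 1]"] ;\n'
    '1 [label="gini = 0.0\\nsamples = 1\\nvalue = [0, 1]"] ;\n'
    '0 -> 2 [labeldistance=2.5, labelangle=-45, headlabel="False"] '
    '[label]="X[0] <= 5.41" ;\n}'
)

# Same delimiters as the pattern above, but with one group per side of "<=".
# The left group is lazy so that the whitespace before "<=" is not captured.
PATTERN = re.compile(r'["\\]([^"\\]+?)\s*<=\s*([^"\\]+)(?=["\\])')


def split_conditions(text):
    # Return a list of (left, right) string pairs, one per "<=" found.
    return [(left.strip(), right.strip()) for left, right in PATTERN.findall(text)]


pairs = split_conditions(dot_text)
print(pairs)  # [('X[2]', '17055.5'), ('X[0]', '5.41')]
```

The right-hand values come back as strings; calling float() on them would be a natural next step if thresholds are needed as numbers.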

You can try this:

[^\"<]+<=\s*[0-9.]*

It may be faster, and it does the job.
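As a rough check, this pattern can be run with re.findall in the same way. The snippet is a sketch on a shortened, made-up fragment of the question's string (sample and pattern are illustrative names). Note two differences from the first answer: the right-hand side is limited to digits and dots, and the left-hand side is whatever precedes <= back to the previous " or <, so the result depends on the operand being preceded by a quote.

```python
import re

# Shortened fragment of the string from the question (illustrative only).
sample = 'label="X[2] <= 17055.5\\ngini = 0.0454" ; [label]="X[0] <= 5.41" ;'

# [^"<]+   : one or more characters that are neither a double quote nor "<"
# <=       : the literal operator
# \s*      : optional whitespace
# [0-9.]*  : digits and dots (may be empty if no number follows)
pattern = re.compile(r'[^"<]+<=\s*[0-9.]*')

matches = pattern.findall(sample)
print(matches)  # ['X[2] <= 17055.5', 'X[0] <= 5.41']
```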

