Regex in Python 3: match everything after a number or optional period but before an optional comma

Published 2024-05-23 22:26:20


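I am trying to extract the ingredients from a recipe, without any measurements or instructions. The ingredient list looks like this: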

['1  medium tomato, cut into 8 wedges',
 '4  c. torn mixed salad greens',
 '1/2  small red onion, sliced and separated into rings',
 '1/4  small cucumber, sliced',
 '1/4  c. sliced pitted ripe olives',
 '2  Tbsp. reduced-calorie Italian salad dressing',
 '2  Tbsp. lemon juice',
 '1  Tbsp. water',
 '1/2  tsp. dried mint, crushed',
 '1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese']

I want to return the following list:

[code block missing in source]

The closest pattern I have found is:

pattern = r'[\s\d\.]* ([^\,]+).*'

But when I test it:

for ing in ingredients:
    print(re.findall(pattern, ing))

the measurement abbreviation and the period after it are returned as well, for example:

['c. torn mixed salad greens']

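Meanwhile,

pattern = r'(?<=\. )[^.]*$'

fails to capture entries that have no period, and it also captures the comma when both a period and a comma are present: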

[]
['torn mixed salad greens']
[]
[]
['sliced pitted ripe olives']
['reduced-calorie Italian salad dressing']
['lemon juice']
['water']
['dried mint, crushed']
['crumbled Blue cheese']
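
Both attempts can be reproduced side by side with a minimal sketch like the one below. The three-item `ingredients` subset and the names `first` and `second` are chosen here for illustration only.

```python
import re

# A small subset of the ingredient list, enough to show each failure mode.
ingredients = [
    '1  medium tomato, cut into 8 wedges',
    '4  c. torn mixed salad greens',
    '1/2  tsp. dried mint, crushed',
]

first = r'[\s\d\.]* ([^\,]+).*'   # keeps the unit abbreviation and its period
second = r'(?<=\. )[^.]*$'        # needs a period, and does not stop at a comma

for ing in ingredients:
    print(re.findall(first, ing), re.findall(second, ing))
```

The first pattern only skips digits, whitespace, and periods at the very start, so `c.` and `tsp.` end up inside the captured group. The second pattern depends on a preceding period, so lines without an abbreviation produce no match at all.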

Thanks in advance!


3 Answers

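Description

I suggest using the following regular expression to find the substrings you are not interested in and replace them. Because it spells out the units of measure, it also handles units that are not abbreviated.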

\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b


Replace with: nothing (an empty string)

Example

Live demo

Shows how this matches:

https://regex101.com/r/qV5iR8/3

Sample strings

Note that the last line contains two ingredients separated by "or"; as the OP requested, the first of the two is eliminated.

1  medium tomato, cut into 8 wedges
4  c. torn mixed salad greens
1/2  small red onion, sliced and separated into rings
1/4  small cucumber, sliced
1 1/4  c. sliced pitted ripe olives
2  Tbsp. reduced-calorie Italian salad dressing
2  Tbsp. lemon juice
1  Tbsp. water
1/2  tsp. dried mint, crushed
1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese

After replacement

[code block missing in source]
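
In Python, the replacement could be sketched with `re.sub` as follows. The names `UNWANTED` and `strip_measurements` are made up for this illustration, and the `re.IGNORECASE` flag is an assumption: the expression spells its units in lowercase, so without that flag `Tbsp.` would not be removed.

```python
import re

# The expression from this answer, compiled case-insensitively (an assumption)
# so that "Tbsp." matches the lowercase "tbsp\." alternative.
UNWANTED = re.compile(
    r'\s*(?:(?:(?:[0-9]\s*)?[0-9]+\/)?[0-9]+\s*'
    r'(?:(?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)\s*)?)|,.*|.*\bor\b',
    re.IGNORECASE,
)

def strip_measurements(line):
    """Remove every unwanted substring and trim leftover whitespace."""
    return UNWANTED.sub('', line).strip()

print(strip_measurements('1  medium tomato, cut into 8 wedges'))
# medium tomato
print(strip_measurements('1 1/4  c. sliced pitted ripe olives'))
# sliced pitted ripe olives
print(strip_measurements('1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese'))
# crumbled Blue cheese
```

Units that are not listed in the alternation (for example `oz.`) would be left in place, so the unit list needs extending for other recipes.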

Explanation

NODE                     EXPLANATION
                                   
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
                                   
  (?:                      group, but do not capture:
                                   
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
                                   
      (?:                      group, but do not capture (optional
                               (matching the most amount possible)):
                                   
        [0-9]                    any character of: '0' to '9'
                                   
        \s*                      whitespace (\n, \r, \t, \f, and " ")
                                 (0 or more times (matching the most
                                 amount possible))
                                   
      )?                       end of grouping
                                   
      [0-9]+                   any character of: '0' to '9' (1 or
                               more times (matching the most amount
                               possible))
                                   
      \/                       '/'
                                   
    )?                       end of grouping
                                   
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount
                             possible))
                                   
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
                                   
    (?:                      group, but do not capture (optional
                             (matching the most amount possible)):
                                   
      (?:                      group, but do not capture:
                                   
        c                        'c'
                                   
        \.                       '.'
                                   
       |                        OR
                                   
        cup                      'cup'
                                   
        s?                       's' (optional (matching the most
                                 amount possible))
                                   
       |                        OR
                                   
        tsp                      'tsp'
                                   
        \.                       '.'
                                   
       |                        OR
                                   
        teaspoon                 'teaspoon'
                                   
       |                        OR
                                   
        tbsp                     'tbsp'
                                   
        \.                       '.'
                                   
       |                        OR
                                   
        tablespoon               'tablespoon'
                                   
      )                        end of grouping
                                   
      \s*                      whitespace (\n, \r, \t, \f, and " ")
                               (0 or more times (matching the most
                               amount possible))
                                   
    )?                       end of grouping
                                   
  )                        end of grouping
                                   
 |                        OR
                                   
  ,                        ','
                                   
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
                                   
 |                        OR
                                   
  .*                       any character except \n (0 or more times
                           (matching the most amount possible))
                                   
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
                                   
  or                       'or'
                                   
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
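
The same breakdown can be carried inside the pattern itself using Python's `re.VERBOSE` mode, which ignores whitespace and allows `#` comments. This is only a sketch: the name `UNWANTED_VERBOSE` is hypothetical, and `re.IGNORECASE` is added on the assumption that capitalized units such as `Tbsp.` should match.

```python
import re

UNWANTED_VERBOSE = re.compile(r'''
    \s*                                  # leading whitespace
    (?:
        (?: (?:[0-9]\s*)? [0-9]+ / )?    # optional whole number and numerator
        [0-9]+                           # denominator, or a plain integer
        \s*
        (?:                              # optional unit of measure
            (?:c\.|cups?|tsp\.|teaspoon|tbsp\.|tablespoon)
            \s*
        )?
    )
  | ,.*                                  # a comma and everything after it
  | .*\bor\b                             # everything up to the last word "or"
''', re.VERBOSE | re.IGNORECASE)

print(UNWANTED_VERBOSE.sub('', '1/2  tsp. dried mint, crushed').strip())
# dried mint
```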
                                   

You can use this pattern:

import re
for ing in ingredients:
    # re.I replaces the trailing inline (?i), which Python 3.11+ rejects
    print(re.search(r'[a-z][^.,]*(?![^,])', ing, re.I).group())

Pattern details:

[code block missing in source]
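
As a rough reading of the pattern above, it can be written in `re.VERBOSE` form with one comment per piece. The names `INGREDIENT` and `ingredient_name` are hypothetical, and the comments are an interpretation rather than the answerer's own wording.

```python
import re

INGREDIENT = re.compile(r'''
    [a-z]        # a letter that starts a candidate run
    [^.,]*       # continue up to, but not including, a period or comma
    (?![^,])     # the run must be followed by a comma or the end of the string
''', re.VERBOSE | re.IGNORECASE)

def ingredient_name(line):
    """Return the first run of text that ends at a comma or at the end of the line."""
    match = INGREDIENT.search(line)
    return match.group() if match else None

print(ingredient_name('4  c. torn mixed salad greens'))
# torn mixed salad greens
print(ingredient_name('1/4  c. crumbled Feta cheese or 2 Tbsp. crumbled Blue cheese'))
# crumbled Blue cheese
```

A run such as `c` or `Tbsp` is rejected because the next character is a period, which fails the lookahead, so the search moves on to the text after the abbreviation.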

The problem is that you are matching the digits and the periods together.

\s\d*\.?

should correctly match the number (with or without a period).
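
One way to act on this idea is sketched below. It is not this answer's exact expression but a variation on the asker's original attempt, and the names `PATTERN` and `first_ingredient` are hypothetical. It consumes the quantity first, then an optional abbreviation that ends in a period, and captures everything up to the first comma.

```python
import re

PATTERN = re.compile(
    r'^[\d/\s]*'              # quantity: digits, slashes, and whitespace
    r'(?:[A-Za-z]+\.\s*)?'    # optional abbreviation ending in a period, e.g. "c. "
    r'([^,]+)'                # capture up to the first comma
)

def first_ingredient(line):
    """Return the text after the quantity and unit, cut at the first comma."""
    match = PATTERN.match(line)
    return match.group(1).strip() if match else None

print(first_ingredient('1  medium tomato, cut into 8 wedges'))
# medium tomato
print(first_ingredient('1/2  tsp. dried mint, crushed'))
# dried mint
```

Note that this sketch does not handle the "or" case: for the last ingredient line it returns both alternatives, including the second quantity and unit.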
