How do I get a regular expression to capture "x dollars and x cents"?

import re

train = open(r"C:\Users\inigo\PycharmProjects\pythonProject\all-OANC.txt", encoding='utf8')  # didn't have encoding lol
# opens the files
strain = train.read()
# converts the files into a string
train.close()

#pattern = re.compile(r'\$\d[\d,.]*\b(?:\s*million\b)?(?:\s*billion\b)?')
pattern2 = re.compile('\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]*( cent(s)?)\b)?')
# Finds all numbers which can include commas and decimals that start with $ and if it has a million or a billion at the end
# We need to find patterns so if it contains a dollar keyword afterward it will count the number

matches = pattern2.findall(strain)
for match in matches:
    print(match)

3 answers

User

Answer #1 · edited 2024-06-01 01:48:39

Try this regular expression:

\b\d+\s+dollars?(?:\s+and\s+\d+\s+cents?)?\b
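
A minimal sketch of how this pattern could be used with `findall`. The sample sentence is hypothetical (it is not taken from the asker's corpus file). Because the pattern contains only a non-capturing group, `findall` returns the full matched text rather than tuples of groups. The raw string (`r"..."`) matters here: in an ordinary string literal, `\b` is a backspace character, not a word boundary.

```python
import re

# The pattern from this answer; the raw string keeps \b as a word boundary.
PATTERN = re.compile(r"\b\d+\s+dollars?(?:\s+and\s+\d+\s+cents?)?\b")

# Hypothetical sample text, used only for illustration.
sample = "It cost 5 dollars and 25 cents, then 1 dollar, then 300 dollars and 1 cent."

# No capturing groups in the pattern, so findall yields whole matches.
matches = PATTERN.findall(sample)
for match in matches:
    print(match)
# 5 dollars and 25 cents
# 1 dollar
# 300 dollars and 1 cent
```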

User

Answer #2 · edited 2024-06-01 01:48:39

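In your regexp:

\d[\d]*( dollar(s)?)(?:\s*(and )\d[\d]*( cent(s)?)\b)?
       ^       ^ ^ ^^     ^    ^       ^     ^ ^ ^  ^
       |       (2) ||     +(3)-+       |     (5) |  |
       +----(1)----+|                  +---(4)---+  |
                    +-------- non-capturing --------+

These are the numbers of the groups you can retrieve as submatches. You have five capturing groups, numbered by the position of their opening parenthesis in the regexp; the (?:...) group is non-capturing and gets no number. None of these groups contains the digits, and findall returns only the groups when a pattern has any, which explains why, for a matching input string, you only get what you described. If you need the numbers, wrap the subexpressions of interest in parentheses so they land in groups of their own, like this:

(\d[\d]*)( dollar(s)?)(?:\s*(and )(\d[\d]*)( cent(s)?)\b)?
^       ^^       ^ ^ ^^     ^    ^^       ^^     ^ ^ ^  ^
+--(1)--+|       (3) ||     +(4)-++--(5)--+|     (7) |  |
         +----(2)----+|                    +---(6)---+  |
                      +--------- non-capturing ---------+

(Now you have seven groups.) Look for the dollar amount in group 1 and the cents amount in group 5.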
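
A rough sketch of reading the amounts out of the regrouped pattern, assuming both digit runs keep the `*` quantifier from the question's pattern. Python's `re` does not assign a number to `(?:...)`, so with this pattern the dollar digits are `group(1)` and the cent digits are `group(5)`; printing `PATTERN.groups` is a quick way to check the count. The helper name `extract_amounts` and the sample sentence are hypothetical.

```python
import re

# Regrouped pattern: the two digit runs now sit in capturing groups of their own.
PATTERN = re.compile(r"(\d[\d]*)( dollar(s)?)(?:\s*(and )(\d[\d]*)( cent(s)?)\b)?")


def extract_amounts(text):
    """Return (dollars, cents) integer pairs; cents is 0 when no cents part matched."""
    amounts = []
    for m in PATTERN.finditer(text):
        dollars = int(m.group(1))
        # group(5) is None when the optional "and N cents" part is absent.
        cents = int(m.group(5)) if m.group(5) is not None else 0
        amounts.append((dollars, cents))
    return amounts


# Hypothetical sample text, used only for illustration.
sample = "He paid 12 dollars and 50 cents, she paid 3 dollars."
print(PATTERN.groups)           # 7
print(extract_amounts(sample))  # [(12, 50), (3, 0)]
```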

User

Answer #3 · edited 2024-06-01 01:48:39

You can use the following regular expression:

'(\d+ dollars?)(\s+and\s+\d{1,2} cents?)?'
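
A small sketch of what this pattern returns in practice; the sample sentence is hypothetical. Since the pattern has two capturing groups, `findall` yields a tuple per match, with an empty string for the cents part when it is missing. If the whole phrase is wanted as one string, iterating with `finditer` and taking `group(0)` is one option. Note that `\d{1,2}` limits the cents to one or two digits.

```python
import re

# The pattern from this answer, written as a raw string.
PATTERN = re.compile(r"(\d+ dollars?)(\s+and\s+\d{1,2} cents?)?")

# Hypothetical sample text, used only for illustration.
sample = "Lunch was 8 dollars and 5 cents; coffee was 2 dollars."

# With capturing groups, findall returns one tuple of groups per match.
tuples = PATTERN.findall(sample)
print(tuples)  # [('8 dollars', ' and 5 cents'), ('2 dollars', '')]

# group(0) is the entire matched text, regardless of the groups inside it.
full = [m.group(0) for m in PATTERN.finditer(sample)]
print(full)    # ['8 dollars and 5 cents', '2 dollars']
```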
