Python regex: extracting clauses that contain a number, where the number may have thousands separators and decimals

Posted 2024-06-06 19:26:59


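I have the following text. I want to collect every clause (running from one comma or period to the next comma or period) that contains a number. I managed to write the regex below, which captures a number and the text after it, but because my numbers can themselves contain commas or periods, I don't know how to also capture the words before the number. I would like to highlight in bold the parts of the sentence that should be captured: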

In connection with the consummation of this offering, we will enter into a forward purchase agreement with OrION Capital Structure Solutions UK Limited, or OrION, an affiliate of our sponsor, pursuant to which OrION will commit that it will purchase from us 10,000,000 forward purchase units, or at its option up to an aggregate maximum of 30,000,000 forward purchase units, each consisting of one Class A ordinary share,or a forward purchase share, and one-third of one warrant to purchase one Class A ordinary share, or a forward purchase warrant, for $10.00 per unit, or an aggregate amount of $100,000,000, or at OrION’s option up to an aggregate amount of $300,000,000, in a private placement that will close concurrently with the closing of our initial business combination.

What I want to collect is:

["pursuant to which OrION will commit that it will purchase from us 10,000,000 forward purchase units",
"or at its option up to an aggregate maximum of 30,000,000 forward purchase units", "for $10.00 per unit", "or an aggregate amount of $100,000,000", "or at OrION’s option up to an aggregate amount of $300,000,000"]

The regex I wrote currently captures the number and the text after it, up to the next comma or period:

[0-9]{1,2}([,.][0-9]{1,2})?.*?[\.,]
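To make the problem concrete, here is a small sketch that runs the attempt above on a short excerpt of the sample text. The variable names and the shortened `sample` string are only for illustration. `finditer` is used rather than `findall` because the pattern contains a capturing group, and `findall` would return only that group.

```python
import re

# The asker's pattern, copied unchanged.
attempt = re.compile(r"[0-9]{1,2}([,.][0-9]{1,2})?.*?[\.,]")

# Shortened excerpt of the sample text (for illustration only).
sample = "purchase from us 10,000,000 forward purchase units, or at its option"

# Collect the full text of every match.
matches = [m.group(0) for m in attempt.finditer(sample)]
print(matches)
# ['10,000,', '000 forward purchase units,']
```

The number is cut at its thousands separator, and the words before it are never captured, which is exactly what the question is about.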

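How can I capture the part of the sentence before the number (starting after a period or comma), then the number itself, which may contain a decimal point or thousands separators, and then the rest of the sentence up to the next comma or period?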

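Edit: anubhava and bb1 both gave correct solutions. anubhava solved the problem exactly as I asked, so that is the correct answer. However, bb1 prepared for a case that was bound to come up (and that I had not thought of), so in the end I used his answer, but I marked anubhava's as the solution because it answers exactly what I asked.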

Edit 2: anubhava has updated his answer, so it now handles the same case as bb1's.


2 Answers

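anubhava's solution works well as long as each comma- or period-delimited segment of the string contains a single number, but it does not cover segments that contain more than one number, for example: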

"Therefore, this costs $10,000 and that costs $20,000 per item."

In case it helps, here is a version that handles such cases:

(?<=[,.])(?:[^,.]*?\d+(?:[,.]\d+)*[^,.]*?)+(?=[,.])
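As a quick check, the pattern above can be run on the two-number sentence with Python's `re` module. This is only a minimal sketch; the variable names are illustrative. Because the lookbehind requires just a comma or period, the space after the delimiter ends up inside the match, so the sketch strips it.

```python
import re

# Pattern copied from the answer above.
pattern = re.compile(r"(?<=[,.])(?:[^,.]*?\d+(?:[,.]\d+)*[^,.]*?)+(?=[,.])")

sentence = "Therefore, this costs $10,000 and that costs $20,000 per item."

# The raw match starts with the space that follows the delimiter,
# so strip() is applied to each result.
segments = [m.strip() for m in pattern.findall(sentence)]
print(segments)
# ['this costs $10,000 and that costs $20,000 per item']
```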

You can use this regex with lookaround assertions:

(?<=[.,] )(?:[^,.]*?\d+(?:[.,]\d+)*)+[^.,]*(?=[,.])


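Regex details:

  • (?<=[.,] ): Lookbehind assertion; asserts that the current position is preceded by a comma or period followed by a space
  • (?:: Start a non-capturing group
    • [^,.]*?: Match 0 or more characters that are not , or . (lazy)
    • \d+(?:[.,]\d+)*: Match a number that may contain . or ,
  • )+: End the non-capturing group; + repeats the group 1 or more times
  • [^.,]*: Match 0 or more characters that are not , or .
  • (?=[,.]): Lookahead assertion; asserts that the current position is followed by a comma or period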
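The breakdown above can be tried out with a short script. This is a sketch under stated assumptions: `text` is an abbreviated version of the question's sample, shortened for illustration, and the variable names are not from the original answer.

```python
import re

# Pattern copied from the answer above.
pattern = re.compile(r"(?<=[.,] )(?:[^,.]*?\d+(?:[.,]\d+)*)+[^.,]*(?=[,.])")

# Abbreviated sample (an assumption for illustration, not the full text).
text = (
    "In connection with this offering, OrION will purchase from us "
    "10,000,000 forward purchase units, each consisting of one Class A "
    "ordinary share, for $10.00 per unit, or an aggregate amount of "
    "$100,000,000, in a private placement."
)

# No capturing groups are used, so findall returns the full matches.
segments = pattern.findall(text)
for segment in segments:
    print(segment)
# OrION will purchase from us 10,000,000 forward purchase units
# for $10.00 per unit
# or an aggregate amount of $100,000,000
```

Note that the lookbehind needs a comma or period plus a space before the match, so a qualifying segment at the very start of the string, or one that follows a delimiter with no space, would not be returned.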
