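<p>You can do this with a suitably exotic regular expression. Here's my best attempt at one. I use named capturing groups, since with a pattern this complex, numbered groups would be much more confusing to use in backreferences.</p>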
<p>First, the regexp pattern:</p>
<pre><code>_pattern = r"""(?x)       # enable verbose mode (which ignores whitespace and comments)
    ^                     # start of the input
    [^\d+\-.,]*           # prefixed junk (anything but digits, signs, "." or ",")
    (?P<number>           # capturing group for the whole number
        (?P<sign>[+-])?       # sign group (optional)
        (?P<integer_part>     # capturing group for the integer part
            \d{1,3}               # leading digits in an int with a thousands separator
            (?P<sep>              # capturing group for the thousands separator
                [ ,.]                 # the allowed separator characters
            )
            \d{3}                 # exactly three digits after the separator
            (?:                   # non-capturing group
                (?P=sep)              # the same separator again (a backreference)
                \d{3}                 # exactly three more digits
            )*                    # repeated 0 or more times
        |                     # or
            \d+                   # simple integer (just digits with no separator)
        )?                    # integer part is optional, to allow numbers like ".5"
        (?P<decimal_part>     # capturing group for the decimal part of the number
            (?P<point>            # capturing group for the decimal point
                (?(sep)               # conditional pattern, only tested if sep matched
                    (?!                   # a negative lookahead
                        (?P=sep)              # backreference to the separator
                    )
                )
                [.,]                  # the accepted decimal point characters
            )
            \d+                   # one or more digits after the decimal point
        )?                    # the whole decimal part is optional
    )
    [^\d]*                # suffixed junk
    $                     # end of the input
"""
</code></pre>
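<p>The two less common features in the pattern are the named backreference <code>(?P=sep)</code> and the conditional <code>(?(sep)...)</code>. A minimal sketch that isolates just those pieces (the names <code>THOUSANDS</code> and <code>WITH_POINT</code> are made up for this illustration) shows what each one rejects:</p>
<pre><code>import re

# Only the thousands-separator part: (?P=sep) must repeat whatever
# character the named group "sep" captured the first time.
THOUSANDS = re.compile(r"\d{1,3}(?P<sep>[ ,.])\d{3}(?:(?P=sep)\d{3})*")

# Integer part plus decimal part: (?(sep)...) applies the lookahead only
# when "sep" took part in the match, so the point can't be the separator.
WITH_POINT = re.compile(
    r"(?:\d{1,3}(?P<sep>[ ,.])\d{3}(?:(?P=sep)\d{3})*|\d+)"
    r"(?:(?(sep)(?!(?P=sep)))[.,]\d+)?"
)

print(bool(THOUSANDS.fullmatch("1,234,567")))  # True: consistent separators
print(bool(THOUSANDS.fullmatch("1,234.567")))  # False: mixed separators
print(bool(WITH_POINT.fullmatch("1.000,74")))  # True: "." separator, "," point
print(bool(WITH_POINT.fullmatch("1.000.74")))  # False: point repeats the separator
</code></pre>
<p>Because the conditional only fires when <code>sep</code> took part in the match, a plain number such as <code>"2,35"</code> can still use a comma as its decimal point.</p>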
<p>Here's a function to use it:</p>
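<p>A minimal sketch of such a <code>parse_number</code> function, written to reproduce the test output further down. How it uses the named groups is an assumption, not necessarily the original implementation. The block includes a condensed copy of the pattern so it runs on its own:</p>
<pre><code>import re

# Condensed copy of _pattern above (same named groups, comments removed).
_pattern = r"""(?x)
    ^ [^\d+\-.,]*
    (?P<number>
        (?P<sign>[+-])?
        (?P<integer_part>
            \d{1,3} (?P<sep>[ ,.]) \d{3} (?:(?P=sep)\d{3})*
          | \d+
        )?
        (?P<decimal_part>
            (?P<point>(?(sep)(?!(?P=sep)))[.,])
            \d+
        )?
    )
    [^\d]* $
"""


def parse_number(text):
    """Sketch only: return an int or float for the number in text, or None."""
    match = re.match(_pattern, text)
    # The pattern can match with an empty "number" group (e.g. "abc" or "+"),
    # so also require an integer part or a decimal part.
    if match is None or not (match.group("integer_part") or match.group("decimal_part")):
        return None

    num_str = match.group("number")  # the number without the surrounding junk
    sep = match.group("sep")
    if sep:
        num_str = num_str.replace(sep, "")  # drop thousands separators

    if match.group("decimal_part"):
        point = match.group("point")
        if point != ".":
            num_str = num_str.replace(point, ".")  # normalize the decimal point
        return float(num_str)

    return int(num_str)
</code></pre>
<p>The check on <code>integer_part</code> and <code>decimal_part</code> matters because the pattern also matches input with no digits at all (such as <code>"abc"</code>), leaving the <code>number</code> group empty.</p>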
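<p>Some number strings with one to three digits, a single comma or period, and then exactly three more digits (such as <code>"1,234"</code> and <code>"1.234"</code>) are ambiguous. This code parses both of them as integers with a thousands separator (<code>1234</code>) rather than as floating-point values (<code>1.234</code>), regardless of which separator character is actually used. If you want a different result for these numbers (for example, if you want <code>1.234</code> to become a float), you can handle them with a special case.</p>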
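<p>One way such a special case could look, as a minimal sketch: the helper name <code>parse_ambiguous</code> and its <code>decimal_point</code> parameter are hypothetical, and it assumes the caller tries it first and falls back to <code>parse_number</code> when it returns <code>None</code>.</p>
<pre><code>import re

# Hypothetical helper, for illustration only. Matches strings whose number
# is 1-3 digits, a single "," or ".", then exactly three digits
# (e.g. "1.234" or "EUR 1,234"), with the same kind of junk around it.
_AMBIGUOUS = re.compile(r"[^\d+\-.,]*(?P<number>[+-]?\d{1,3}(?P<sep>[.,])\d{3})[^\d]*")


def parse_ambiguous(text, decimal_point="."):
    """Return a float if text is ambiguous and its separator is decimal_point.

    Returns None otherwise, so the caller can fall back to parse_number().
    """
    match = _AMBIGUOUS.fullmatch(text)
    if match is None or match.group("sep") != decimal_point:
        return None
    return float(match.group("number").replace(decimal_point, "."))
</code></pre>
<p>With the default <code>decimal_point="."</code>, <code>"1.234"</code> becomes <code>1.234</code> while <code>"1,234"</code> still falls through to the thousands-separator interpretation.</p>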
<p>Some test output:</p>
<pre><code>>>> test_cases = ["2", "2.3", "2,35", "-2 000,5", "EUR 1.000,74 €",
...               "20,5 20,8", "20.345.32.231,50", "1.234"]
>>> for s in test_cases:
...     print("{!r:20}: {}".format(s, parse_number(s)))
...
'2'                 : 2
'2.3'               : 2.3
'2,35'              : 2.35
'-2 000,5'          : -2000.5
'EUR 1.000,74 €'    : 1000.74
'20,5 20,8'         : None
'20.345.32.231,50'  : None
'1.234'             : 1234
</code></pre>