Python regex that rejects anything other than $[a-zA-Z_][0-9a-zA-Z_]*

0 votes

3 answers

5930 views

Asked 2025-04-18 01:36

I'm looking for a regular expression in Python that finds every occurrence of the following in a string:

$[a-zA-Z_][0-9a-zA-Z_]*

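There can be many of these, and they may be separated by whitespace (\s).

That sounds simple, but I also need to make sure the whole string contains nothing that fails to match this pattern. (An empty string is fine too.)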

Here are some examples:

$x$y0123 => OK, gives me [$x, $y0123]
$ => BAD (only $)
"" or "  \t" => OK, gives me []
$x      @hi => BAD, because @hi does not match the pattern

The solution may use several regular expressions; it does not have to be a single one.

regex = re.compile(r"(\$[a-zA-Z_][0-9a-zA-Z_]*)")
regex.findall(string)

This would be enough if I didn't need to check the rest of the string.


3 Answers

Try this:

import re

s1 = '$x$y0123 $_xyz1$B0dR_'
s2 = '$x$y0123 $_xyz1$B0dR_ @1'
s3 = '$'
s4 = '   \t'
s5 = ''

def process(s, pattern):
    '''Find substrings in s that match pattern.

    If the string is not composed entirely of substrings that match
    pattern, raises AttributeError.

    s --> str
    pattern --> str
    returns list of match objects
    '''
    rex = re.compile(pattern)
    matches = []
    while s:
        m = rex.match(s)
        # m is None when s does not start with a match;
        # m.end() then raises the AttributeError the caller catches
        s = s[m.end():]
        matches.append(m)
    return matches

pattern = r'\$[a-zA-Z_][0-9a-zA-Z_]*'
for s in [s1, s2, s3, s4, s5]:
    print('*' * 8)
    # remove whitespace
    s = re.sub(r'\s', '', s)
    if not s:
        print('empty string')
        continue
    try:
        matches = process(s, pattern)
    except AttributeError:
        print('this string has bad stuff in it')
        print(s)
        continue
    print('\n'.join(m.group() for m in matches))

>>> 
********
$x
$y0123
$_xyz1
$B0dR_
********
this string has bad stuff in it
$x$y0123$_xyz1$B0dR_@1
********
this string has bad stuff in it
$
********
empty string
********
empty string
>>>
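
One caveat with stripping whitespace up front: an input such as `$x y` becomes `$xy` and is accepted, even though `y` on its own does not match the pattern. Below is a minimal sketch that avoids this by scanning the string in place with `Pattern.match(string, pos)` and skipping whitespace between tokens. The name `scan_tokens` and the choice of `ValueError` are assumptions made for illustration, not part of the answer above.

```python
import re

TOKEN = re.compile(r'\$[a-zA-Z_][0-9a-zA-Z_]*')
SPACE = re.compile(r'\s+')

def scan_tokens(s):
    """Return every $name token in s; raise ValueError on any other text."""
    tokens = []
    pos = 0
    while pos < len(s):
        # Skip a run of whitespace without altering the string itself.
        m = SPACE.match(s, pos)
        if m:
            pos = m.end()
            continue
        # Anything that is not whitespace must start a valid token here.
        m = TOKEN.match(s, pos)
        if m is None:
            raise ValueError('unexpected text at index %d: %r' % (pos, s[pos:]))
        tokens.append(m.group())
        pos = m.end()
    return tokens
```

Because the string is never copied or rewritten, the index in the error message refers to the original input.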

Answered 2025-04-18 by Python大师


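To check the whole string, it is better to use re.match than re.findall. You can use a pattern that also allows whitespace, written as ^((\$[a-zA-Z_][0-9a-zA-Z_]*)|(\s))*$. Note the * after the second character class: without it, every name would need exactly two characters and $x would be rejected.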
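
As a rough sketch of that whole-string check, the snippet below uses `re.fullmatch`, which makes the `^` and `$` anchors unnecessary. It assumes the name part may repeat zero or more times, as in the question's pattern; the helper name `is_valid` is made up for this example.

```python
import re

# Either a $name token or a single whitespace character, repeated any number of times.
VALID = re.compile(r'(?:\$[a-zA-Z_][0-9a-zA-Z_]*|\s)*')

def is_valid(s):
    """True if s consists only of $name tokens and whitespace (or is empty)."""
    return VALID.fullmatch(s) is not None

for sample in ['$x$y0123', '$', '', '  \t', '$x      @hi']:
    print(repr(sample), is_valid(sample))
```

Once a string passes this check, `re.findall` with the token pattern alone can be used to pull out the matches.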

Answered 2025-04-18 by Python大师


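Hmm, I'm not entirely sure what you are trying to do, but perhaps you need two regular expressions: the first to check that the format is valid, and the second to extract the matches.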

import re
stuff = ["$x$y0123", "$", "", "  \t", "$x      @hi"]

p1 = re.compile(r'(?:\$[A-Z_]\w*|\s)*$', re.IGNORECASE)
p2 = re.compile(r'\$[A-Z_]\w*|\s+', re.IGNORECASE)

for thing in stuff:
    if p1.match(thing):
        print(p2.findall(thing))

This prints:

['$x', '$y0123']
[]
['  \t']
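
The question expects `[]` for whitespace-only input, whereas the extraction pattern above also returns runs of whitespace. Here is a hedged variant that drops the `\s+` alternative from the extraction pattern and adds `re.ASCII`, so that `\w` stays limited to `[a-zA-Z0-9_]` on Python 3. The function name `extract` and the `None` return value for invalid input are assumptions made for this sketch.

```python
import re

FLAGS = re.IGNORECASE | re.ASCII
check = re.compile(r'(?:\$[A-Z_]\w*|\s)*$', FLAGS)   # validates the whole string
token = re.compile(r'\$[A-Z_]\w*', FLAGS)            # extracts tokens only

def extract(s):
    """Return the list of $name tokens, or None if s contains anything else."""
    if check.match(s) is None:
        return None
    return token.findall(s)

for thing in ["$x$y0123", "$", "", "  \t", "$x      @hi"]:
    print(repr(thing), extract(thing))
```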

ideone demo

Answered 2025-04-18 by Python大师

