Regular expressions: matching words that contain special characters or "http://"

User

Reply #1 · edited 2024-04-24 11:27:04

This doesn't use a regex, but it might work? (I'm assuming ":" and "/" count as special characters, so URLs get dropped implicitly.)

import string

def good_word(word):
    # A word is "good" only if every character is an ASCII letter.
    for c in word:
        if c not in string.ascii_letters:
            return False
    return True

def clean_string(text):
    # Keep only the whitespace-separated words that pass good_word().
    return ' '.join([w for w in text.split() if good_word(w)])

print(clean_string("%he#llo, my website is: http://www.url.com/abcdef123"))

User

Reply #2 · edited 2024-04-24 11:27:04

For the example string you gave, the following regular expression works:

>>> import re
>>> a = '%he#llo, my website is: http://www.url.com/abcdef123'
>>> re.findall(r'(http://\S+|\S*[^\w\s]\S*)', a)
['%he#llo,', 'is:', 'http://www.url.com/abcdef123']

...or you can use re.sub to remove those words:

>>> re.sub(r'(http://\S+|\S*[^\w\s]\S*)', '', a)
' my website  '

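| denotes alternation and matches the expression on either side of it within the group. The left side matches http:// followed by one or more non-whitespace characters. The right side matches zero or more non-whitespace characters, followed by one character that is neither a word character nor whitespace, followed by zero or more non-whitespace characters. This guarantees that the matched word contains at least one non-word character and no whitespace.

Update: of course, as the other answers imply, the http:// prefix already contains a non-word character (/), so it does not need its own alternative; the regex can be simplified to \S*[^\w\s]\S*. Still, the example above with alternation may be useful.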
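
A minimal sketch of the simplified pattern, assuming Python 3. The helper name strip_special_words is hypothetical and not part of the answer. It removes the matching words and then collapses the leftover whitespace, which a bare re.sub call does not do:

```python
import re

# Simplified pattern from the update: a run of non-space characters that
# contains at least one character that is neither a word character nor whitespace.
SPECIAL_WORD = re.compile(r"\S*[^\w\s]\S*")

def strip_special_words(text):
    """Remove words containing special characters, then normalise spacing."""
    without = SPECIAL_WORD.sub("", text)
    # split() with no arguments discards the empty gaps left by removed words.
    return " ".join(without.split())

print(strip_special_words("%he#llo, my website is: http://www.url.com/abcdef123"))
# prints: my website
```

Note that \w also matches digits and the underscore, so a word such as abc_123 would be kept by this pattern.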

User

Reply #3 · edited 2024-04-24 11:27:04

You can use lookaheads:

>>> re.findall(r"(?:\s|^)(\w+)(?=\s|$)", "Start %he#llo, my website is: http://www.url.comabcdef123 End")
['Start', 'my', 'website', 'End']

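Explanation:

(?:\s|^) means the word is either at the start of the string or preceded by whitespace. (The whitespace is not part of the word.)
(\w+) matches a word (this is the part we are interested in).
(?=\s|$) means the word is followed by whitespace or the end of the string. (Again, the whitespace is not part of the word.)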
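
The pattern above consumes the leading whitespace as part of each match and relies on the capture group to return only the word. As a variation that is not from the answer, the same check can be written with lookarounds on both sides so that only the word itself is matched. A sketch assuming Python 3, with the hypothetical helper name plain_words:

```python
import re

# (?<!\S) : not preceded by a non-space character (start of string or whitespace)
# (?!\S)  : not followed by a non-space character (end of string or whitespace)
PLAIN_WORD = re.compile(r"(?<!\S)\w+(?!\S)")

def plain_words(text):
    """Return the words made up solely of word characters."""
    return PLAIN_WORD.findall(text)

print(plain_words("Start %he#llo, my website is: http://www.url.comabcdef123 End"))
# prints: ['Start', 'my', 'website', 'End']
```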
