How do I remove words containing a substring from a Python string?

Asked 2025-04-18 01:48

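While using the Twitter API, I got several strings (tweets) that contain links, all of which start with 'http://'.

I want to strip those links out, that is, delete the whole word.

Suppose I have: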

'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'

I want to get:

'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre'

The links can appear anywhere in the string.

2 Answers

If the link always comes last and is preceded by a single space, you can slice it off:

s[:s.index('http://')-1]
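
Note that s.index raises ValueError when the string contains no link, and the -1 assumes exactly one space before it. A minimal sketch that tolerates both cases with str.partition is shown here; the helper name strip_trailing_link is hypothetical, not part of the original answer:

```python
def strip_trailing_link(s, prefix='http://'):
    # partition returns (s, '', '') when prefix is absent, so nothing is raised
    head, _, _ = s.partition(prefix)
    # rstrip drops whatever whitespace preceded the link
    return head.rstrip()
```

Like the slice above, this discards everything from the first link onward, so it only fits strings where the link is the last thing in the text.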

If it is not always at the end, you can do this instead:

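# Split on whitespace so each link becomes its own word
your_list = s.split()
i = 0
while i < len(your_list):
    if your_list[i].startswith('http://'):
        # Deleting shifts the next word into position i, so do not advance
        del your_list[i]
    else:
        i += 1
# Rejoin the remaining words with single spaces
s = ' '.join(your_list)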
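
The same filtering can be sketched more compactly with a list comprehension. The function name remove_link_words and the extra 'https://' prefix are assumptions added for illustration; the question only mentions 'http://':

```python
def remove_link_words(s, prefixes=('http://', 'https://')):
    # str.startswith accepts a tuple and returns True if any prefix matches
    words = [w for w in s.split() if not w.startswith(prefixes)]
    # Joining with single spaces also collapses any repeated whitespace
    return ' '.join(words)
```

Because the string is split and rejoined, runs of spaces, tabs, or newlines in the original text end up as single spaces.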

Answered 2025-04-18 by Python大师

You can use re.sub() to replace every link with an empty string, which deletes them:

>>> import re
>>> pattern = re.compile(r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+')
>>> s = 'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre http://t.co/Ad2oWDNd4u'
>>> pattern.sub('', s)
'Mi grupo favorito de CRIMINALISTICA. Ultima clase de cuatrimestre '

This replaces every link in the string, wherever it appears. Note that the whitespace around each removed link is left in place:

>>> s = "I've used google https://google.com and found a regular expression pattern to find links here https://stackoverflow.com/questions/6883049/regex-to-find-urls-in-string-in-python"
>>> pattern.sub('', s)
"I've used google  and found a regular expression pattern to find links here "

This regular expression comes from this discussion:

Regex to find URLs in a string in Python
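
The outputs above still contain a trailing space or a double space where a link used to be, while the question asks for a clean string. A minimal sketch that also normalizes the leftover whitespace follows. It swaps in a simpler pattern, https?://\S+, which treats everything up to the next whitespace as part of the link; that pattern and the name remove_urls are assumptions, not the regex from the linked discussion:

```python
import re

# Simplified assumption: a link runs from the scheme to the next whitespace
URL_RE = re.compile(r'https?://\S+')

def remove_urls(text):
    # Delete every link, then collapse the gaps it leaves behind
    without_urls = URL_RE.sub('', text)
    return ' '.join(without_urls.split())
```

With this pattern, punctuation glued to the end of a link (for example a closing parenthesis or a full stop) is removed along with it, which may or may not be what you want for tweets.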

Answered 2025-04-18 by Python大师
