Changing words in a string using a dictionary (Python)

def convert_message(msg, conversion):
    msg = msg.translate({ord(i): None for i in ".,"})
    tokens = msg.strip().split(" ")
    for x in msg:
        if x in keys (conversion):
    return " ".join(tokens)

1 answer

Answer 1 · posted 2024-05-26 11:12:46

Isn't it as simple as:

conversion = {'Drive': 'Dr'}  # example of the mapping passed as conversion

for index, token in enumerate(tokens):
    if token in conversion:
        tokens[index] = conversion[token]

return ' '.join(tokens)
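
For reference, the loop can be wrapped into a complete function. This is only a minimal sketch: the name convert_by_tokens is hypothetical (it does not appear in the original post), and it assumes whitespace-separated tokens, using split() rather than the split(" ") from the question.

```python
def convert_by_tokens(msg, conversion):
    # Split on runs of whitespace; punctuation stays attached to its word.
    tokens = msg.strip().split()
    for index, token in enumerate(tokens):
        # Replace the token only if it is exactly a key of the mapping.
        if token in conversion:
            tokens[index] = conversion[token]
    return ' '.join(tokens)


result = convert_by_tokens("Obstruction on Athllon Drive", {'Drive': 'Dr'})
print(result)  # Obstruction on Athllon Dr
```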

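However, this will not work for a sentence like "Obstruction on Cowlishaw Street.", because the token is now Street. (with the trailing period), which is not a key in the dictionary. Perhaps you should use a regular expression with re.sub instead: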

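import re


def convert_message(msg, conversion):
    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+', translate, msg)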

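Here, re.sub finds runs of one or more (+) consecutive word characters (\w: letters, digits and the underscore). For each such match it calls the given function with the match object as its argument; the matched word can be retrieved with match.group(0). The function should return the replacement for the given match: here, if the word is found in the dictionary, the corresponding value is returned, otherwise the original word is returned unchanged.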

Thus:

>>> msg = "Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction."
>>> convert_message(msg, {'Drive': 'Dr', 'Street': 'St'})
'Cowlishaw St &amp; Athllon Dr, Greenway now free of obstruction.'
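
One consequence of matching whole runs of \w is that a word is replaced only when the entire run equals a dictionary key. The sketch below contrasts a compact \w+-only variant of convert_message (using dict.get) with chained str.replace calls; replace_naive is a hypothetical helper written purely for comparison, not something from the original answer.

```python
import re


def convert_message(msg, conversion):
    def translate(match):
        word = match.group(0)
        # Fall back to the original word when it is not a key.
        return conversion.get(word, word)

    return re.sub(r'\w+', translate, msg)


def replace_naive(msg, conversion):
    # Hypothetical comparison: plain substring replacement.
    for old, new in conversion.items():
        msg = msg.replace(old, new)
    return msg


conversion = {'Drive': 'Dr'}
msg = "Driveway blocked on Athllon Drive."
whole_words = convert_message(msg, conversion)
substrings = replace_naive(msg, conversion)
print(whole_words)  # Driveway blocked on Athllon Dr.
print(substrings)   # Drway blocked on Athllon Dr.
```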

As for the &amp; in the message, on Python 3.4+ you should use html.unescape to decode HTML entities:

>>> import html
>>> html.unescape('Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction.')
'Cowlishaw Street & Athllon Drive, Greenway now free of obstruction.'

This takes care of all known HTML entities. For earlier Python versions, see the alternatives on this question.

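The regular expression \w+ does not match the & character. If you want to replace it as well, we can use the regular expression \w+|., which means "any consecutive run of word characters, or any single character that is not part of such a run":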

import re
import html


def convert_message(msg, conversion):
    msg = html.unescape(msg)

    def translate(match):
        word = match.group(0)
        if word in conversion:
            return conversion[word]
        return word

    return re.sub(r'\w+|.', translate, msg)

Then you can do:

>>> msg = 'Cowlishaw Street &amp; Athllon Drive, Greenway now free of obstruction.'
>>> convert_message(msg, {'Drive': 'Dr', '&': 'and',
...                       'Street': 'St', '.': '', ',': ''})
'Cowlishaw St and Athllon Dr Greenway now free of obstruction'
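
To see how the pattern \w+|. carves up a message, re.findall with the same pattern lists the pieces that would each be handed to the replacement function. This is just an illustrative sketch on a shortened, made-up sample string:

```python
import re

# Each element is either a whole run of word characters
# or a single non-word character (space, '&', ',', '.').
pieces = re.findall(r'\w+|.', 'Street & Drive, free.')
print(pieces)
# ['Street', ' ', '&', ' ', 'Drive', ',', ' ', 'free', '.']
```

Because punctuation and spaces arrive as separate one-character pieces, keys such as '&', ',' and '.' in the dictionary can be looked up just like words.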
