使用python将字符串中的单词替换为列表中的单词

def replace_all(): text = '000 001 002 003 ' word = ['foo', 'bar', 'that', 'these'] for a in word: y = -1 for w in text: y = y + 1 x = "00"+str(y) w = {x:a} for i, j in w.iteritems(): text = text.replace(i, j) print text

2条回答

网友

1楼 · 编辑于 2024-05-16 11:40:59

这实际上是一个非常简单的list comprehension：

>>> text = '000 001 002 003 '
>>> words = ['foo', 'bar', 'that', 'these']
>>> [words[int(item)] for item in text.split()]
['foo', 'bar', 'that', 'these']

编辑：如果需要保留其他值，可以满足以下要求：

def get(seq, item):
    try:
        return seq[int(item)]
    except ValueError:
        return item

然后简单地使用像[get(words, item) for item in text.split()]这样的东西-当然，如果字符串中有其他数字可能会被意外替换，那么get()中可能需要更多的测试。（编辑结束）

我们要做的是将文本分割成单独的数字，然后将它们转换成整数，并使用它们来索引您给出的查找单词的列表。

至于为什么你的代码不能工作，主要的问题是你在字符串上循环，这会给你字符，而不是单词。不过，这不是解决问题的好办法。

还值得一提的是，当循环遍历值并希望索引与它们一起使用时，应该使用the ^{} builtin，而不是使用计数变量。

例如：代替：

y = -1
for w in text:
    y = y + 1
    ...

使用：

for y, w in enumerate(text):
    ...

这是更可读和Python。

现有代码的另一个特点是：

w = {x:a}      
for i, j in w.iteritems():
    text = text.replace(i, j)

如果你仔细想想，可以简化为：

text = text.replace(x, a)

你将w设置为一个项的字典，然后循环它，但是你知道它将永远只包含一个项。

更接近于您的方法的解决方案如下：

words_dict = {"{0:03d}".format(index): value for index, value in enumerate(words)}
for key, value in words_dict.items():
    text = test.replace(key, value)

我们创建一个从零填充数字字符串（使用^{}）到值的字典，然后替换每个项。注意，当您使用2.x时，您需要dict.iteritems()，如果您是2.7之前的版本，请在元组生成器上使用内置的dict()，因为dict理解不存在。

网友

2楼 · 编辑于 2024-05-16 11:40:59

在处理文本时，显然必须考虑regex。

import re

text = text = ('<p><span class="newStyle0" '
               'style="left: 291px; '
               'top: 258px">000</span></p> <p>'
               '<span class="newStyle1" '
               'style="left: 85px; '
               'top: 200px">001</span></p> <p>'
               '<span class="newStyle2" '
               'style="left: 580px; '
               'top: 400px; width: 167px; '
               'height: 97px">002</span></p> <p>'
               '<span class="newStyle3" '
               'style="left: 375px; top: 165px">'
               '003</span></p>')

words = ['XXX-%04d-YYY' % a for a in xrange(1000)]

regx = re.compile('(?<=>)\d+(?=</span>)')

def gv(m,words = words):
    return words[int(m.group())]

print regx.sub(gv,text)

相关问题更多 >

编程相关推荐

热门问题

热门文章