How do I strip punctuation from a string and put it back at the same index?
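I want my program to do the following. Say my string is "This is 'Cambridge University' for example." It should keep the first and last letter of each word and shuffle the letters in between, but only if the word is longer than 3 letters. My problem is that when a word has punctuation before or after it, the program scrambles the word incorrectly. While scrambling, I need the punctuation to stay in its correct position: the first and last letters of the word are kept, the middle letters are shuffled, and any punctuation is put back afterwards. Any good ideas?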
def scramble_word(word_str):
    char = ".,!?';:"
    import random
    if len(word_str) <= 3:
        return word_str + ' '
    else:
        word_str = word_str.strip(char)
        word_str = list(word_str)
        scramble = word_str[1:-1]
        random.shuffle(scramble)
        scramble = ''.join(scramble)
        word_str = ''.join(word_str)
        new_word = word_str[0] + scramble + word_str[-1]
        return new_word + ' '
2 Answers
1
This is very simple to do with a regular expression:
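import re
import random

s = ('Pitcairn Islands, Saint Helena, '
     'Ascension and Tristan da Cunha, '
     'Saint Kitts and Nevis, '
     'Saint Vincent and the Grenadines, Singapore')

# Match two or more letters that have a letter on each side,
# i.e. the interior of any word with at least four letters.
reg = re.compile('(?<=[a-zA-Z])[a-zA-Z]{2,}(?=[a-zA-Z])')

def ripl(m):
    # Shuffle the matched interior letters and join them back up.
    g = list(m.group())
    random.shuffle(g)
    return ''.join(g)

print(reg.sub(ripl, s))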
Result (one possible run, since the shuffle is random):
Piictran Islands, Sanit Heelna, Asnioecsn and Tiastrn da Cunha, Sniat Ktits and Neivs, Snait Vnnceit and the Giearndens, Snoiaprge
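To see why the first and last letters survive, it can help to look at what the pattern actually matches. The sketch below uses a short sample string of my own (not from the answer) and calls findall instead of sub. The lookbehind and lookahead each require a letter but do not consume it, so only the interior of words with four or more letters is returned.

```python
import re

reg = re.compile('(?<=[a-zA-Z])[a-zA-Z]{2,}(?=[a-zA-Z])')

# Hypothetical sample text, chosen only to illustrate the matches.
sample = "Saint Kitts and Nevis, the end."

# Each match is the interior of a word; "and", "the" and "end" are too
# short to have two interior letters, so they produce no match.
inner_parts = reg.findall(sample)
print(inner_parts)  # ['ain', 'itt', 'evi']
```

Because the comma and the full stop are never part of a match, the substitution leaves them exactly where they were.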
6
Using a regular expression:
import random
import re

random.seed(1234)  # remove this in production; it is only here to reproduce my results

def shuffle_word(m):
    # m is the match object for one word of four or more characters.
    word = m.group()
    inner = ''.join(random.sample(word[1:-1], len(word) - 2))
    return '%s%s%s' % (word[0], inner, word[-1])

s = """This is 'Cambridge University' for example."""
print(re.sub(r'\b\w{3}\w+\b', shuffle_word, s))
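The output of this code (from the original Python 2 run; other versions may shuffle differently even with the same seed) is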
Tihs is 'Cadibrgme Uinrtvsiey' for exlampe.
re.sub lets you pass a function (one that takes a regex match object) instead of a plain replacement string.
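As a minimal illustration of that callback form, here is a sketch that reuses the same pattern but swaps the shuffle for a deterministic transformation, so the result is easy to check. The helper name upper_word is mine and is not part of the answer above.

```python
import re

def upper_word(m):
    # m is a match object; m.group() is the text that was matched.
    return m.group().upper()

text = "This is 'Cambridge University' for example."

# re.sub calls upper_word once per match and substitutes its return value.
result = re.sub(r'\b\w{3}\w+\b', upper_word, text)
print(result)  # THIS is 'CAMBRIDGE UNIVERSITY' for EXAMPLE.
```

Only words of four or more characters are touched, and the quotes and the full stop stay put because \w never matches them.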
Edit - without regular expressions
import random
from io import StringIO  # on Python 2 this was: from StringIO import StringIO

def shuffle_word(word):
    # Words of three characters or fewer have nothing to scramble.
    if len(word) <= 3:
        return word
    inner = ''.join(random.sample(word[1:-1], len(word) - 2))
    return '%s%s%s' % (word[0], inner, word[-1])

def scramble(text):
    sio = StringIO(text)
    accum = []
    start = None  # index where the current word began, or None between words
    while sio.tell() < len(text):
        char = sio.read(1)
        if start is None:
            if char.isalnum():
                start = sio.tell() - 1
            else:
                accum.append(char)
        elif not char.isalnum():
            # The word just ended: rewind, re-read it and shuffle it. The next
            # iteration then reads this non-alphanumeric character again.
            end = sio.tell() - 1
            sio.seek(start)
            accum.append(shuffle_word(sio.read(end - start)))
            start = None
    # Flush a word that runs to the very end of the text.
    if start is not None:
        sio.seek(start)
        accum.append(shuffle_word(sio.read()))
    return ''.join(accum)

s = """This is 'Cambridge University' for example."""
print(scramble(s))
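For what it is worth, the strip-based approach from the question can also be made to work if the stripped punctuation is remembered and glued back on. The following is only a sketch under my own assumptions: the names PUNCT, lead, trail and scramble_text are hypothetical, words are assumed to be separated by single spaces, and punctuation inside a word (such as the apostrophe in "don't") is treated as an ordinary character and may move.

```python
import random

PUNCT = ".,!?';:"  # same punctuation set as in the question

def scramble_word(word):
    # Split the token into leading punctuation, core, and trailing punctuation.
    lead_len = len(word) - len(word.lstrip(PUNCT))
    core = word.strip(PUNCT)
    lead = word[:lead_len]
    trail = word[lead_len + len(core):]
    # The length check is applied to the core, not to the raw token.
    if len(core) > 3:
        middle = list(core[1:-1])
        random.shuffle(middle)
        core = core[0] + ''.join(middle) + core[-1]
    return lead + core + trail

def scramble_text(text):
    # Splitting on a single space keeps the spacing intact when rejoining.
    return ' '.join(scramble_word(w) for w in text.split(' '))

s = "This is 'Cambridge University' for example."
print(scramble_text(s))
```

The main difference from the code in the question is that the length test happens after stripping, and the stripped characters are kept rather than thrown away.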