I need to create a function that generates a list from a text:
text = '^to[by, from] all ^appearances[appearance]'
list = ['to all appearances', 'to all appearance', 'by all appearances',
'by all appearance', 'from all appearances', 'from all appearance']
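In other words, each value inside the brackets should replace the word immediately before it, the one that directly follows ^. I would like a function with five parameters, as shown below.
My code (it does not work):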
import re

def addSubstitution(buf, substitutions, val1='[', val2=']', dsym=',', start_p="^"):
    # Odd indices of buf hold the "word[alt, ...]" groups; even indices hold plain text.
    for i in range(1, len(buf), 2):
        buff = []
        buff.extend(buf)
        if re.search('''[^{2}]+[{0}][^{1}{0}]+?[{1}]'''.format(val1, val2, start_p), buff[i]):
            # Split "word[alt1, alt2]" into the word and its alternatives.
            substrs = re.split('['+val1+']'+'|'+'['+val2+']'+'|'+dsym, buff[i])
            for substr in substrs:
                if substr:
                    buff[i] = substr
                    addSubstitution(buff, substitutions, val1, val2, dsym, start_p)
            return
    # No bracketed group is left, so buf is one complete combination.
    substitutions.add(''.join(buf))

def getSubstitution(text, val1='[', val2=']', dsym=',', start_p="^"):
    # Note: this pattern excludes start_p, so the marker stays in the plain-text
    # pieces, and alternatives keep the whitespace that follows dsym.
    pattern = '''[^{2}]+[{0}][^{1}{0}]+?[{1}]'''.format(val1, val2, start_p)
    texts = re.split(pattern, text)
    opttexts = re.findall(pattern, text)
    p = iter(texts)
    t = iter(opttexts)
    buf = []
    # Interleave plain text and groups: [text, group, text, group, ...].
    while True:
        try:
            buf.append(next(p))
            buf.append(next(t))
        except StopIteration:
            break
    substitutions = set()
    addSubstitution(buf, substitutions, val1, val2, dsym, start_p)
    substitutions = list(substitutions)
    substitutions.sort(key=len)
    return substitutions
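One way to do it is as follows (I have skipped the string-manipulation code):
Step 1: Tokenize text.
Step 2: Prepare a list of all the words that need a Cartesian product (the words starting with ^).
Step 3: Use itertools.product(...) to compute the Cartesian product.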
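The three steps above could be sketched roughly like this. It is only one possible reading of the answer, not its original code: the function name get_substitutions and the use of re.finditer for tokenizing are my own choices, and the sketch assumes each marked word is a single run of word characters followed directly by its bracketed alternatives. The five parameters mirror the ones in the question.

```python
import itertools
import re


def get_substitutions(text, val1='[', val2=']', dsym=',', start_p='^'):
    """Expand every `^word[alt1, alt2]` group in `text` into all combinations.

    Hypothetical helper: the name and the tokenizing regex are assumptions.
    """
    # Step 1: tokenize. A marked group is assumed to look like ^word[alt, alt].
    group = re.compile(
        re.escape(start_p) + r'(\w+)'
        + re.escape(val1) + '(.*?)' + re.escape(val2)
    )
    literals = []  # plain text between the marked groups
    choices = []   # Step 2: one list of candidate words per marked group
    pos = 0
    for match in group.finditer(text):
        literals.append(text[pos:match.start()])
        alternatives = [alt.strip() for alt in match.group(2).split(dsym)]
        # The marked word itself comes first, then its non-empty alternatives.
        choices.append([match.group(1)] + [alt for alt in alternatives if alt])
        pos = match.end()
    literals.append(text[pos:])

    # Step 3: Cartesian product of the candidate lists, stitched back together
    # with the plain text that surrounded each group.
    results = []
    for combo in itertools.product(*choices):
        parts = [literals[0]]
        for word, literal in zip(combo, literals[1:]):
            parts.append(word)
            parts.append(literal)
        results.append(''.join(parts))
    return results


text = '^to[by, from] all ^appearances[appearance]'
for phrase in get_substitutions(text):
    print(phrase)
```

Because itertools.product varies the last group fastest, the phrases come out grouped by the first word, in the same order as the list in the question. A text with no marked groups comes back unchanged as a one-element list.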