Merging or reversing n-grams into a single string

Asked 2025-04-17 22:27

How can I merge the bigrams below into a single string?

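_bigrams=['the school', 'school boy', 'boy is', 'is reading']
# Flatten all bigrams into one list of words
_split=(' '.join(_bigrams)).split()
_newstr=[]
# Keep only the first occurrence of each word, preserving order
_filter=[_newstr.append(x) for x in _split if x not in _newstr]
_newstr=' '.join(_newstr)
print(_newstr)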

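Output: 'the school boy is reading'. This is the result I want, but the approach is too convoluted and is inefficient given the size of my data. It also cannot handle words that legitimately repeat in the final string, such as 'the school boy is reading, is he?': with this method only one 'is' can appear in the result.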

Any suggestions for making this simpler and more efficient? Thanks.

3 Answers

Would something like this work? It takes the first word of every entry except the last one, which is kept whole.

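_bigrams=['the school', 'school boy', 'boy is', 'is reading']

# First word of every bigram except the last, which is kept whole.
# Slicing by position avoids mishandling an earlier bigram that equals the last one.
clause = [a.split()[0] for a in _bigrams[:-1]] + _bigrams[-1:]

print(' '.join(clause))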

Output

the school boy is reading

That said, in terms of performance, Amber's solution is probably a good choice.

Answered 2025-04-17 by Python大师

If you really want a one-liner, you can try this:

' '.join(val.split()[0] for val in _bigrams) + ' ' + _bigrams[-1].split()[-1]
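
The same idea can probably be stretched to n-grams of any size: take the first word of every n-gram, then append whatever remains of the last one. Below is a minimal sketch of that, not a tested production routine. The name `merge_ngrams` is made up for illustration, and it assumes the n-grams are consecutive, overlap by n-1 words, and are separated by whitespace.

```python
def merge_ngrams(ngrams):
    """Rebuild text from consecutive, overlapping word n-grams.

    Hypothetical helper: assumes every n-gram is a non-empty,
    whitespace-separated string and that the list is in original order.
    """
    if not ngrams:
        return ''
    # The first word of each n-gram covers every position except the tail.
    words = [gram.split()[0] for gram in ngrams]
    # The remaining words of the last n-gram supply the tail.
    words.extend(ngrams[-1].split()[1:])
    return ' '.join(words)


bigrams = ['the school', 'school boy', 'boy is', 'is reading',
           'reading is', 'is he']
print(merge_ngrams(bigrams))   # the school boy is reading is he

trigrams = ['the school boy', 'school boy is', 'boy is reading']
print(merge_ngrams(trigrams))  # the school boy is reading
```

Because nothing is deduplicated, words that legitimately repeat (both occurrences of 'is' above) are kept.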

Answered 2025-04-17 by Python大师

# Multi-for generator expression allows us to create a flat iterable of words
all_words = (word for bigram in _bigrams for word in bigram.split())

def no_runs_of_words(words):
    """Takes an iterable of words and returns one with any runs condensed."""
    prev_word = None
    for word in words:
        if word != prev_word:
            yield word
        prev_word = word

final_string = ' '.join(no_runs_of_words(all_words))

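This takes advantage of generators being lazily evaluated: words are produced on demand, so no intermediate list of all the words has to be built before the final string is generated.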
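
For anyone who wants to try it, here is a self-contained sketch of the run-condensing approach with sample data. The wrapper name `merge` and both sample lists are made up for illustration. It also shows one caveat worth knowing: a word that is genuinely doubled in the original text gets condensed as well, since a real doubled word is indistinguishable from bigram overlap.

```python
def no_runs_of_words(words):
    """Yield words from an iterable, condensing runs of identical words."""
    prev_word = None
    for word in words:
        if word != prev_word:
            yield word
        prev_word = word


def merge(bigrams):
    """Hypothetical wrapper: flatten the bigrams lazily, then condense runs."""
    all_words = (word for bigram in bigrams for word in bigram.split())
    return ' '.join(no_runs_of_words(all_words))


# Repeated words that are not adjacent in the original text are preserved.
repeated = ['boy is', 'is reading', 'reading is', 'is he']
print(merge(repeated))  # boy is reading is he

# Caveat: bigrams of "he said that that was fine" lose one "that".
doubled = ['he said', 'said that', 'that that', 'that was', 'was fine']
print(merge(doubled))   # he said that was fine
```

If doubled words can occur in your data, taking the first word of each bigram plus the last word of the final bigram avoids this.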

Answered 2025-04-17 by Python大师
