Bigrams of adjacent sentences in Python

Asked 2025-04-17 07:48

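Suppose I have three sentences:

hello world
this is python
today is tuesday

If I generate bigrams for each sentence separately, the result looks like this:

[('hello', 'world')]
[('this', 'is'), ('is', 'python')]
[('today', 'is'), ('is', 'tuesday')]

So what is the difference between the bigrams of a single sentence and the bigrams of two adjacent sentences taken together? For example, "hello world. this is python" is two sentences in a row. Will the bigrams for these two sentences be the same as the output above?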

The code that generates this is:

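from itertools import tee

def bigrams(iterable):
    # Pair each item with its successor: (s0, s1), (s1, s2), ...
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)  # Python 3: built-in zip is lazy, replacing itertools.izip

with open("hello.txt", 'r') as f:
    for line in f:
        words = line.strip().split()
        bi = bigrams(words)
        print(list(bi))  # one list of bigrams per line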


1 Answer

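"But if I want to generate bigrams for adjacent sentences, will the result be the same as the output above? If not, what would the output look like?"

It depends on what you want. If you define the items of a bigram to be whole sentences, the result would look like this: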

[('hello world', 'this is python'), ('this is python', 'today is tuesday')]
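
For illustration, the sentence-level case can be sketched roughly as follows. This is a minimal Python 3 sketch, not the asker's exact script: it assumes one sentence per line, uses an in-memory `sentences` list in place of hello.txt, and the name `sentence_bigrams` is made up for this example.

```python
from itertools import tee

def bigrams(iterable):
    """Return consecutive pairs (s0, s1), (s1, s2), ... from any iterable."""
    a, b = tee(iterable)
    next(b, None)  # advance the second iterator by one item
    return zip(a, b)

# Assumed input: one sentence per line, standing in for the lines of hello.txt.
sentences = ["hello world", "this is python", "today is tuesday"]

# Pass the sentences themselves (not their words) to bigrams(),
# so each whole sentence is treated as a single item.
sentence_bigrams = list(bigrams(sentences))
print(sentence_bigrams)
# [('hello world', 'this is python'), ('this is python', 'today is tuesday')]
```

The same `bigrams` helper works here because it only pairs neighbouring items; what counts as an item is decided by what you feed it.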

If you want word-level bigrams that run across all the sentences, the result would look like this:

[('hello', 'world'), ('world', 'this'), ('this', 'is'),...]
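
One possible way to get that word-level output is to flatten all sentences into a single word list before pairing. This is a hedged sketch in Python 3: it pairs words with `zip(words, words[1:])` rather than the tee-based helper from the question, and `sentences`, `words`, and `word_bigrams` are illustrative names.

```python
# Assumed input: one sentence per line, standing in for the lines of hello.txt.
sentences = ["hello world", "this is python", "today is tuesday"]

# Flatten every sentence into one continuous list of words,
# ignoring sentence boundaries.
words = [word for sentence in sentences for word in sentence.split()]

# Pair each word with the word that follows it.
word_bigrams = list(zip(words, words[1:]))
print(word_bigrams)
# [('hello', 'world'), ('world', 'this'), ('this', 'is'), ('is', 'python'),
#  ('python', 'today'), ('today', 'is'), ('is', 'tuesday')]
```

Compared with the per-sentence output in the question, the only extra pairs are the ones that straddle a sentence boundary, here ('world', 'this') and ('python', 'today'). Whether those pairs are meaningful depends on what the bigrams will be used for.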

Answered 2025-04-17 by Python大师
