Computing n-grams in Python

#!/usr/bin/env python3
# File: n-gram.py

def N_Gram(N, text):
    NList = []                       # start with an empty list
    if N > 1:
        space = " " * (N - 1)        # build N - 1 spaces of padding
        text = space + text + space  # pad both the front and the back
    # append the slices [i:i+N] to NList
    for i in range(len(text) - (N - 1)):
        NList.append(text[i:i+N])
    return NList                     # return the list

# test code
for i in range(5):
    print(N_Gram(i + 1, "text"))

# more test code
nList = N_Gram(7, "Here is a lot of text to print")
for ngram in nList:
    print('"' + ngram + '"')

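3 answers

User

Answer #1 · edited 2024-05-14 13:15:39

Use NLTK (the Natural Language Toolkit): tokenize (split) the text into a list of words with its tokenizer function, then find the bigrams and trigrams.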

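import nltk

# my_text is the input string to analyse
words = nltk.word_tokenize(my_text)
# in NLTK 3 these return generators, so wrap them in list() to inspect them
my_bigrams = list(nltk.bigrams(words))
my_trigrams = list(nltk.trigrams(words))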

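User

Answer #2 · edited 2024-05-14 13:15:39

Assuming the input is a string of space-separated words, e.g. x = "a b c d", you can use the following function (edit: see the last function for a possibly more complete solution):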

def ngrams(input, n):
    input = input.split(' ')
    output = []
    for i in range(len(input)-n+1):
        output.append(input[i:i+n])
    return output

ngrams('a b c d', 2) # [['a', 'b'], ['b', 'c'], ['c', 'd']]

If you want those joined back into strings, you can call something like:

[' '.join(x) for x in ngrams('a b c d', 2)] # ['a b', 'b c', 'c d']

Finally, this does not aggregate the n-grams into totals, so if your input is 'a a a a', you need to tally them in a dict:

grams = {}  # maps each bigram string to its count
for g in (' '.join(x) for x in ngrams(input, 2)):  # input is the text string
    grams.setdefault(g, 0)
    grams[g] += 1

Putting all of this together into one final function:

def ngrams(input, n):
    input = input.split(' ')
    output = {}
    for i in range(len(input)-n+1):
        g = ' '.join(input[i:i+n])
        output.setdefault(g, 0)
        output[g] += 1
    return output

ngrams('a a a a', 2) # {'a a': 3}
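
As a possible alternative to the setdefault tally above, the standard library's collections.Counter can do the counting. This is only a minimal sketch, not the answer author's code: the name count_ngrams is made up for illustration, and it assumes whitespace-separated words. It uses str.split() with no argument, which collapses runs of whitespace, unlike split(' ').

```python
from collections import Counter

def count_ngrams(text, n):
    """Count word n-grams in a whitespace-separated string (hypothetical helper)."""
    words = text.split()  # no argument: splits on any run of whitespace
    # Join each window of n consecutive words and let Counter tally them.
    return Counter(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))

counts = count_ngrams('a a a a', 2)
print(counts)                 # Counter({'a a': 3})
print(counts.most_common(1))  # [('a a', 3)]
```

A Counter behaves like a dict for lookups, and most_common() gives the n-grams sorted by frequency, which is often what is wanted next.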

User

Answer #3 · edited 2024-05-14 13:15:39

A short Pythonic solution, taken from this blog:

def find_ngrams(input_list, n):
  # list(...) is needed on Python 3, where zip returns a lazy iterator
  return list(zip(*[input_list[i:] for i in range(n)]))

Usage:

>>> input_list = ['all', 'this', 'happened', 'more', 'or', 'less']
>>> find_ngrams(input_list, 1)
[('all',), ('this',), ('happened',), ('more',), ('or',), ('less',)]
>>> find_ngrams(input_list, 2)
[('all', 'this'), ('this', 'happened'), ('happened', 'more'), ('more', 'or'), ('or', 'less')]
>>> find_ngrams(input_list, 3)
[('all', 'this', 'happened'), ('this', 'happened', 'more'), ('happened', 'more', 'or'), ('more', 'or', 'less')]
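
To see why the zip expression in find_ngrams works, it can help to print the shifted copies of the list that zip consumes. The sketch below is only an illustration, and the helper name shifted_copies is made up for this purpose. zip stops at the shortest copy, which is what drops the incomplete windows at the end.

```python
def shifted_copies(input_list, n):
    """Return the n progressively shifted slices that find_ngrams zips together."""
    return [input_list[i:] for i in range(n)]

words = ['all', 'this', 'happened', 'more', 'or', 'less']

for row in shifted_copies(words, 3):
    print(row)
# ['all', 'this', 'happened', 'more', 'or', 'less']
# ['this', 'happened', 'more', 'or', 'less']
# ['happened', 'more', 'or', 'less']

# zip pairs up the k-th element of every copy and stops at the shortest one,
# so a 6-word list yields 6 - 3 + 1 = 4 trigrams.
trigrams = list(zip(*shifted_copies(words, 3)))
print(len(trigrams))  # 4
print(trigrams[0])    # ('all', 'this', 'happened')
```

Note that each slice copies the list, so for very long inputs this approach uses roughly n times the memory of the original list.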
