Using MapReduce to count the frequency of consonants per word in a text file


Asked 2025-04-18 10:05

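I need a bit of help with Python code that counts the consonants in each word and reports how many words have each consonant count. For example, given the input: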

"There is no new thing under the sun."

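the desired output would be:

{1: 2, 2: 3, 3: 2, 4: 1}

because 2 words contain 1 consonant, 3 words contain 2 consonants, 2 words contain 3 consonants, and 1 word contains 4 consonants.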
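Since the title asks for MapReduce, here is a minimal in-process sketch of that pattern which reproduces the expected counts above. The map phase emits a (consonant count, 1) pair per word, a sort stands in for the framework's shuffle step, and the reduce phase sums the pairs for each key. The function names `mapper`, `reducer` and `map_reduce` are hypothetical; the sketch assumes English text and treats 'y' as a consonant. On a real cluster (for example with Hadoop Streaming) the mapper and reducer would run as separate programs exchanging tab-separated key/value lines, which this sketch does not attempt.

```python
from itertools import groupby
from operator import itemgetter

VOWELS = set("aeiou")  # assumption: 'y' counts as a consonant

def consonants_in(word):
    """Count alphabetic characters in word that are not vowels (punctuation is ignored)."""
    return sum(1 for ch in word.lower() if ch.isalpha() and ch not in VOWELS)

def mapper(line):
    """Map phase: emit (consonant_count, 1) for every word in a line."""
    for word in line.split():
        yield consonants_in(word), 1

def reducer(key, values):
    """Reduce phase: sum the 1s emitted for one consonant count."""
    return key, sum(values)

def map_reduce(lines):
    # map: apply the mapper to every input line
    pairs = [pair for line in lines for pair in mapper(line)]
    # shuffle/sort: bring identical keys together, as the framework would
    pairs.sort(key=itemgetter(0))
    # reduce: one reducer call per distinct key
    return dict(reducer(key, (v for _, v in group))
                for key, group in groupby(pairs, key=itemgetter(0)))

print(map_reduce(["There is no new thing under the sun."]))
# {1: 2, 2: 3, 3: 2, 4: 1}
```

Because the pairs are sorted before reducing, the resulting dictionary comes out in ascending order of consonant count.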

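The code below does something similar, but instead of consonants it counts word lengths: how many words of each length appear across two text files. I think only a small change is needed to make it look at the letters inside each word.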

import re

def freqCounter(file1, file2):
    freq_dict = {}
    # strip punctuation; re.compile() turns the pattern string into a reusable regex object
    punctuation = re.compile(r'[.?!,"\':;]')
    try:
        # open both files at once and join their contents into one string
        with open(file1, "r") as infile, open(file2, "r") as infile2:
            text1 = infile.read()
            text2 = infile2.read()
            joined = " ".join((text1, text2))
            for word in joined.lower().split():
                word = punctuation.sub("", word)  # remove punctuation marks
                l = len(word)  # the word's length is the dictionary key
                # first time this length is seen: start its count at 0
                if l not in freq_dict:
                    freq_dict[l] = 0
                freq_dict[l] += 1  # count this word under its length
    except IOError as e:  # raised if either file cannot be opened or read
        print('Operation failed: %s' % e.strerror)
    return freq_dict  # maps word length -> number of words with that length
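The "small change" mentioned in the question can be sketched as follows: keep the same counting loop, but use the number of consonants in a word as the dictionary key instead of `len(word)`. This is a minimal sketch, not the asker's code: the names `consonant_count` and `consonant_freq` are hypothetical, it accepts any number of file paths instead of exactly two, it lets file errors propagate instead of printing them, and it assumes plain ASCII English text with 'y' treated as a consonant.

```python
import re

# Matches one lower-case ASCII consonant ('y' included); vowels, digits and punctuation never match.
CONSONANT = re.compile(r'[b-df-hj-np-tv-z]')

def consonant_count(word):
    """Number of consonants in a single word; punctuation is ignored automatically."""
    return len(CONSONANT.findall(word.lower()))

def consonant_freq(*paths):
    """Map consonant count -> number of words, over all given text files."""
    freq_dict = {}
    for path in paths:
        with open(path, "r") as infile:
            for word in infile.read().split():
                c = consonant_count(word)  # the only real change: len(word) -> consonant count
                freq_dict[c] = freq_dict.get(c, 0) + 1
    return freq_dict

# Usage (hypothetical file names):
# freq = consonant_freq("file1.txt", "file2.txt")
```

Since the regex only ever matches consonant letters, the separate punctuation-stripping step from the original function is no longer needed.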

Any help would be greatly appreciated!

Tags: data-processing, statistical-computing, text-analysis, mapreduce, programming-help, word-frequency-analysis, consonant-frequency

3 Answers

A simple solution:

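def freqCounter(_str):
    _txt = _str.split()
    freq_dict = {}
    for word in _txt:
        c = 0
        for letter in word:
            # count letters that are not vowels; skipping non-letters also ignores punctuation,
            # and lower() makes capitalized vowels like "E" count as vowels
            if letter.isalpha() and letter.lower() not in "aeiou":
                c += 1
        freq_dict[c] = freq_dict.get(c, 0) + 1  # one more word with c consonants
    return freq_dict

txt = "There is no new thing under the sun."
table = freqCounter(txt)
for k in sorted(table):  # print in ascending order of consonant count
    print(k, ":", table[k])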

Answered 2025-04-18 by Python大师


I would try a simpler approach:

import re
from collections import Counter

words = 'There is no new thing under the sun.'
# lower-case the text, then delete every non-consonant (vowels, punctuation) from each word;
# the chained replace() calls left "sun." as "sn.", which miscounted it as 3 consonants
stripped = [re.sub(r'[^b-df-hj-np-tv-z]', '', w) for w in words.lower().split()]

# Now the words contain only consonants, so each word's length is its consonant count
word_lengths = map(len, stripped)
freq_dict = dict(Counter(word_lengths))
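A regex-free variant of the same "strip the non-consonants, then measure the length" idea uses `str.translate`, which deletes every listed character in a single pass. This is a sketch under stated assumptions: the name `consonant_histogram` is hypothetical, and the text is assumed to contain only letters, whitespace and ASCII punctuation (digits would be counted as consonants).

```python
import string
from collections import Counter

# Translation table that deletes vowels (both cases) and ASCII punctuation in one pass.
# Assumption: no digits in the text, since they are not deleted and would be counted.
DELETE_NON_CONSONANTS = str.maketrans('', '', 'aeiouAEIOU' + string.punctuation)

def consonant_histogram(text):
    # split first, so words with zero consonants still count (as length 0)
    return dict(Counter(len(word.translate(DELETE_NON_CONSONANTS)) for word in text.split()))

print(consonant_histogram('There is no new thing under the sun.'))
```

Splitting before translating matters: translating the whole string first and then calling `split()` would silently drop words that end up empty, such as "I" or "a".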

Answered 2025-04-18 by Python大师


How about this?

with open('conts.txt', 'w') as fh:
    fh.write('oh my god becky look at her butt it is soooo big')

consonants = "bcdfghjklmnpqrstvwxyz"

def count_cons(_file):
    results = {}
    with open(_file, 'r') as fh:
        for line in fh:
            # split() with no argument also drops the trailing newline and repeated spaces
            for word in line.split():
                # lower-case each letter so capitalized consonants are counted too
                conts = sum(1 for letter in word if letter.lower() in consonants)
                results[conts] = results.get(conts, 0) + 1
    return results

# sort by consonant count so the keys print in ascending order
print(dict(sorted(count_cons('conts.txt').items())))

Output:

{1: 5, 2: 5, 3: 1, 4: 1}

Answered 2025-04-18 by Python大师

