Average number of characters per word in a list


Asked 2025-05-01 01:45

I'm trying to compute the average number of characters per word in a list. I'm using the definitions below and a helper function, clean_up.

Here are my definitions:

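A "token" is a string you get by calling split() on a line of the file.
A "word" is a non-empty token that does not consist entirely of punctuation.
A "sentence" is a sequence of characters that ends with one of !?. or the end of the file (EOF), not including those terminating characters. A sentence has no whitespace at either end and is never the empty string.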
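To make the "token" and "word" definitions concrete, here is a minimal sketch that turns them into a predicate. The names `is_word` and `words_in_line` are hypothetical, and treating "punctuation" as Python's `string.punctuation` is an assumption; the question's clean_up uses its own, narrower character set.

```python
import string

def is_word(token: str) -> bool:
    """Hypothetical helper: True if token is non-empty and not all punctuation."""
    # Assumption: "punctuation" means string.punctuation, not clean_up's set.
    return token != "" and not all(ch in string.punctuation for ch in token)

def words_in_line(line: str) -> list[str]:
    """Hypothetical helper: split a line into tokens and keep only the words."""
    return [tok for tok in line.split() if is_word(tok)]

print(words_in_line("-> It's on your left-hand side.\n"))
```

Note that under these definitions a word is still the raw token, so "side." keeps its period; only the all-punctuation token "->" is dropped.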

def clean_up(s):
    """ (str) -> str

    Return a new string based on s in which all letters have been
    converted to lowercase and punctuation characters have been stripped 
    from both ends. Inner punctuation is left untouched. 

    >>> clean_up('Happy Birthday!!!')
    'happy birthday'
    >>> clean_up("-> It's on your left-hand side.")
    " it's on your left-hand side"
    """

    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    result = s.lower().strip(punctuation)
    return result

My code is:

def avg_word_length(text):
    """ (list of str) -> float

    Precondition: text is non-empty. Each str in text ends with \n and
    text contains at least one word.

    Return the average length of all words in text. 

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
    >>> avg_word_length(text)
    5.142857142857143 
    """

    a = ''
    for i in range(len(text)):
        a = a + clean_up(text[i])
        words = a.split()
    for word in words:
        average = sum(len(word) for word in words)/len(words)
    return average

The result I get is 6.16666...
I'm using Python 3.
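For reference, the value the docstring expects is simply the mean length of the seven words in the example input (James, Fennimore, Cooper, Peter, Paul, and, Mary), with the comma after "Peter" not counted:

```latex
\frac{5 + 9 + 6 + 5 + 4 + 3 + 4}{7} = \frac{36}{7} \approx 5.142857
```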


1 Answer

There are two clear logic errors in your code.

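First, clean_up only strips punctuation from the two ends of the string it is given. You call it on a whole line, so punctuation inside the line is left alone, and you never clean the individual tokens after splitting. As a result, a word like "Peter," keeps its comma and counts one character too many.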
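A small sketch of the difference, using a copy of the question's clean_up on the second example line: cleaning the whole line leaves the inner comma in place, while cleaning each token after split() removes it.

```python
def clean_up(s):
    """Copy of the question's helper: lowercase, strip punctuation from both ends."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)

line = 'Peter, Paul and Mary\n'

# Cleaning the whole line only touches its two ends, so the comma survives.
print(clean_up(line).split())  # ['peter,', 'paul', 'and', 'mary']

# Cleaning each token after splitting strips the comma from "Peter,".
print([clean_up(tok) for tok in line.split()])  # ['peter', 'paul', 'and', 'mary']
```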

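Second, you join the cleaned lines with a = a + clean_up(text[i]). Because clean_up also strips each line's trailing \n, this guarantees that your words come out too long and too few: the last word of one line merges with the first word of the next. In this case you get "cooperpeter," as a single word.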
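The following sketch reproduces the question's loop on the docstring input to show where 6.16666... comes from: the merged token "cooperpeter," is 12 characters long, giving 37 characters over 6 "words" instead of 36 over 7.

```python
def clean_up(s):
    """Copy of the question's helper."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)

text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']

# Same concatenation as the question: clean_up removes the trailing '\n',
# so nothing separates "cooper" from "peter," once the lines are joined.
a = ''
for line in text:
    a = a + clean_up(line)
words = a.split()

print(words)  # ['james', 'fennimore', 'cooperpeter,', 'paul', 'and', 'mary']
print(sum(len(w) for w in words), len(words))  # 37 6
print(sum(len(w) for w in words) / len(words))  # 6.166666666666667
```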

If you print words before the second loop, both problems become obvious. (Given the generator expression in the sum() call, the second loop is unnecessary anyway.)
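Putting both fixes together, one possible corrected avg_word_length is sketched below. It splits each line separately, cleans each token rather than each line, and drops the redundant second loop. It still relies on the question's clean_up, so "punctuation" means only the characters in that helper's set (an assumption: a token such as "&" would still be counted as a one-character word).

```python
def clean_up(s):
    """The question's helper, unchanged."""
    punctuation = """!"',;:.-?)([]<>*#\n\t\r"""
    return s.lower().strip(punctuation)

def avg_word_length(text):
    r""" (list of str) -> float

    Return the average length of all words in text.

    >>> text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
    >>> avg_word_length(text)
    5.142857142857143
    """
    total_chars = 0
    word_count = 0
    for line in text:
        # Split each line on its own so words never merge across lines.
        for token in line.split():
            # Clean each token, not the whole line, so "Peter," becomes "peter".
            word = clean_up(token)
            # A token made only of clean_up's punctuation becomes '' and is skipped.
            if word:
                total_chars += len(word)
                word_count += 1
    return total_chars / word_count
```

The raw docstring (r""") keeps the \n inside the doctest as a literal escape sequence, so the example can be run with the doctest module.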

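Personally, I'd probably use the re module to find words according to a single, consistent definition (such as r"\w+") and sum their lengths, rather than building up a string of their contents.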
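A minimal sketch of that regex approach, using re.findall with r"\w+" as the word definition. The function name avg_word_length_re is hypothetical, and this definition of a word differs from the question's token-based one: in Python 3, \w matches Unicode letters, digits and underscores, so "it's" counts as two words ("it" and "s") and "left-hand" as two.

```python
import re

def avg_word_length_re(text):
    """Hypothetical variant: average length of all regex word matches in text."""
    # Each maximal run of word characters counts as one word.
    lengths = [len(m) for line in text for m in re.findall(r"\w+", line)]
    return sum(lengths) / len(lengths)

text = ['James Fennimore Cooper\n', 'Peter, Paul and Mary\n']
print(avg_word_length_re(text))  # 5.142857142857143
```

On the docstring input both definitions agree, but they diverge on contractions and hyphenated words, so which one to use depends on the assignment's definition of a word.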

Answered 2025-05-01 by Python大师
