Counting occurrences of specific words in a DataFrame

User

Reply 1 · edited 2024-04-24 05:13:53

This is easy to do with defaultdict or Counter from collections:

words = ['uno', 'dos', 'one', 'two', 'tres', 'quatro', 'yes', 'wooly', 'bully', 'watch', 'watch', 'come', 'come', 'watch', 'git', 'matty', 'told', 'hattie', 'thing', 'saw', 'two', 'big', 'horns', 'wooly', 'jaw', 'wooly', 'bully', 'wooly', 'bully', 'yes', 'drive', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'hattie', 'told', 'matty', 'lets', 'dont', 'take', 'chance', 'lets', 'lseven', 'come', 'learn', 'dance', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'watch', 'watch', 'watch', 'watch', 'yeah', 'yeah', 'drive', 'drive', 'drive', 'matty', 'told', 'hattie', 'thats', 'thing', 'get', 'someone', 'really', 'pull', 'wool', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'wooly', 'bully', 'watch', 'watch', 'come', 'got', 'got']


from collections import defaultdict
dict_count = defaultdict(int)
for item in words:
    dict_count[item] += 1

Or:

from collections import Counter
counts = Counter(words)
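
Either approach gives a mapping from word to count, so the counts for specific words can be read off directly. A minimal sketch, assuming a shortened word list and a hypothetical list of target words named targets (neither is from the original answer):

```python
from collections import Counter

# Shortened sample list (an assumption for illustration, not the full list above)
sample_words = ['wooly', 'bully', 'watch', 'wooly', 'bully', 'come', 'watch', 'wooly']

counts = Counter(sample_words)

# Hypothetical list of the words we care about
targets = ['wooly', 'girl']

# A Counter returns 0 for words it has never seen, so no KeyError is raised
per_word = {word: counts[word] for word in targets}
total = sum(per_word.values())

print(per_word)  # {'wooly': 3, 'girl': 0}
print(total)     # 3
```

A defaultdict(int) also returns 0 for unseen words, but looking up a missing key inserts it into the dictionary, which Counter does not do.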

User

Reply 2 · edited 2024-04-24 05:13:53

To count how many times a substring occurs in a string, you can use:

string.count(substring)

So you can apply this function to the column that contains the strings:

string_occurrences = df.Token.apply(lambda x: sum(x.count(substring) for substring in ['wooly', 'girl']))

Then you just need to add up the counts:

total_occurrences = string_occurrences.sum()
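
One thing to keep in mind is that str.count counts substrings, not whole words, so 'wool' is also counted inside 'wooly'. The sketch below shows the difference, using the standard re module with word boundaries as one possible alternative. The sample sentence and the helper name count_whole_word are made up for illustration.

```python
import re

text = 'wooly bully pulls the wool over the wooly sheep'

# Substring count: also matches "wool" inside "wooly"
substring_hits = text.count('wool')


def count_whole_word(text, word):
    """Count occurrences of word as a whole word (hypothetical helper)."""
    pattern = r'\b' + re.escape(word) + r'\b'
    return len(re.findall(pattern, text))


whole_word_hits = count_whole_word(text, 'wool')

print(substring_hits)   # 3
print(whole_word_hits)  # 1
```

If whole-word matches are what you want, a helper like this could be used inside the apply call in place of x.count.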

User

Reply 3 · edited 2024-04-24 05:13:53

Counter returns a dictionary (a dict subclass), and looking up a word that is not present gives 0:

from collections import Counter

# words is the list of target words to count in each row
df["Count"] = (
    df['Token'].str.split()
    .apply(Counter)
    .apply(lambda counts: sum([counts[word] for word in words]))
)

If the values in the Token column are already lists, str.split() is not needed:

df["Count"] = (
    df['Token']
    .apply(Counter)
    .apply(lambda counts: sum([counts[word] for word in words]))
)
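
Putting the pieces together, here is a small end-to-end sketch of the Counter approach. The sample rows and the two target words are made up for illustration; only the Token column name comes from the answers above, and pandas is assumed to be installed.

```python
from collections import Counter

import pandas as pd

# Made-up sample data; the column name Token follows the answers above
df = pd.DataFrame({
    'Token': [
        'wooly bully wooly bully',
        'watch it now',
        'matty told hattie wooly',
    ]
})

# Hypothetical list of words to count
words = ['wooly', 'bully']

df['Count'] = (
    df['Token'].str.split()   # string -> list of tokens
    .apply(Counter)           # list -> Counter of token frequencies
    .apply(lambda counts: sum(counts[word] for word in words))
)

print(df['Count'].tolist())  # [4, 0, 1]
print(df['Count'].sum())     # 5
```

The Count column holds the per-row total, and summing it gives the total across the whole DataFrame.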
