Given a list of lists of strings and a list of words, how do I return the word counts?

User

#1 · edited 2024-06-02 07:57:48

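I would rather use a regular expression. First, because whole words need to be matched, which is awkward with other string-search methods. And even though it may look like a bazooka, it is usually quite efficient.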

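First build a regular expression from list_2, then use it to search the sentences in list_1. The regular expression is constructed like this: "(\bword1\b|\bword2\b|...)", meaning "whole word 1 or whole word 2 or ...". \b matches at a word boundary, that is, at the start or end of a word.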
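To make the effect of \b concrete, here is a minimal sketch. The word list and sentence are made up for illustration, and the call to re.escape is an added precaution that is not part of the pattern described above.

```python
import re

# Hypothetical sample data, chosen only to illustrate whole-word matching.
words = ["able", "play"]
sentence = "he was unable to play, but able to display"

# Same shape as the pattern described above: (\bword1\b|\bword2\b|...)
# re.escape (an addition) keeps any regex metacharacters in a word literal.
pattern = re.compile(r"(\b{}\b)".format(r"\b|\b".join(re.escape(w) for w in words)))

print(pattern.pattern)            # (\bable\b|\bplay\b)
print(pattern.findall(sentence))  # ['play', 'able']
```

"unable" and "display" are skipped because the keyword sits inside a longer word there, so there is no word boundary in front of it.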

I assume you want one result per sublist of list_1, not one result per sentence of each sublist.

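import re

# Escape each word so that any regex metacharacters are matched literally.
_regex = re.compile(r"(\b{}\b)".format(r"\b|\b".join(map(re.escape, list_2))))
# One count per sublist: total whole-word matches across its sentences.
word_counts = [
    sum(len(_regex.findall(sentence)) for sentence in sublist)
    for sublist in list_1
]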

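Here you can find a complete code sample with a performance comparison against plain string search, bearing in mind that matching whole words requires more work and is therefore less efficient.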
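The linked sample is not reproduced here. A comparison of that kind could be sketched with timeit roughly as follows; the data, the function names, and the repetition count are all assumptions made for illustration, and the timings depend on the machine and the inputs.

```python
import re
import timeit

# Hypothetical data, in the nested shape assumed in this answer.
list_1 = [["the guy was unable to play football, but he was able to play tennis"],
          ["This is implicit living."]]
list_2 = ["unable", "implicit", "living"]

_regex = re.compile(r"(\b{}\b)".format(r"\b|\b".join(map(re.escape, list_2))))

def count_whole_words():
    # Whole-word matches via the compiled regex.
    return [sum(len(_regex.findall(s)) for s in sub) for sub in list_1]

def count_substrings():
    # Plain substring occurrences; also counts words embedded in longer words.
    return [sum(s.count(w) for s in sub for w in list_2) for sub in list_1]

regex_time = timeit.timeit(count_whole_words, number=10_000)
plain_time = timeit.timeit(count_substrings, number=10_000)
print(f"regex: {regex_time:.4f}s  plain: {plain_time:.4f}s")
```

On this particular data both functions return the same counts, so the difference is purely in running time. With other inputs the substring version can overcount, which is the reason for matching whole words in the first place.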

User

#2 · edited 2024-06-02 07:57:48

The trick is to use the split() method and a list comprehension. If you only split on whitespace:

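# Each sublist holds the sentences that are counted together.
list_1 = [["the guy was unable to play football but he was able to play tennis"], ["That was absolute cool"], ["This is implicit living"]]

list_2 = ['unable', 'unquestioning', 'implicit', 'living', 'relative', 'comparative']

# One count per sublist: keywords that appear as whitespace-separated tokens.
print([sum(sum(1 for j in list_2 if j in i.split()) for i in k) for k in list_1])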

However, if you want to tokenize on every non-alphanumeric character, use re:

import re

list_1 = [["the guy was unable to play football,but he was able to play tennis"], ["That was absolute cool"], ["This is implicit living"]]
list_2 = ['unable', 'unquestioning', 'implicit', 'living', 'relative', 'comparative']

# Split each sentence on non-word characters, then count the keywords among the tokens.
print([sum(sum(1 for j in list_2 if j in re.split(r"\W", i)) for i in k) for k in list_1])

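The \W character class matches any non-word character, that is, anything other than a letter, a digit, or an underscore.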
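To see how the two tokenizations differ, here is a small sketch on a made-up sentence (the sentence and variable names are illustrative only):

```python
import re

# Hypothetical sentence: punctuation, no space after the comma, one underscore.
sentence = "unable to play football,but able_to play tennis."

space_tokens = sentence.split()
regex_tokens = re.split(r"\W", sentence)

print(space_tokens)
# ['unable', 'to', 'play', 'football,but', 'able_to', 'play', 'tennis.']
print(regex_tokens)
# ['unable', 'to', 'play', 'football', 'but', 'able_to', 'play', 'tennis', '']
```

Note that the underscore is not treated as a separator, and that a separator at the very end of the string leaves an empty string in the result. Neither affects the membership test used above.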

User

#3 · edited 2024-06-02 07:57:48

Use the built-in function sum with a list comprehension:

>>> list_1 = [['the guy was unable to play football, but he was able to play tennis'], ['That was absolute cool'], ['This is implicit living.']]
>>> list_2 = ['unable', 'unquestioning', 'implicit', 'living', 'relative', 'comparative']
>>> [sum(1 for word in list_2 if word in sentence) for sublist in list_1 for sentence in sublist]
[1, 0, 2]
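One thing to keep in mind: on strings, `word in sentence` is a substring test, so a short keyword is also found inside longer words. A minimal sketch, with "able" added as a hypothetical extra keyword:

```python
# Hypothetical data: "able" occurs only as part of "unable".
sentence = "the guy was unable to play football"
keywords = ["able", "unable"]

# Substring test: "able" is found inside "unable".
substring_count = sum(1 for word in keywords if word in sentence)
# Token test: only whole whitespace-separated words count.
token_count = sum(1 for word in keywords if word in sentence.split())

print(substring_count, token_count)  # 2 1
```

For the keyword list used in this thread the two tests happen to agree, which is why the output above is still [1, 0, 2].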
