Python: count the number of matching words between two columns

0 votes

2 answers

49 views

Asked on 2025-04-14 18:24

I want to count how many times a list of words appears in a column. Here is my DataFrame:

original                           people       result
John is a good friend              John, Mary   1
Mary and Peter are going to marry  Peter, Mary  2
Bond just met the Bond girl        Bond         2
Chris is having dinner             NaN          0
All Marys are here                 Mary         0

I tried the code suggested here for checking whether one DataFrame column contains words from another column:

import pandas as pd
import re
df['result'] = [', '.join([p for p in po
                     if re.search(f'\\b{p}\\b', o)])
                for o, po in zip(df.original, df.people.str.split(',\o*'))
             ]
# And after I would try to calculate the number of words in column 'result'

But I got the following message:

error: bad escape \o at position 1

Can anyone offer some advice?

Tags: data processing, string matching, data analysis, pandas, dataframe, word matching, column occurrence count

2 Answers

Use split on both columns, then check whether each word in "original" appears in "people":

df["people"] = df["people"].fillna("")
df["result"] = [sum(w in ws for w in s.split()) for s, ws in zip(df["original"], df["people"].str.split(', '))]

>>> df

                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       0
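
On the error itself: `\o` is not a recognized regular-expression escape, so the pattern `',\o*'` cannot be compiled. The intended split pattern was presumably `',\s*'` (a comma followed by optional whitespace). The sketch below is one possible way to keep the regex approach from the question and count whole-word occurrences directly. The helper name `count_whole_words` is made up for illustration, and the sketch assumes that a missing `people` value should count as 0.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "original": [
        "John is a good friend",
        "Mary and Peter are going to marry",
        "Bond just met the Bond girl",
        "Chris is having dinner",
        "All Marys are here",
    ],
    "people": ["John, Mary", "Peter, Mary", "Bond", None, "Mary"],
})


def count_whole_words(text, people):
    """Count whole-word occurrences in `text` of each name listed in `people`.

    Hypothetical helper: `people` is a comma-separated string, or a
    missing value (None / NaN), which is assumed to mean "no names".
    """
    if not isinstance(people, str):  # missing value: nothing to match
        return 0
    # Split on a comma followed by optional whitespace (note \s, not \o).
    names = [n for n in re.split(r",\s*", people.strip()) if n]
    # re.escape guards against names containing regex metacharacters;
    # \b keeps "Mary" from matching inside "Marys".
    return sum(
        len(re.findall(rf"\b{re.escape(name)}\b", text))
        for name in names
    )


df["result"] = [
    count_whole_words(o, p) for o, p in zip(df["original"], df["people"])
]
print(df["result"].tolist())
```

Unlike splitting `original` on whitespace, the `\b` boundaries should also cope with names followed by punctuation, such as "Mary," or "Bond.".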

Answered on 2025-04-14 by Python大师


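In programming, we sometimes need to move data from one place to another. This is called "passing data". For example, you enter your name in one part of a program, and that name then has to be passed to another part where it is used.

There are several ways to pass data. The most common is to use a variable. A variable is like a box: you put something in and take it out again when you need it. For example, you can create a variable called "user name" and store your name in it, so that the rest of the program can use that name.

Besides variables, there are other approaches, such as functions. A function can be thought of as a small tool: it takes input (such as your name), does something with it (such as printing it), and may also return a result.

In short, passing data is a very important part of programming. It lets us use the same data in different places and makes programs more flexible and powerful.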

In [39]: df = pd.DataFrame({'original':["John is a good friend", "Mary and Peter are going to marry", "Bond just met the Bond girl", "Chris is having dinner", "All Marys are here"], "people": ["John, Mary", "Peter, Mary", "Bond", '', "Mary"]})

In [40]: df
Out[40]:
                            original       people
0              John is a good friend   John, Mary
1  Mary and Peter are going to marry  Peter, Mary
2        Bond just met the Bond girl         Bond
3             Chris is having dinner
4                 All Marys are here         Mary

In [41]: df['result'] = df.apply(lambda row: sum((row['original'].count(p.strip()) for p in row['people'].split(',') if p), start=0), axis=1)

In [42]: df
Out[42]:
                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       1
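
Note that `str.count` counts substrings, which is why the last row gets 1 here ("Marys" contains "Mary"), whereas the expected result in the question is 0. Below is a minimal plain-Python sketch of the difference. The function names `substring_count` and `whole_word_count` are hypothetical, and splitting on whitespace assumes that no punctuation is attached to the words.

```python
from collections import Counter


def substring_count(text, names):
    """Count every occurrence of each name, even inside longer words."""
    return sum(text.count(name) for name in names)


def whole_word_count(text, names):
    """Count only whitespace-separated tokens that equal a name exactly."""
    tokens = Counter(text.split())
    # A Counter returns 0 for tokens it has never seen.
    return sum(tokens[name] for name in set(names))


names = ["Mary"]
print(substring_count("All Marys are here", names))   # counts "Mary" inside "Marys"
print(whole_word_count("All Marys are here", names))  # "Marys" is a different token
print(whole_word_count("Bond just met the Bond girl", ["Bond"]))
```

Which behaviour is wanted depends on the data: substring counting is simpler, while token or word-boundary matching follows the expected `result` column from the question.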

Answered on 2025-04-14 by Python大师
