Python: count the number of matching words between two columns

0 votes
2 answers
49 views
Asked 2025-04-14 18:24

I want to count how many times the words in a list appear in a column. Here is my DataFrame:

original                           people       result
John is a good friend              John, Mary   1
Mary and Peter are going to marry  Peter, Mary  2
Bond just met the Bond girl        Bond         2
Chris is having dinner             NaN          0
All Marys are here                 Mary         0

I tried the code suggested here: Check if a column in a dataframe contains words from another column

import pandas as pd
import re
df['result'] = [', '.join([p for p in po
                           if re.search(f'\\b{p}\\b', o)])
                for o, po in zip(df.original, df.people.str.split(',\o*'))
               ]
# And after I would try to calculate the number of words in column 'result'

But I got the following message:

error: bad escape \o at position 1

Can anyone offer some advice?

2 Answers

3

Split both columns, then check whether each word in "original" appears in "people":

df["people"] = df["people"].fillna("")
df["result"] = [sum(w in ws for w in s.split()) for s, ws in zip(df["original"], df["people"].str.split(', '))]

>>> df

                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       0
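As a side note on the error in the question: Python's `re` module rejects unknown escapes made of a backslash and an ASCII letter, so `',\o*'` raises `bad escape \o`. The pattern was presumably meant to be `',\s*'` (a comma followed by optional whitespace), though that is an assumption about the linked code. Below is a minimal sketch of the regex approach with that change, counting every whole-word occurrence rather than joining the matched names. The helper name `count_names` and the `rows` list are hypothetical, and plain tuples stand in for the DataFrame rows so the sketch runs without pandas.

```python
import re


def count_names(text, names):
    """Count whole-word occurrences in `text` of each name listed in `names`.

    `names` is a comma-separated string such as "John, Mary". Anything that is
    not a string (for example NaN or None) is treated as an empty list.
    """
    if not isinstance(names, str):
        return 0
    total = 0
    # Split on a comma followed by optional whitespace (assumed intent of ',\o*').
    for name in re.split(r",\s*", names.strip()):
        if name:
            # re.escape guards against names that contain regex metacharacters.
            pattern = rf"\b{re.escape(name)}\b"
            total += len(re.findall(pattern, text))
    return total


# Hypothetical stand-in for the rows of the DataFrame in the question.
rows = [
    ("John is a good friend", "John, Mary"),
    ("Mary and Peter are going to marry", "Peter, Mary"),
    ("Bond just met the Bond girl", "Bond"),
    ("Chris is having dinner", float("nan")),
    ("All Marys are here", "Mary"),
]

results = [count_names(original, people) for original, people in rows]
print(results)  # [1, 2, 2, 0, 0]
```

With a DataFrame, the same helper could be applied as `df["result"] = [count_names(o, p) for o, p in zip(df["original"], df["people"])]`, which also avoids the need to call `fillna` first.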

2

In [39]: df = pd.DataFrame({
    ...:     'original': ["John is a good friend",
    ...:                  "Mary and Peter are going to marry",
    ...:                  "Bond just met the Bond girl",
    ...:                  "Chris is having dinner",
    ...:                  "All Marys are here"],
    ...:     'people': ["John, Mary", "Peter, Mary", "Bond", '', "Mary"],
    ...: })

In [40]: df
Out[40]:
                            original       people
0              John is a good friend   John, Mary
1  Mary and Peter are going to marry  Peter, Mary
2        Bond just met the Bond girl         Bond
3             Chris is having dinner
4                 All Marys are here         Mary

In [41]: df['result'] = df.apply(lambda row: sum((row['original'].count(p.strip()) for p in row['people'].split(',') if p), start=0), axis=1)

In [42]: df
Out[42]:
                            original       people  result
0              John is a good friend   John, Mary       1
1  Mary and Peter are going to marry  Peter, Mary       2
2        Bond just met the Bond girl         Bond       2
3             Chris is having dinner                    0
4                 All Marys are here         Mary       1
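Note that the last row here is 1, while the expected result in the question is 0. The difference comes from `str.count`, which counts substring occurrences, so "Mary" is found inside "Marys". A small sketch of the contrast with a word-boundary regex follows; the variable names are only illustrative.

```python
import re

text = "All Marys are here"
name = "Mary"

# str.count matches substrings, so "Mary" inside "Marys" is counted.
substring_hits = text.count(name)

# \b requires a word boundary on both sides, so "Marys" does not match.
whole_word_hits = len(re.findall(rf"\b{re.escape(name)}\b", text))

print(substring_hits, whole_word_hits)  # 1 0
```

If whole-word matching is what is wanted, the `row['original'].count(p.strip())` call could be swapped for a `re.findall` count like the one above.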
