Count the words in a string column that match a predefined list

def count_words(df):
    long_list = ['pen', 'pineapple']
    count = 0
    for c in df['tweet_text']:
        if c in long_list:
            count = count + 1
    df['count'] = count
    return df

count_word = FunctionTransformer(count_words, validate=False)

def convert_twitter_datetime(df):
    df['hour'] = pd.to_datetime(df['created_at'], format='%a %b %d %H:%M:%S +0000 %Y').dt.strftime('%H').astype(int)
    return df

convert_datetime = FunctionTransformer(convert_twitter_datetime, validate=False)

3 Answers

User

#1 · edited 2024-04-19 15:08:22

pandas has str.count:

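# match any of the words as whole words;
# (?:...) groups the alternatives so \b applies to every word, not just the first and last
pattern = r'\b(?:{})\b'.format('|'.join(long_list))

df['count'] = df.text.str.count(pattern)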

Output:

   index                                              text  count
0      1              "I have a pen, but I lost it today."      1
1      2  "I have pineapple and pen, but I lost it today."      2
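For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same str.count idea. The DataFrame contents (including the third row) are made up for illustration, and the use of re.escape and a non-capturing group is my own addition, assuming the words should match only as whole words and may contain regex metacharacters.

```python
import re

import pandas as pd

long_list = ['pen', 'pineapple']

# Hypothetical sample data; the third row checks that partial words are ignored.
df = pd.DataFrame({
    'text': [
        'I have a pen, but I lost it today.',
        'I have pineapple and pen, but I lost it today.',
        'Pencils and pens are not counted.',
    ]
})

# Escape each word, then wrap the alternation in a non-capturing group
# so the \b word boundaries apply to every alternative.
pattern = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in long_list))

# str.count returns the number of non-overlapping matches per row.
df['count'] = df['text'].str.count(pattern)
print(df['count'].tolist())  # [1, 2, 0]
```

The match is case-sensitive here, which is why "Pencils" contributes nothing even before the word boundary is considered.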

User

#2 · edited 2024-04-19 15:08:22

Inspired by @Quang Hoang's answer:

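import pandas as pd
from sklearn.preprocessing import FunctionTransformer

y = ['pen', 'pineapple']

def count_strings(X, y):
    # (?:...) groups the alternatives so \b applies to every word in y
    pattern = r'\b(?:{})\b'.format('|'.join(y))
    return X['text'].str.count(pattern)

# kw_args forwards the word list to count_strings on every transform call
string_transformer = FunctionTransformer(count_strings, kw_args={'y': y})
df['count'] = string_transformer.fit_transform(X=df)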

which gives

    text                                              count
1   "I have a pen, but I lost it today."                1
2   "I have pineapple and pen, but I lost it today."    2

And with the following df2:

#df2
      text
1     "I have a pen, but I lost it today. pen pen"
2     "I have pineapple and pen, but I lost it today."

we get

string_transformer.transform(X=df2)
#result
1    3
2    2
Name: text, dtype: int64

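This shows that the function has been turned into an sklearn-style object. To take this further, the column name could also be passed to count_strings as a keyword argument.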
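A rough sketch of that last idea, passing the column name through kw_args. The parameter names words and column, and the tweet_text column in the sample frame, are my own choices for illustration and not part of the answer above; the data mirrors df2.

```python
import re

import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def count_strings(X, words, column='text'):
    """Count whole-word occurrences of any entry of `words` in X[column]."""
    pattern = r'\b(?:{})\b'.format('|'.join(re.escape(w) for w in words))
    return X[column].str.count(pattern)


# Hypothetical frame using a different column name than 'text'.
df2 = pd.DataFrame({
    'tweet_text': [
        'I have a pen, but I lost it today. pen pen',
        'I have pineapple and pen, but I lost it today.',
    ]
})

# Both the word list and the column name are forwarded via kw_args.
string_transformer = FunctionTransformer(
    count_strings,
    kw_args={'words': ['pen', 'pineapple'], 'column': 'tweet_text'},
)

counts = string_transformer.fit_transform(df2)
print(counts.tolist())  # [3, 2]
```

Keeping the column name out of the function body means the same transformer definition can be reused in a pipeline for frames whose text column is named differently.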

User

#3 · edited 2024-04-19 15:08:22

Join the elements of the list with |. Find the matching elements with .str.findall() and apply .str.len() to count them.

p = '|'.join(long_list)
df = df.assign(count=df.text.str.findall(p).str.len())
                                             text   count
0              "I have a pen, but I lost it today."      1
1  "I have pineapple and pen, but I lost it today."      2
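One thing worth checking with this approach: a plain '|'.join pattern has no word boundaries, so it also matches inside longer words. The sketch below compares the two on a made-up sentence; the column names loose_count and strict_count are hypothetical and only there for the comparison.

```python
import pandas as pd

long_list = ['pen', 'pineapple']

# Hypothetical row in which 'pen' also appears inside 'pencil'.
df = pd.DataFrame({'text': ['I have a pen and a pencil.']})

loose = '|'.join(long_list)              # matches substrings as well
strict = r'\b(?:{})\b'.format(loose)     # matches whole words only

df = df.assign(
    loose_count=df['text'].str.findall(loose).str.len(),
    strict_count=df['text'].str.findall(strict).str.len(),
)
print(df[['loose_count', 'strict_count']])
```

Here the loose pattern counts 2 (the word "pen" plus the "pen" in "pencil"), while the strict pattern counts 1. Which behaviour is wanted depends on the data, so this is a judgment call rather than a correction.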
