Counting the occurrences of a string that match another column

import pandas as pd

df = {'msg': ['i am so happy thank you', 'sticker omitted', 'sticker omitted', 'thank you for your time!', 'sticker omitted', 'hello there'],
      'number_of_stickers': ['2', '0', '0', '1', '0', '0']}
# The column 'number_of_stickers' is what I am aiming to achieve. Currently, I don't have this column.
df = pd.DataFrame(data=df)

3 Answers

User
#1 · edited 2024-05-20 11:11:54

Checking for the other values and using cumsum to identify blocks is a common technique:
omitted = df.msg.ne('sticker omitted').cumsum()
df['number_of_stickers'] = np.where(omitted.duplicated(), 0, omitted.groupby(omitted).transform('size')-1)
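To see why this works, it can help to look at each intermediate step. The following is a minimal sketch on the question's sample data; the variable names `is_message`, `block_id` and `stickers_in_block` are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'msg': ['i am so happy thank you', 'sticker omitted',
                           'sticker omitted', 'thank you for your time!',
                           'sticker omitted', 'hello there']})

# True for every regular message, False for every sticker row
is_message = df['msg'].ne('sticker omitted')

# Running count of messages: a message and the stickers after it share one id
block_id = is_message.cumsum()

# Stickers per block = block size minus the message row itself
stickers_in_block = block_id.groupby(block_id).transform('size') - 1

# Keep the count on the first row of each block only, 0 on the rest
df['number_of_stickers'] = np.where(block_id.duplicated(), 0, stickers_in_block)
print(df)
```

Note that this assumes the first row is a regular message. If the data starts with stickers, they form a block with id 0 and the first of them receives a nonzero count.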

User
#2 · edited 2024-05-20 11:11:54

You have pretty much got it so far, and your data is well suited to a simple but powerful algorithm.
Here is a snippet of code I wrote for this problem:
import pandas as pd

#ss
df = {'msg': ['i am so happy thank you', 'sticker omitted', 'sticker omitted', 'thank you for your time!', 'sticker omitted'],
      'number_of_stickers': ['2', '0', '0', '1', '0']}
newarr = []  # new list of [msg, count] pairs
for j, i in enumerate(df["number_of_stickers"]):
    if int(i) != 0:
        # store each message together with its sticker count;
        # for an entry e in newarr, e[0] is the msg and e[1] is the count
        newarr.append([df["msg"][j], int(i)])
#se
# feel free to do whatever you want between ss and se
df = pd.DataFrame(data=df)
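se marks the end of the code snippet and ss marks its start.
Hope this helps! If not, please leave a comment below.
Also, you will have to feed the new array back into the dict.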
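Feeding the list back into a dict could look roughly like the sketch below. `pairs_to_columns` is a hypothetical helper introduced here for illustration, and the sample `newarr` is what the loop in this answer would produce for the data shown.

```python
import pandas as pd

# [msg, count] pairs as produced by the loop in this answer
newarr = [['i am so happy thank you', 2], ['thank you for your time!', 1]]

def pairs_to_columns(pairs):
    """Hypothetical helper: turn [msg, count] pairs into a column-oriented dict."""
    return {
        'msg': [msg for msg, _ in pairs],
        'number_of_stickers': [count for _, count in pairs],
    }

columns = pairs_to_columns(newarr)
result = pd.DataFrame(data=columns)
print(result)
```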

User
#3 · edited 2024-05-20 11:11:54

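This code should do the job. I could not find a solution that uses only pandas functions (though one is probably possible). Anyway, I left some comments in the code to describe my approach.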

import pandas as pd

# create data
df_dict = {'msg': ['i am so happy thank you',
                   'sticker omitted',
                   'sticker omitted',
                   'thank you for your time!',
                   'sticker omitted']}

df = pd.DataFrame(data=df_dict)

# build column for sticker counts after each message
sticker_counts = []
for index, row in df.iterrows():  # iterating over df rows (assumes the default 0..n-1 index)
    count = 0
    # when a sticker row is encountered, just put 0 in the count column
    # when a non-sticker row is encountered, count the stickers that follow it
    if row['msg'] != 'sticker omitted':
        k = 1  # offset of the row being checked after the non-sticker row
        # stop at the end of the DataFrame or at the next non-sticker row;
        # the bounds check comes first so the lookup never runs past the last row
        while index + k < len(df) and df.loc[index + k, 'msg'] == 'sticker omitted':
            count += 1
            k += 1
    sticker_counts.append(count)
df['sticker_counts'] = sticker_counts
print(df)
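The answer notes that a pandas-only solution might be possible. One way it could look is sketched below, under the assumption that a sticker row is exactly the string 'sticker omitted'; the names `is_sticker` and `block_id` are illustrative. It sums the sticker flags within each block and reports the sum on message rows only, so no explicit loop is needed.

```python
import pandas as pd

df = pd.DataFrame({'msg': ['i am so happy thank you',
                           'sticker omitted',
                           'sticker omitted',
                           'thank you for your time!',
                           'sticker omitted']})

# 1 for sticker rows, 0 for regular messages
is_sticker = df['msg'].eq('sticker omitted').astype(int)

# Every regular message starts a new block; trailing stickers inherit its id
block_id = (1 - is_sticker).cumsum()

# Number of stickers in each block, broadcast to every row of that block
per_block = is_sticker.groupby(block_id).transform('sum')

# Report the total on message rows and 0 on sticker rows
df['sticker_counts'] = per_block.where(is_sticker.eq(0), 0).astype(int)
print(df)
```

Because the result is masked by message rows rather than by the first row of each block, stickers that appear before the first message simply get 0.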

