"Can I do string operations like this in pandas?"

Posted 2024-05-15 12:10:40


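The data I get with pandas looks like the dict in the code below.

I want to find all the salsa types and put them in a dict, with the salsa type as the key and the number of entries for that type as the value.

This is Python. Is there a way to do this in pandas? Or should I just use plain old Python for this task?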

#!/usr/bin/env python3
import pandas as pd

items_df = pd.DataFrame({'choice_description': {0: '[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]', 1: '[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]', 2: '[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]', 3: '[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]'}, 'item_name': {0: 'Chips and Fresh Tomato Salsa', 1: 'Chips and Tomatillo-Green Chili Salsa', 2: 'Chicken Bowl', 3: 'Steak Burrito'}})

salsa_types_d = {}

for row in items_df.itertuples():
    for food in row[1:]:
        fixed_foods_l = food.replace("and",',').replace('[','').replace(']','').split(',')
        fixed_foods_l = [f.strip() for f in fixed_foods_l if f.find("alsa") > -1]
        for fixed_food in fixed_foods_l:
            salsa_types_d[fixed_food] = salsa_types_d.get(fixed_food, 0) + 1

print('\n'.join("%-33s:%d" % (k,salsa_types_d[k]) for k in sorted(salsa_types_d,key=salsa_types_d.get,reverse=True)))

"""
Output:

Tomatillo Red Chili Salsa        :2
Fresh Tomato Salsa               :1
Fresh Tomato Salsa (Mild)        :1
Tomatillo-Green Chili Salsa      :1
Tomatillo-Red Chili Salsa (Hot)  :1

---
Thank you for any insight.

Marilyn
"""

1 answer
User
#1 · Posted 2024-05-15 12:10:40

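This can be done without for loops. One approach is to stack the columns into a single Series, replace "and" and the brackets so that every item is comma-separated, split on the commas, and drop the values that do not contain "alsa". Finally, use value_counts to get the frequencies.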

new_df = (
    items_df.stack().reset_index(drop=True)
    .replace(['and', r'\[', r'\]'], [',', '', ''], regex=True).str.split(',')
    # keep only the pieces containing "alsa"; [0] takes the first such piece per row
    .apply(lambda x: pd.Series([i.lstrip() for i in x if 'alsa' in i]))[0].value_counts()
)

Output:

Tomatillo Red Chili Salsa          2
Tomatillo-Green Chili Salsa        1
Tomatillo-Red Chili Salsa (Hot)    1
Fresh Tomato Salsa (Mild)          1
Fresh Tomato Salsa                 1
Name: 0, dtype: int64
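
For comparison, here is a minimal sketch of the same idea that avoids the per-row apply. It is not the answerer's code: it swaps stack for melt to flatten the two columns, and uses explode (assuming pandas 0.25 or later) so that a row containing more than one salsa would have each one counted. The function name count_salsas and the column name "text" are made up for illustration, and the sample data is the question's DataFrame retyped as lists.

```python
import pandas as pd

# Same sample data as in the question, written with lists instead of nested dicts.
items_df = pd.DataFrame({
    "choice_description": [
        "[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]",
        "[Tomatillo-Red Chili Salsa (Hot), [Black Beans, Rice, Cheese, Sour Cream]]",
        "[Fresh Tomato Salsa (Mild), [Rice, Cheese, Sour Cream, Guacamole, Lettuce]]",
        "[Tomatillo Red Chili Salsa, [Fajita Vegetables, Black Beans, Pinto Beans, Cheese, Sour Cream, Guacamole, Lettuce]]",
    ],
    "item_name": [
        "Chips and Fresh Tomato Salsa",
        "Chips and Tomatillo-Green Chili Salsa",
        "Chicken Bowl",
        "Steak Burrito",
    ],
})


def count_salsas(df):
    """Count every comma-separated piece of text that contains 'alsa'."""
    pieces = (
        df.melt(value_name="text")["text"]        # all cells as one Series
        .str.replace(r"[\[\]]", "", regex=True)   # drop the square brackets
        .str.replace("and", ",", regex=False)     # treat "and" as a separator
        .str.split(",")                           # one list of pieces per cell
        .explode()                                # one row per piece
        .reset_index(drop=True)
        .str.strip()
    )
    # Keep only the salsa pieces, then count how often each one occurs.
    return pieces[pieces.str.contains("alsa", regex=False)].value_counts()


salsa_counts = count_salsas(items_df)
print(salsa_counts.to_dict())
```

Calling to_dict() on the result gives the plain dict the question asked for. As in the original code, replacing "and" also splits any word that happens to contain those letters, so this only suits data like the sample above.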
