Get all row combinations from a DataFrame based on specific column conditions?

Id   Calories  Protein  IsBreakfast  IsLunch  IsDinner
1         300        6            0        1         0
2         400       12            1        1         0
...
100       700       25            0        1         1

2 Answers

User

#1 · edited 2024-04-25 21:38:45

You can combine filters on a DataFrame with the | and & operators. First create a dummy DataFrame, for example:

import pandas as pd

df1 = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                    "Protein": [10, 20, 30, 40, 50],
                    "IsBreakfast": [1, 1, 0, 0, 0],
                    "IsLunch": [1, 0, 0, 0, 1],
                    "IsDinner": [1, 1, 1, 0, 1]})
print(df1)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
0       100       10            1        1         1
1       200       20            1        0         1
2       300       30            0        0         1
3       400       40            0        0         0
4       500       50            0        1         1

Now add all the conditions:

min_cal = 100
max_cal = 600
min_prot = 10
max_prot = 40
df_filtered = df1[
    ((df1['IsBreakfast']==1) | (df1['IsLunch']==1) | (df1['IsDinner']==1)) &
    ((df1['Calories'] > min_cal) & (df1['Calories'] < max_cal)) &
    ((df1['Protein'] > min_prot) & (df1['Protein'] < max_prot))
]

print(df_filtered)

Output:

   Calories  Protein  IsBreakfast  IsLunch  IsDinner
1       200       20            1        0         1
2       300       30            0        0         1
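
The range checks above can also be written with Series.between. This is only a sketch, assuming pandas 1.3 or later (where the inclusive argument accepts "neither") and reusing the same dummy data; the mask names are illustrative and not part of the original answer:

```python
import pandas as pd

df1 = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                    "Protein": [10, 20, 30, 40, 50],
                    "IsBreakfast": [1, 1, 0, 0, 0],
                    "IsLunch": [1, 0, 0, 0, 1],
                    "IsDinner": [1, 1, 1, 0, 1]})

min_cal, max_cal = 100, 600
min_prot, max_prot = 10, 40

# True where the row is flagged for at least one meal
meal_mask = df1[["IsBreakfast", "IsLunch", "IsDinner"]].eq(1).any(axis=1)

# Strict bounds on both ends, matching the > and < comparisons above
cal_mask = df1["Calories"].between(min_cal, max_cal, inclusive="neither")
prot_mask = df1["Protein"].between(min_prot, max_prot, inclusive="neither")

df_filtered = df1[meal_mask & cal_mask & prot_mask]
print(df_filtered)
```

Keeping each condition in its own named mask makes it easier to check the conditions one at a time before combining them.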

User

#2 · edited 2024-04-25 21:38:45

You can use the approach described in this answer to build a new DataFrame that contains every combination of three rows from the original data:

from itertools import combinations
import pandas as pd

# Using skbrhmn's df
df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

comb_rows = list(combinations(df.index, 3))
comb_rows

Output:

[(0, 1, 2),
 (0, 1, 3),
 (0, 1, 4),
 (0, 2, 3),
 (0, 2, 4),
 (0, 3, 4),
 (1, 2, 3),
 (1, 2, 4),
 (1, 3, 4),
 (2, 3, 4)]
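
Before building the list for a larger table, it may help to check how many triples there will be. A quick sketch, assuming Python 3.8 or later for math.comb and taking the 100-row size from the question:

```python
from math import comb

small = comb(5, 3)    # number of 3-row combinations for the 5-row example
large = comb(100, 3)  # number of 3-row combinations for a 100-row table

print(small)  # 10
print(large)  # 161700
```

The count grows roughly with the cube of the number of rows, so materializing every combination stays practical for a table of around 100 rows but gets expensive quickly beyond that.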

Then create a new DataFrame holding the sum of every numeric field in the original frame, for each possible combination of three rows:

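# Note: this assignment rebinds the name `combinations`, so
# itertools.combinations is no longer reachable under that name afterwards.
combinations = pd.DataFrame([df.loc[c,:].sum() for c in comb_rows], index=comb_rows)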

print(combinations)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 2)       600       60            2        1         3
(0, 1, 3)       700       70            2        1         2
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
(0, 3, 4)      1000      100            1        2         2
(1, 2, 3)       900       90            1        0         2
(1, 2, 4)      1000      100            1        1         3
(1, 3, 4)      1100      110            1        1         2
(2, 3, 4)      1200      120            0        1         2
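
For larger tables the per-combination .sum() calls can become slow. One possible alternative, sketched here with NumPy fancy indexing, computes all the sums in a single step. It assumes every column is numeric and that the frame has the default 0..n-1 index, so positions and labels coincide; the names combos, idx and combo_sums are illustrative:

```python
from itertools import combinations

import numpy as np
import pandas as pd

df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

# All 3-row combinations as tuples of row positions
combos = list(combinations(range(len(df)), 3))
idx = np.array(combos)                  # shape (10, 3)

# Indexing with idx gives shape (10, 3, 5); summing over axis 1
# adds up the three rows that make up each combination
sums = df.to_numpy()[idx].sum(axis=1)   # shape (10, 5)

combo_sums = pd.DataFrame(sums, columns=df.columns, index=combos)
print(combo_sums)
```

The result should hold the same values as the table above; the difference is only in how the sums are computed.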

Finally, you can apply whatever filters you need:

filtered = combinations[
    (combinations.IsBreakfast>0) &
    (combinations.IsLunch>0) &
    (combinations.IsDinner>0) &
    (combinations.Calories>600) &
    (combinations.Calories<1000) &
    (combinations.Protein>=80) &
    (combinations.Protein<120)
]
print(filtered)

           Calories  Protein  IsBreakfast  IsLunch  IsDinner
(0, 1, 4)       800       80            2        2         3
(0, 2, 3)       800       80            1        1         2
(0, 2, 4)       900       90            1        2         3
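
Each index label in the filtered result is a tuple of original row labels, so the underlying rows can be looked up again. A minimal sketch, using the combination (0, 2, 3) from the output above and the same dummy frame; the name meal_plan is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Calories": [100, 200, 300, 400, 500],
                   "Protein": [10, 20, 30, 40, 50],
                   "IsBreakfast": [1, 1, 0, 0, 0],
                   "IsLunch": [1, 0, 0, 0, 1],
                   "IsDinner": [1, 1, 1, 0, 1]})

combo = (0, 2, 3)  # one of the index labels in the filtered result

# Convert the tuple to a list so .loc treats it as a list of row labels
meal_plan = df.loc[list(combo)]
print(meal_plan)
```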
