How do I count the objects in the overlap of two different subsets?

  Child  Inches  Weight Twin  Siblings
0     A      53     100    Y         3
1     B      54     110    N         4
2     C      56     120    Y         2
3     D      58     165    Y         1
4     E      60     150    N         1
5     F      62     160    N         1
6     H      65     165    N         3

import pandas as pd
import numpy as np

file = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
#%%
height = file["Inches"]
weight = file["Weight"]
twin = file["Twin"]
siblings = file["Siblings"]
#%%
area1 = np.where((height <= 60) & (weight <= 150))[0]
#%%
#has two or more siblings (and is a twin)
group_a = np.where((siblings >= 2) & (twin == 'Y'))[0]
#has two or more siblings (and is not a twin)
group_b = np.where((siblings >= 2) & (twin == 'N'))[0]
#has only one sibling (and is twin)
group_c = np.where((siblings == 1) & (twin == 'Y'))[0]
#has only one sibling (and is not a twin)
group_d = np.where((siblings == 1) & (twin == 'N'))[0]
#%%
for i in area1:
    if group_a==True:
        print("in area1 there are", len(i), "children in group_a")
    elif group_b==True:
        print("in area1 there are", len(i), "children in group_b")
    elif group_c==True:
        print("in area1 there are", len(i), "children in group_c")
    elif group_d==True:
        print("in area1 there are", len(i), "children in group_d")

2 Answers

User

Answer #1 · edited 2024-05-23 17:41:05

I'm not sure what you were trying to do with the loop, but this should work:

import pandas as pd

file_data = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')

# Keep only the children in area1 (Inches <= 60 and Weight <= 150).
area1 = file_data[file_data['Inches'] <= 60]
area1 = area1[area1['Weight'] <= 150]

group_a = area1[area1['Siblings'] >= 2]
group_a = group_a[group_a['Twin'] == 'Y']

group_b = area1[area1['Siblings'] >= 2]
group_b = group_b[group_b['Twin'] == 'N']

group_c = area1[area1['Siblings'] == 1]
group_c = group_c[group_c['Twin'] == 'Y']

group_d = area1[area1['Siblings'] == 1]
group_d = group_d[group_d['Twin'] == 'N']


print("in area1 there are", len(group_a.index), "children in group_a")
print("in area1 there are", len(group_b.index), "children in group_b")
print("in area1 there are", len(group_c.index), "children in group_c")
print("in area1 there are", len(group_d.index), "children in group_d")
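If you would rather keep the index arrays from the question, the overlap of two subsets can also be counted by intersecting them. The following is only a sketch under assumptions: the table is typed in by hand instead of being read from the CSV, and the names `groups` and `overlap_counts` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample table from the question, typed in by hand
# (assumption: the CSV file holds exactly these rows).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Row positions of each subset, built the same way as in the question.
area1 = np.where((df["Inches"] <= 60) & (df["Weight"] <= 150))[0]
groups = {
    "group_a": np.where((df["Siblings"] >= 2) & (df["Twin"] == "Y"))[0],
    "group_b": np.where((df["Siblings"] >= 2) & (df["Twin"] == "N"))[0],
    "group_c": np.where((df["Siblings"] == 1) & (df["Twin"] == "Y"))[0],
    "group_d": np.where((df["Siblings"] == 1) & (df["Twin"] == "N"))[0],
}

# The overlap of two subsets is the intersection of their position arrays.
overlap_counts = {
    name: len(np.intersect1d(area1, idx)) for name, idx in groups.items()
}

for name, n in overlap_counts.items():
    print("in area1 there are", n, "children in", name)
```

The original loop fails because `group_a == True` compares a whole array with a single value; intersecting the arrays avoids that comparison entirely.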

User

Answer #2 · edited 2024-05-23 17:41:05

In your case I would use a slightly different design. With the data loaded into a DataFrame df, you can do this:

df['area1'] = np.where((df.Inches <= 60) & (df.Weight <= 150),1,0)
df['group_a'] = np.where((df.Siblings >= 2) & (df.Twin == 'Y'),1,0)
df['group_b'] = np.where((df.Siblings >= 2) & (df.Twin == 'N'),1,0)
df['group_c'] = np.where((df.Siblings == 1) & (df.Twin == 'Y'),1,0)
df['group_d'] = np.where((df.Siblings == 1) & (df.Twin == 'N'),1,0)
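
To inspect these indicator columns without the CSV file, here is a self-contained sketch. It assumes the sample table from the question is the full data set and types it in by hand; the helper dictionary `conditions` is a name invented for this example.

```python
import numpy as np
import pandas as pd

# Sample table from the question, typed in by hand
# (assumption: the CSV file holds exactly these rows).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# One boolean condition per subset; np.where turns each into a 0/1 column.
conditions = {
    "area1": (df.Inches <= 60) & (df.Weight <= 150),
    "group_a": (df.Siblings >= 2) & (df.Twin == "Y"),
    "group_b": (df.Siblings >= 2) & (df.Twin == "N"),
    "group_c": (df.Siblings == 1) & (df.Twin == "Y"),
    "group_d": (df.Siblings == 1) & (df.Twin == "N"),
}
for name, cond in conditions.items():
    df[name] = np.where(cond, 1, 0)

# Each row now carries a 1 in every subset it belongs to.
print(df)
```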

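This adds five 0/1 indicator columns to the DataFrame, one per subset.

From there you can build queries, for example to look at group_b: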

df.groupby(['area1'])['group_b'].sum()[1]

You get the result you want: 1. You can experiment with sum or count to adapt the table.
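
As a rough illustration of how sum and count differ here, the sketch below rebuilds only the two columns the query needs. It assumes the sample table from the question; `in_area_sum` and `in_area_count` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Only the columns needed for the conditions, typed in by hand
# (assumption: same rows as the sample table in the question).
df = pd.DataFrame({
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_b"] = np.where((df.Siblings >= 2) & (df.Twin == "N"), 1, 0)

by_area = df.groupby("area1")["group_b"]

# sum adds up the 1s: children that are in area1 AND in group_b.
in_area_sum = by_area.sum()[1]

# count counts non-null rows: every child in area1, whatever the group.
in_area_count = by_area.count()[1]

print("sum:", in_area_sum, "count:", in_area_count)
```

With 0/1 columns, sum gives the size of the overlap, while count gives the size of area1 itself.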

Finally:

# Columns from position 6 onward are group_a through group_d.
for col in df.columns[6:]:
    r = df.groupby(['area1'])[col].sum()[1]
    print("in area1 there are", r, "children in", col)

This produces:

in area1 there are 2 children in group_a
in area1 there are 1 children in group_b
in area1 there are 0 children in group_c
in area1 there are 1 children in group_d
