How can I count the objects in the overlap of two different subsets?

Posted 2024-05-23 17:41:05


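I have one category defined by certain features (height and weight, selected with np.where) and a different category defined by other features (whether someone is a twin and how many siblings they have, also selected with np.where). I want to see how many people belong to both categories at once (if you drew a Venn diagram, how many would be in the middle?).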

I am importing the columns of a CSV file. This is what the table looks like:

    Child  Inches  Weight Twin  Siblings
0     A      53     100    Y         3
1     B      54     110    N         4
2     C      56     120    Y         2
3     D      58     165    Y         1
4     E      60     150    N         1
5     F      62     160    N         1
6     H      65     165    N         3
import pandas as pd
import numpy as np

file = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
#%%
height = file["Inches"]
weight = file["Weight"]
twin = file["Twin"]
siblings = file["Siblings"]
#%%
area1 = np.where((height <= 60) & (weight <= 150))[0]
#%%
#has two or more siblings (and is a twin)
group_a = np.where((siblings >= 2) & (twin == 'Y'))[0]

#has two or more siblings (and is not a twin)
group_b = np.where((siblings >= 2) & (twin == 'N'))[0]

#has only one sibling (and is twin)
group_c = np.where((siblings == 1) & (twin == 'Y'))[0]

#has only one sibling (and is not a twin)
group_d = np.where((siblings == 1) & (twin == 'N'))[0]
#%%
for i in area1:
    if group_a==True:
        print("in area1 there are", len(i), "children in group_a")
    elif group_b==True:
        print("in area1 there are", len(i), "children in group_b")  
    elif group_c==True:
        print("in area1 there are", len(i), "children in group_c")
    elif group_d==True:
        print("in area1 there are", len(i), "children in group_d")

I get an error: "ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()"

I would like a result like this:

"in area1 there are 2 children in group_a"
"in area1 there are 1 children in group_b"
"in area1 there are 0 children in group_c"
"in area1 there are 1 children in group_d"

Thanks in advance!


2 Answers

I'm not sure what you were trying to do with i and the for loop, but this should work:

import pandas as pd
file_data = pd.read_csv(r'~/Downloads/Test3 CVS_Sheet1.csv')
area1 = file_data[file_data['Inches'] <= 60]
area1 = area1[area1['Weight'] <= 150]

group_a = area1[area1['Siblings'] >= 2]
group_a = group_a[group_a['Twin'] == 'Y']

group_b = area1[area1['Siblings'] >= 2]
group_b = group_b[group_b['Twin'] == 'N']

group_c = area1[area1['Siblings'] == 1]
group_c = group_c[group_c['Twin'] == 'Y']

group_d = area1[area1['Siblings'] == 1]
group_d = group_d[group_d['Twin'] == 'N']


print("in area1 there are", len(group_a.index), "children in group_a")
print("in area1 there are", len(group_b.index), "children in group_b")
print("in area1 there are", len(group_c.index), "children in group_c")
print("in area1 there are", len(group_d.index), "children in group_d")
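
For context on the ValueError in the question: group_a == True compares a whole index array with True and yields another array, which an if statement cannot reduce to a single truth value. If you would rather keep the np.where index arrays from the question, np.intersect1d returns the positions common to two arrays, so its length is the overlap count. A minimal sketch, assuming the sample table is built inline rather than read from the CSV (the groups and overlap_counts names are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data copied from the table in the question (assumption: built
# inline here instead of being read from the CSV file).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Row positions of each subset, built the same way as in the question.
area1 = np.where((df["Inches"] <= 60) & (df["Weight"] <= 150))[0]
groups = {
    "group_a": np.where((df["Siblings"] >= 2) & (df["Twin"] == "Y"))[0],
    "group_b": np.where((df["Siblings"] >= 2) & (df["Twin"] == "N"))[0],
    "group_c": np.where((df["Siblings"] == 1) & (df["Twin"] == "Y"))[0],
    "group_d": np.where((df["Siblings"] == 1) & (df["Twin"] == "N"))[0],
}

# np.intersect1d keeps only the positions present in both arrays,
# i.e. the middle of the Venn diagram.
overlap_counts = {
    name: len(np.intersect1d(area1, idx)) for name, idx in groups.items()
}

for name, count in overlap_counts.items():
    print("in area1 there are", count, "children in", name)
```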

In your case I would go with a slightly different design. With the CSV loaded into a DataFrame df, you could do this:

df['area1'] = np.where((df.Inches <= 60) & (df.Weight <= 150),1,0)
df['group_a'] = np.where((df.Siblings >= 2) & (df.Twin == 'Y'),1,0)
df['group_b'] = np.where((df.Siblings >= 2) & (df.Twin == 'N'),1,0)
df['group_c'] = np.where((df.Siblings == 1) & (df.Twin == 'Y'),1,0)
df['group_d'] = np.where((df.Siblings == 1) & (df.Twin == 'N'),1,0)

The result is the original table with five added 0/1 columns (area1 and group_a through group_d).
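
The indicator columns can be checked against the sample table from the question. A minimal sketch, assuming the data is built inline rather than read from the CSV; the column definitions are the ones given above:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (assumption: built inline, not read from CSV).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Each new column is 1 where the condition holds and 0 otherwise.
df["area1"] = np.where((df.Inches <= 60) & (df.Weight <= 150), 1, 0)
df["group_a"] = np.where((df.Siblings >= 2) & (df.Twin == "Y"), 1, 0)
df["group_b"] = np.where((df.Siblings >= 2) & (df.Twin == "N"), 1, 0)
df["group_c"] = np.where((df.Siblings == 1) & (df.Twin == "Y"), 1, 0)
df["group_d"] = np.where((df.Siblings == 1) & (df.Twin == "N"), 1, 0)

# Show only the child label and the new indicator columns.
print(df[["Child", "area1", "group_a", "group_b", "group_c", "group_d"]])
```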

From here you can build queries, for example to look at group_b:

df.groupby(['area1'])['group_b'].sum()[1]

You will get the result you want: 1. You can play with sum or count to adjust your table.

Finally:

for col in df.columns[6:]:
   r = df.groupby(['area1'])[col].sum()[1]
   print ("in area1 there are",r,'children in',col)

which produces:

in area1 there are 2 children in group_a
in area1 there are 1 children in group_b
in area1 there are 0 children in group_c
in area1 there are 1 children in group_d
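
Note that df.columns[6:] is a positional slice, so it only picks out the group columns when the frame has exactly the five original columns plus area1 in front of them. One possible variation, sketched here under the assumption that the sample data is built inline, keeps the conditions as boolean masks and names the groups explicitly (in_area1, flags and counts are illustrative names, not part of the answer above):

```python
import pandas as pd

# Sample data copied from the question (assumption: built inline, not read from CSV).
df = pd.DataFrame({
    "Child": list("ABCDEFH"),
    "Inches": [53, 54, 56, 58, 60, 62, 65],
    "Weight": [100, 110, 120, 165, 150, 160, 165],
    "Twin": ["Y", "N", "Y", "Y", "N", "N", "N"],
    "Siblings": [3, 4, 2, 1, 1, 1, 3],
})

# Boolean mask for the height/weight subset.
in_area1 = (df["Inches"] <= 60) & (df["Weight"] <= 150)

# One boolean column per sibling/twin group, addressed by name.
flags = pd.DataFrame({
    "group_a": (df["Siblings"] >= 2) & (df["Twin"] == "Y"),
    "group_b": (df["Siblings"] >= 2) & (df["Twin"] == "N"),
    "group_c": (df["Siblings"] == 1) & (df["Twin"] == "Y"),
    "group_d": (df["Siblings"] == 1) & (df["Twin"] == "N"),
})

# Keep only the area1 rows, then sum each column: True counts as 1.
counts = flags[in_area1].sum()

for name, count in counts.items():
    print("in area1 there are", count, "children in", name)
```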
