Count operations grouped by country and return a DataFrame in Python

Posted 2024-06-16 14:04:35


Data:

^{tb1}$

Expected output:

{ "India" :{"A":1,"C":2},"Malaysia":{"B":1,"A":1,"D":1},"Croatia":{"C":1}}

What I tried:


arrays = [countrylist, opslist]
index = pd.MultiIndex.from_arrays(arrays, names=('Country', 'Ops'))
df = pd.DataFrame(index)

count = list(df[0].value_counts())
clist = list(df[0].unique())

csdict = dict()
for country, service in clist:
    csdict.setdefault(country, []).append(service)

country_list = list(csdict.keys())
service_list = list(csdict.values())

fdict = {"country": country_list, "services": service_list}
dataf = pd.DataFrame(fdict)


2 answers

Here is how you can do it with the built-in zip() function:

z = list(zip(df.country, df.operations))

output = dict()
for c, o in z:
    output[c] = output.get(c) or dict()
    output[c][o] = z.count((c, o))
print(output)

Output:

{'India': {'A': 1, 'C': 2}, 'Malaysia': {'B': 1, 'D': 1, 'A': 1}, 'Croatia': {'C': 1}}
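The loop above calls z.count() once per row, so it rescans the whole list each time. A minimal sketch of a single-pass variant using collections.Counter follows. The contents of countrylist and opslist are hypothetical sample data, chosen only to agree with the expected output in the question; with a DataFrame you could pass df.country and df.operations to zip() instead.

```python
from collections import Counter, defaultdict

# Hypothetical sample data (not from the original post), chosen only
# to be consistent with the expected output shown in the question.
countrylist = ["India", "India", "India", "Malaysia", "Malaysia", "Malaysia", "Croatia"]
opslist = ["A", "C", "C", "B", "A", "D", "C"]

# One Counter per country; each (country, operation) pair is visited once.
counts = defaultdict(Counter)
for c, o in zip(countrylist, opslist):
    counts[c][o] += 1

# Convert to plain nested dicts, e.g. for printing or JSON serialization.
output = {c: dict(ops) for c, ops in counts.items()}
print(output)
```

This keeps the same nested-dict shape as the answer above while doing a single pass over the data.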

Use a dictionary comprehension with value_counts() for each group:

d = {k: v.value_counts(sort=False).to_dict() 
         for k, v in df.groupby('country', sort=False)['operations']}

print(d)
{'India': {'A': 1, 'C': 2}, 'Malaysia': {'B': 1, 'A': 1, 'D': 1}, 'Croatia': {'C': 1}}
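Since the question title asks for a DataFrame rather than a dict, here is a rough sketch of two ways to keep the counts in pandas. The sample rows are hypothetical, chosen only to match the expected output; the column names country and operations are taken from the answers above.

```python
import pandas as pd

# Hypothetical sample data (not from the original post), chosen only
# to be consistent with the expected output shown in the question.
df = pd.DataFrame({
    "country": ["India", "India", "India", "Malaysia", "Malaysia", "Malaysia", "Croatia"],
    "operations": ["A", "C", "C", "B", "A", "D", "C"],
})

# Wide form: one row per country, one column per operation,
# with 0 where a (country, operation) pair never occurs.
table = pd.crosstab(df["country"], df["operations"])

# Long form: one row per (country, operation) pair that actually occurs.
long_counts = (
    df.groupby(["country", "operations"], sort=False)
      .size()
      .reset_index(name="count")
)

print(table)
print(long_counts)
```

The wide form is convenient for lookups such as table.loc["India", "C"], while the long form avoids the zero-filled cells.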
