How do I loop over multiple groups while holding one value constant within each group?

Posted 2024-06-16 14:37:58


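I am trying to automate a for-loop calculation across 34 different groups. I have a dataset containing X and Y points for 400 districts in 34 provinces. For each province, I want to calculate the distance from that province's capital district to every district in that province.

Then I want to repeat the calculation for the next province's capital and its districts.

What I have tried so far is very rudimentary and comes nowhere near the automated result I am after.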

import pandas as pd
import mpu
### my basic coding ability would lead me to do something like this 34 times,
### manually hunting for the index with the capital and concating results

df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/af-districts.csv')

new_df = df[0:27]
distance = []
for i in range(new_df.shape[0]):
    distance.append(mpu.haversine_distance((new_df['Y'][7], new_df['X'][7]), (new_df['Y'][i], new_df['X'][i])))

Here is how I solved the problem:

import pandas as pd
import numpy as np
import mpu

df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/af1.csv')

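# Select the capital row of each province (a single boolean filter; no loop needed)
j = df[df['Capital'] == 1][['Y', 'X', 'Province', 'District', 'Capital']]

# Rename the coordinates so they do not collide with the district coordinates after the merge
j = j.rename(columns={'Y': 'CapY', 'X': 'CapX'})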

df1 = df.merge(j, how = 'left', on = ['Province']) # this is it!

container = []
for i in range(399):
    container.append(mpu.haversine_distance((df1['Y'][i], df1['X'][i]),
                                            (df1['CapY'][i], df1['CapX'][i]))) # working?

container = pd.Series(container)
df1 = pd.concat((df1, container.rename('distance')), axis = 1) 

If anyone is still reading, I would like some help understanding why this loop works:

container = []
for i in range(399):
    container.append(mpu.haversine_distance((df1['Y'][i], df1['X'][i]),
                                            (df1['CapY'][i], df1['CapX'][i])))

while this loop does not:

for i in range(399):
    df1['distance2'] = ''
    df1['distance2'][i] = mpu.haversine_distance((df1['Y'][i], df1['X'][i]),
                                            (df1['CapY'][i], df1['CapX'][i])) 
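
One likely explanation, offered as a sketch rather than a definitive diagnosis: the assignment `df1['distance2'] = ''` sits inside the loop, so the whole column is reset to empty strings on every pass and at most the last row can keep its value. The chained assignment `df1['distance2'][i] = ...` is also a pattern pandas warns about, because it may write to a temporary copy. The version below creates the column once and writes with `.loc`. The three-row `df1` is made-up data, and `haversine_km` is a hypothetical stand-in for `mpu.haversine_distance` so that the snippet runs without `mpu`.

```python
import math
import pandas as pd

def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) points in degrees.
    Hypothetical stand-in for mpu.haversine_distance."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius_km = 6371.0
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Made-up three-row stand-in for the merged frame df1
df1 = pd.DataFrame({
    'Y': [34.5, 34.6, 34.4],
    'X': [69.2, 69.3, 69.0],
    'CapY': [34.5, 34.5, 34.5],
    'CapX': [69.2, 69.2, 69.2],
})

df1['distance2'] = 0.0  # create the column once, before the loop
for i in range(len(df1)):
    # .loc[row, column] writes directly into df1, avoiding chained assignment
    df1.loc[i, 'distance2'] = haversine_km(
        (df1.loc[i, 'Y'], df1.loc[i, 'X']),
        (df1.loc[i, 'CapY'], df1.loc[i, 'CapX']),
    )
```

Using `range(len(df1))` instead of a hard-coded 399 also keeps the loop correct if the number of rows changes.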

1 answer
User
#1 · Posted 2024-06-16 14:37:58

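Without seeing the structure of the DataFrame, it is hard to give specifics. However, what you are describing is a nested-loop operation. In pseudocode, you could write: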

Loop over all of the provinces:
  identify the capital somehow
  Loop over all of the districts:
    calculate the distance (capital, district)

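With data of the size you describe, this should be very fast.

I think it would be easier not to attempt this inside the DataFrame. It is much easier to understand what is going on that way.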
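
The pseudocode above can be sketched in plain Python, outside the DataFrame. This is only an illustration under stated assumptions: the `rows` records are made-up data, the record layout `(province, district, lat, lon, is_capital)` is hypothetical, and `haversine_km` is a hypothetical stand-in for `mpu.haversine_distance`.

```python
import math

def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) points in degrees.
    Hypothetical stand-in for mpu.haversine_distance."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius_km = 6371.0
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Made-up records: (province, district, lat, lon, is_capital)
rows = [
    ("A", "A-centre", 34.5, 69.2, True),
    ("A", "A-north", 34.9, 69.2, False),
    ("B", "B-centre", 31.6, 65.7, True),
    ("B", "B-east", 31.6, 66.2, False),
]

# Group the districts by province
by_province = {}
for province, district, lat, lon, is_capital in rows:
    by_province.setdefault(province, []).append((district, lat, lon, is_capital))

distances = {}
for province, districts in by_province.items():       # loop over all provinces
    capital = next(d for d in districts if d[3])       # identify the capital
    cap_point = (capital[1], capital[2])
    for district, lat, lon, _ in districts:            # loop over its districts
        distances[(province, district)] = haversine_km(cap_point, (lat, lon))
```

The capital stays fixed for the whole inner loop, which is the "one value held constant per group" behaviour the question asks for.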

Edit: to get the province/capital pairs, you can do this:

df_caps = df[df['ADM2ALT1EN'] == 'Centre'][['ADM1_EN', 'ADM2_EN']]

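This gives a subset of the DataFrame with only two columns, which I think is what you want. You can then convert it to a list of tuples to simplify iteration: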

cap_pairs = [tuple(x) for x in df_caps.values]

Now you have something that is easy to iterate over:

for province, cap in cap_pairs:
    # do something 
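
As one possible way to fill in the loop body, the sketch below filters the frame to each province and measures every district against that province's capital. It assumes the frame holds the `ADM1_EN`, `ADM2_EN` and `ADM2ALT1EN` columns used above alongside the question's `Y` and `X` coordinate columns, which the thread does not confirm. The four-row `df` is made-up data, `distance_km` is an illustrative column name, and `haversine_km` is a hypothetical stand-in for `mpu.haversine_distance`.

```python
import math
import pandas as pd

def haversine_km(origin, destination):
    """Great-circle distance in km between two (lat, lon) points in degrees.
    Hypothetical stand-in for mpu.haversine_distance."""
    lat1, lon1 = origin
    lat2, lon2 = destination
    radius_km = 6371.0
    dlat = math.radians(lat2 - lat1)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin(dlat / 2) ** 2
         + math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# Made-up data; the combination of columns is an assumption
df = pd.DataFrame({
    'ADM1_EN': ['A', 'A', 'B', 'B'],
    'ADM2_EN': ['A-centre', 'A-north', 'B-centre', 'B-east'],
    'ADM2ALT1EN': ['Centre', None, 'Centre', None],
    'Y': [34.5, 34.9, 31.6, 31.6],
    'X': [69.2, 69.2, 65.7, 66.2],
})

df_caps = df[df['ADM2ALT1EN'] == 'Centre'][['ADM1_EN', 'ADM2_EN']]
cap_pairs = [tuple(x) for x in df_caps.values]

pieces = []
for province, cap in cap_pairs:
    group = df[df['ADM1_EN'] == province].copy()       # districts of this province
    cap_row = group[group['ADM2_EN'] == cap].iloc[0]   # the capital's own row
    cap_point = (cap_row['Y'], cap_row['X'])
    group['distance_km'] = [
        haversine_km(cap_point, (lat, lon))
        for lat, lon in zip(group['Y'], group['X'])
    ]
    pieces.append(group)

result = pd.concat(pieces)  # one frame with a distance for every district
```

Collecting the per-province pieces and concatenating once at the end avoids the manual "repeat 34 times and concat" step described in the question.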
