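I am trying to automate a for-loop calculation across 34 different groups.
I have a dataset containing X and Y points for 400 districts in 34 provinces. For each province, I want to calculate the distance from the provincial capital to every district in that province.
Then I want to repeat the calculation for the next province's capital and its districts.
What I have tried so far is very rudimentary and does not come close to the automated result I am after.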
import pandas as pd
import mpu
### my basic coding ability would lead me to do something like this 34 times,
### manually hunting for the index with the capital and concating results
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/af-districts.csv')
new_df = df[0:27]
distance = []
for i in range(new_df.shape[0]):
    distance.append(mpu.haversine_distance((new_df['Y'][7], new_df['X'][7]), (new_df['Y'][i], new_df['X'][i])))
import pandas as pd
import numpy as np
import mpu
df = pd.read_csv('https://raw.githubusercontent.com/rocketfish88/democ/master/af1.csv')
j = []
for i in range(399):
    j = df[df['Capital'] == 1][['Y', 'X', 'Province', 'District', 'Capital']]
j.rename(columns={'Y': 'CapY', 'X': 'CapX'}, inplace=True)
df1 = df.merge(j, how = 'left', on = ['Province']) # this is it!
container = []
for i in range(399):
    container.append(mpu.haversine_distance((df1['Y'][i], df1['X'][i]),
                                            (df1['CapY'][i], df1['CapX'][i]))) # working?
container = pd.Series(container)
df1 = pd.concat((df1, container.rename('distance')), axis = 1)
I need some help understanding why this loop works:
container = []
for i in range(399):
    container.append(mpu.haversine_distance((df1['Y'][i], df1['X'][i]),
                                            (df1['CapY'][i], df1['CapX'][i])))
while this loop does not:
for i in range(399):
    df1['distance2'] = ''
    df1['distance2'][i] = mpu.haversine_distance((df1['Y'][i], df1['X'][i]),
                                                 (df1['CapY'][i], df1['CapX'][i]))
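Without seeing the structure of the DataFrame it is hard to give specifics. However, what you describe is a nested loop operation. In pseudocode, you could: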
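The nested loop described above might be sketched as follows in plain Python. This is only an illustration under stated assumptions: the rows are made-up sample data with hypothetical province and district names, and `haversine_km` is a hand-written stand-in for `mpu.haversine_distance` so that the snippet needs nothing beyond the standard library.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (hand-written stand-in for mpu.haversine_distance)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical sample rows: (province, district, lat, lon, is_capital)
rows = [
    ("A", "A1", 34.53, 69.17, 1),
    ("A", "A2", 34.59, 68.95, 0),
    ("B", "B1", 34.22, 62.22, 0),
    ("B", "B2", 34.35, 62.20, 1),
]

# Group the rows by province first.
by_province = {}
for row in rows:
    by_province.setdefault(row[0], []).append(row)

distances = {}
for province, districts in by_province.items():   # outer loop: one pass per province
    capital = next(d for d in districts if d[4] == 1)
    for _, district, lat, lon, _ in districts:     # inner loop: every district in it
        distances[(province, district)] = haversine_km(capital[2], capital[3], lat, lon)
```

Each key of `distances` is a `(province, district)` pair, and the capital's own entry comes out as zero.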
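For data of the size you describe, this should be very fast.
I think it would be easier not to attempt this inside the DataFrame; it is much easier to follow what is happening that way.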
Edit: to get the pairing of each province with its capital, you can do this:
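One way this selection might look, assuming the column names `Province`, `District` and `Capital` used in the question's code. The small DataFrame here is hypothetical sample data, not the real file.

```python
import pandas as pd

# Hypothetical miniature of the data; column names follow the question's code.
df = pd.DataFrame({
    "Province": ["A", "A", "B", "B"],
    "District": ["A1", "A2", "B1", "B2"],
    "Capital": [1, 0, 0, 1],
})

# Keep only the capital rows, and only the two identifying columns.
capitals = df.loc[df["Capital"] == 1, ["Province", "District"]]
```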
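This gives a subset of the DataFrame with only 2 columns, which I think is what you want. You can then convert it to a list of tuples to simplify iteration: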
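A minimal sketch of that conversion, using `DataFrame.itertuples` with `index=False, name=None` so that plain tuples come back. The two-column frame `capitals` is hypothetical sample data standing in for the subset described above.

```python
import pandas as pd

# Hypothetical two-column subset: one row per province with its capital district.
capitals = pd.DataFrame({"Province": ["A", "B"], "District": ["A1", "B2"]})

# Plain (province, capital) tuples, with no index and no namedtuple wrapper.
pairs = list(capitals.itertuples(index=False, name=None))

# Unpack each pair directly in the loop header.
labels = []
for province, capital in pairs:
    labels.append(f"{province}: {capital}")
```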
Now you have something that is easy to iterate over.
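As for the second loop in the question, there are two likely reasons it fails, assuming `df1['distance2'] = ''` really sits inside the loop as posted. First, that assignment runs on every iteration, so each pass wipes whatever was written before. Second, `df1['distance2'][i] = ...` is chained assignment, which pandas warns about and which may write to a temporary copy instead of `df1`. A sketch of one possible fix is to create the column once and write with `.loc`. The data below is made up, and `haversine_km` is again a hand-written stand-in for `mpu.haversine_distance`.

```python
import math
import numpy as np
import pandas as pd

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (hand-written stand-in for mpu.haversine_distance)."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical rows shaped like the merged df1 from the question.
df1 = pd.DataFrame({
    "Y": [34.53, 34.59, 34.22],
    "X": [69.17, 68.95, 62.22],
    "CapY": [34.53, 34.53, 34.35],
    "CapX": [69.17, 69.17, 62.20],
})

df1["distance2"] = np.nan          # create the column once, before the loop
for i in df1.index:
    # .loc writes into df1 itself rather than into a temporary copy
    df1.loc[i, "distance2"] = haversine_km(
        df1.loc[i, "Y"], df1.loc[i, "X"], df1.loc[i, "CapY"], df1.loc[i, "CapX"]
    )
```

With 400 rows a loop like this is fine; the first loop in the question works for the same reason, because it appends to a list that is created once outside the loop.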