Creating an inner Python loop over indexes and groups for all combinations

Posted 2024-04-27 00:47:03


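I have a script that looks at the row and column headers belonging to a group (REG_ID) and sums the values. The code runs on a matrix; a small subset is shown below:

(Figure: Outputs)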

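My code works well and computes the sum over all IDs based on the rows and columns belonging to each internal group (REG_ID). For example, for all row and column IDs belonging to REG_ID 1, it computes the total flow between region 1 and region 1 (the internal flow), and so on.

I would like to extend this code to compute (sum) the flows between regions, for example from region 1 to regions 2, 3, 4, 5 and so on. I think I need to include another loop inside the existing while loop, but I would really appreciate help working out where it should go and how to structure it.

My code currently runs on the internal flows (1-1, 2-2, 3-3, etc.), as follows: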

global index
index = 1
x = index
while index < len(idgroups):
    ward_list = idgroups[index] #select list of ward ids for each region from list of lists
    df6 = mergedcsv.loc[ward_list] #select rows with values in the list
    dfcols = mergedcsv.loc[ward_list, :] #select columns with values in list
    ward_liststr = map(str, ward_list) #convert ward_list to strings so that they can be used to select columns, won't work as integers.
    ward_listint = map(int, ward_list)
    #dfrowscols = mergedcsv.loc[ward_list, ward_listint]
    df7 = df6.loc[:, ward_liststr]
    print df7
    regflowsum = df7.values.sum() #sum all values in dataframe
    intflow = [regflowsum]
    print intflow
    dfintflow = pd.DataFrame(intflow)
    dfintflow.reset_index(level=0, inplace=True)
    dfintflow.columns = ["RegID", "regflowsum"]
    dfflows.set_value(index, 'RegID', index)
    dfflows.set_value(index, 'RegID2', index)
    dfflows.set_value(index, 'regflow', regflowsum)
    mergedcsv.set_value(ward_list, 'TotRegFlows', regflowsum)
    index += 1 #increment index number
print dfflows
new_df = pd.merge(pairlist, dfflows,  how='left', left_on=['origID','destID'], right_on = ['RegID', 'RegID2'])
print new_df #useful for checking dataframe merges
regionflows = r"C:\Temp\AllNI\regionflows.csv"
header = ["WardID","LABEL","REG_ID","Total","TotRegFlows"]
mergedcsv.to_csv(regionflows, columns = header, index=False)
regregflows = r"C:\Temp\AllNI\reg_regflows.csv"
headerreg = ["REG_ID_ORIG", "REG_ID_DEST", "FLOW"]

pairlistCSV = r"C:\Temp\AllNI\pairlist_regions.csv"
new_df.to_csv(pairlistCSV)

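The outputs are as follows:

idgroups dataframe: (see Figure 1, second part)

df7 and intflow for each REG_ID: (third part of Figure 1, right-hand side)

dfflows dataframe: (fourth part of Figure 2)

The final output is new_df: (fifth part of Figure 2)

I would like to populate the sums for every possible combination of flows between regions, not just the internal flows.

I think I need to add another loop inside the while loop, so perhaps an enumerate call could be added like this: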

while index < len(idgroups):
    #add line(s) to calculate flows between regions
    for index, item in enumerate(idgroups):
        ward_list = idgroups[index]
        print ward_list
        df6 = mergedcsv.loc[ward_list] #select rows with values in the list
        dfcols = mergedcsv.loc[ward_list, :] #select columns with values in list
        ward_liststr = map(str, ward_list) #convert ward_list to strings so that they can be used to select columns, won't work as integers.
        ward_listint = map(int, ward_list)
        #dfrowscols = mergedcsv.loc[ward_list, ward_listint]
        df7 = df6.loc[:, ward_liststr]
        print df7
        regflowsum = df7.values.sum() #sum all values in dataframe
        intflow = [regflowsum]
        print intflow
        dfintflow = pd.DataFrame(intflow)
        dfintflow.reset_index(level=0, inplace=True)
        dfintflow.columns = ["RegID", "regflowsum"]
        dfflows.set_value(index, 'RegID', index)
        dfflows.set_value(index, 'RegID2', index)
        dfflows.set_value(index, 'regflow', regflowsum)
        mergedcsv.set_value(ward_list, 'TotRegFlows', regflowsum)
        index += 1 #increment index number

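I'm not sure how to integrate item, so I'm struggling to extend the code to all combinations. Thanks for any suggestions.

Update based on the flows function: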

w = pysal.rook_from_shapefile("C:/Temp/AllNI/NIW01_sort.shp", idVariable='LABEL')
Simil = pysal.open("C:/Temp/AllNI/simNI.csv")
Similarity = np.array(Simil)
db = pysal.open('C:\Temp\SQLite\MatrixCSV2.csv', 'r')
dbf = pysal.open(r'C:\Temp\AllNI\NIW01_sortC.dbf', 'r')
ids = np.array((dbf.by_col['LABEL']))
commuters = np.array((dbf.by_col['Total'],dbf.by_col['IDNO']))
commutersint = commuters.astype(int)
comm = commutersint[0]
floor = int(MIN_COM_CT + 100)
solution = pysal.region.Maxp(w=w,z=Similarity,floor=floor,floor_variable=comm)
regions = solution.regions
#print regions
writecsv = r"C:\Temp\AllNI\reg_output.csv"
csv = open(writecsv,'w')
csv.write('"LABEL","REG_ID"\n')
for i in range(len(regions)):
        for lines in regions[i]:
            csv.write('"' + lines + '","' + str(i+1) + '"\n')
csv.close()
flows = r"C:\Temp\SQLite\MatrixCSV2.csv"
regs = r"C:\Temp\AllNI\reg_output.csv"
wardflows = pd.read_csv(flows)
regoutput = pd.read_csv(regs)
merged = pd.merge(wardflows, regoutput)
#duplicate REG_ID column as the index to be used later
merged['REG_ID2'] = merged['REG_ID']
merged.to_csv("C:\Temp\AllNI\merged.csv", index=False)
mergedcsv = pd.read_csv("C:\Temp\AllNI\merged.csv",index_col='WardID_1') #index this dataframe using the WardID_1 column
flabelList = pd.read_csv("C:\Temp\AllNI\merged.csv", usecols = ["WardID", "REG_ID"]) #create list of all FLabel values

reg_id = "REG_ID"
ward_flows = "RegIntFlows"
flds = [reg_id, ward_flows] #create list of fields to be use in search

dict_ref = {} # create a dictionary with for each REG_ID a list of corresponding FLABEL fields


#group the dataframe by the REG_ID column
idgroups = flabelList.groupby('REG_ID')['WardID'].apply(lambda x: x.tolist())
print idgroups

idgrp_df = pd.DataFrame(idgroups)

csvcols = mergedcsv.columns

#create a list of column names to pass as an index to select columns
columnlist = list(mergedcsv.columns.values)

mergedcsvgroup = mergedcsv.groupby('REG_ID').sum()
mergedcsvgroup.describe()
idList = idgroups[2]
df4 = pd.DataFrame()
df5 = pd.DataFrame()
col_ids = idList #ward id no

regiddf = idgroups.index.get_values()
print regiddf
#total number of region ids
#print regiddf
#create pairlist combinations from region ids
#combinations with replacement allows for repeated items
#pairs = list(itertools.combinations_with_replacement(regiddf, 2))
pairs = list(itertools.product(regiddf, repeat=2))
#print len(pairs)

#create a new dataframe with pairlists and summed data
pairlist = pd.DataFrame(pairs,columns=['origID','destID'])
print pairlist.tail()
header_pairlist = ["origID","destID","flow"]
header_intflow = ["RegID", "RegID2", "regflow"]
dfflows = pd.DataFrame(columns=header_intflow)

print mergedcsv.index
print mergedcsv.dtypes
#mergedcsv = mergedcsv.select_dtypes(include=['int64'])
#print mergedcsv.columns
#mergedcsv.rename(columns = lambda x: int(x), inplace=True)

def flows():
    pass

#def flows(mergedcsv, region_a, region_b):
def flows(mergedcsv, ward_lista, ward_listb):
    """Return the sum of all the cells in the row/column intersections
    of ward_lista and ward_listb."""

    mergedcsv = mergedcsv.loc[:, mergedcsv.dtypes == 'int64']
    regionflows = mergedcsv.loc[ward_lista, ward_listb]
    regionflowsum = regionflows.values.sum()


    #grid = [ax, bx, regflowsuma, regflowsumb]
    gridoutput = [ax, bx, regionflowsum]
    print gridoutput

    return regflowsuma
    return regflowsumb

#print mergedcsv.index

#mergedcsv.columns = mergedcsv.columns.str.strip()

for ax, group_a in enumerate(idgroups):
    ward_lista = map(int, group_a)
    print ward_lista


    for bx, group_b in enumerate(idgroups[ax:], start=ax):
        ward_listb = map(int, group_b)
        #print ward_listb

        flow_ab = flows(mergedcsv, ward_lista, ward_listb)
            #flow_ab = flows(mergedcsv, group_a, group_b)

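This results in KeyError: 'None of [[189, 197, 198, 201]] are in the [columns]'

I have also tried ward_lista = map(str, group_a) as well as map(int, group_a), but either way the listed objects are not found by DataFrame.loc. The columns have mixed data types, but all the columns containing the labels that should be sliced are of type int64. I have tried many solutions involving data types, without success. Any suggestions?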


1 answer
User
Answer 1 · posted 2024-04-27 00:47:03

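I can't tell what computation you're doing, but it seems you just want combinations of the groups. The question is whether they are directed or undirected, that is, do you need to compute flows(A,B) and flows(B,A), or just one of them?

If it's just one, you can do this: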

for i,ward_list in enumerate(idgroups):
    for j,ward_list2 in enumerate(idgroups[i:],start=i):

This iterates over i,j pairs like:

0,0 0,1 0,2 ... 0,n
1,1 1,2 ... 1,n
2,2 ... 2,n

That covers the undirected case.
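To make the pairing pattern concrete, here is a minimal sketch. It uses a plain Python list as a hypothetical stand-in for idgroups (in the question it is a pandas Series of ward-ID lists), and checks the result against itertools.combinations_with_replacement, which the question already mentions in a commented-out line.

```python
from itertools import combinations_with_replacement

# Hypothetical stand-in for idgroups: one list of ward IDs per region.
idgroups = [[101, 102], [201], [301, 302, 303]]

# Nested enumerate over a slice: j always starts at i, so each unordered
# pair of regions (including a region paired with itself) appears once.
pairs = []
for i, ward_list in enumerate(idgroups):
    for j, ward_list2 in enumerate(idgroups[i:], start=i):
        pairs.append((i, j))

print(pairs)

# The same index pairs, generated by the standard library.
same_pairs = list(combinations_with_replacement(range(len(idgroups)), 2))
```

With n regions this visits n(n+1)/2 pairs rather than n², which is only enough when flow A to B is known to equal flow B to A.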

If you need to compute both flows(A,B) and flows(B,A), then just push your code into a function called flows and call it with the arguments reversed, as shown. ;-)
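For the directed case, the question already builds its pair list with itertools.product(regiddf, repeat=2). As a small illustration of what that yields, assuming three made-up region IDs:

```python
from itertools import product

region_ids = [1, 2, 3]  # hypothetical region IDs

# product(..., repeat=2) yields every ordered pair, so both (1, 2) and
# (2, 1) appear, along with the internal pairs (1, 1), (2, 2), (3, 3).
directed_pairs = list(product(region_ids, repeat=2))
print(len(directed_pairs))
```

Each ordered pair corresponds to one call of flows, so (1, 2) and (2, 1) are computed separately.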

Update

Let's define a function called flows:

def flows():
    pass

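Now, what are the parameters?

Well, looking at your code, it gets its data from a dataframe. And you want two different wards, so let's start with those. The result seems to be the sum of the resulting grid.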

def flows(df, ward_a, ward_b):
    """Return the sum of all the cells in the row/column intersections
    of ward_a and ward_b."""

    return 0

Now I'm going to copy over your lines of code:

    ward_list = idgroups[index]
    print ward_list
    df6 = mergedcsv.loc[ward_list] #select rows with values in the list
    dfcols = mergedcsv.loc[ward_list, :] #select columns with values in list
    ward_liststr = map(str, ward_list) #convert ward_list to strings so that they can be used to select columns, won't work as integers.
    ward_listint = map(int, ward_list)
    #dfrowscols = mergedcsv.loc[ward_list, ward_listint]
    df7 = df6.loc[:, ward_liststr]
    print df7
    regflowsum = df7.values.sum() #sum all values in dataframe
    intflow = [regflowsum]
    print intflow

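I think that's most of the flows function right there. Let's see.

  1. ward_list is obviously going to be either the ward_a or the ward_b parameter.

  2. I'm not sure what df6 is for, since you recompute it in df7. So that needs to be clarified.

  3. regflowsum is the output we want, I think.

Rewriting this into the function: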

def flows(df, ward_a, ward_b):
    """Return the sum of all the cells in the row/column intersections
    of ward_a and ward_b."""

    print "Computing flows from:"
    print "    ", ward_a
    print ""
    print "flows into:"
    print "    ", ward_b

    # Filter rows by ward_a, cols by ward_b:
    grid = df.loc[ward_a, ward_b]

    print "Grid:"
    print grid

    flowsum = grid.values.sum()

    print "Flows:", flowsum

    return flowsum
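A usage sketch on made-up data may help here. It assumes, as a guess based on the question's comment that columns "won't work as integers" and on the KeyError it reports, that the row index holds integer ward IDs while the column labels are strings, which is what pd.read_csv typically produces for numeric-looking headers. The toy matrix and region lists are hypothetical, and the code is written for Python 3, unlike the Python 2 listings above.

```python
import pandas as pd

def flows(df, ward_a, ward_b):
    """Sum the cells where rows ward_a meet columns ward_b."""
    return df.loc[ward_a, ward_b].values.sum()

# Toy flow matrix (made-up numbers). Rows are origin wards, columns are
# destination wards. Assumption: integer row index, string column labels.
toy = pd.DataFrame(
    [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]],
    index=[189, 197, 198],
    columns=["189", "197", "198"],
)

region_1 = [189, 197]
region_2 = [198]

# Integer labels on both axes fail, because no column is labelled 198.
try:
    flows(toy, region_1, region_2)
except KeyError as err:
    print("KeyError:", err)

# Integer row labels with string column labels select the intended block.
flow_12 = flows(toy, region_1, [str(w) for w in region_2])
flow_21 = flows(toy, region_2, [str(w) for w in region_1])
print(flow_12, flow_21)
```

If the real dataframe is laid out this way, converting only the column list to strings and leaving the row list as integers would avoid the KeyError. Checking mergedcsv.index.dtype and mergedcsv.columns.dtype would confirm whether the assumption holds.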

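Now, I'm assuming the ward_a and ward_b values are already in the correct format. So we'll have to str-ify them, or do whatever else is needed, outside the function. Let's do that: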

for ax, group_a in enumerate(idgroups):
    ward_a = map(str, group_a)

    for bx, group_b in enumerate(idgroups[ax:], start=ax):
        ward_b = map(str, group_b)

        flow_ab = flows(mergedcsv, ward_a, ward_b)

        if ax != bx:
            flow_ba = flows(mergedcsv, ward_b, ward_a)
        else:
            flow_ba = flow_ab

        # Now what?

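At this point you have two numbers. They will be equal when the wards are the same (internal flow?). From here your original code stops being helpful, because it only deals with internal flows, not A->B flows, so I don't know what to do next. But the values are in the variables, so...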
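One possible next step, sketched under assumptions: collect each result as a row with the origID, destID and flow column names that the question uses for its pair list, then build a dataframe from those rows. The toy matrix, the groups mapping and the mixed label types (integer rows, string columns) are all hypothetical stand-ins for mergedcsv and idgroups, and the code is written for Python 3.

```python
import pandas as pd

def flows(df, rows, cols):
    """Sum the cells where the given row labels meet the column labels."""
    return df.loc[rows, cols].values.sum()

# Made-up data: integer row index, string column labels (an assumption
# about how mergedcsv looks after pd.read_csv).
wards = [11, 12, 21, 22]
toy = pd.DataFrame(
    [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]],
    index=wards,
    columns=[str(w) for w in wards],
)

# Hypothetical region -> ward list mapping, standing in for idgroups.
groups = {1: [11, 12], 2: [21, 22]}

# One record per ordered (origin, destination) pair of regions.
records = []
for orig_id, wards_a in groups.items():
    for dest_id, wards_b in groups.items():
        cols = [str(w) for w in wards_b]
        records.append({
            "origID": orig_id,
            "destID": dest_id,
            "flow": flows(toy, wards_a, cols),
        })

region_flows = pd.DataFrame(records, columns=["origID", "destID", "flow"])
print(region_flows)
```

Because every ordered pair is visited, the table already contains both directions plus the internal flows, so no separate merge against a pair list is needed. A quick sanity check is that the flow column sums to the total of the whole matrix.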
