Converting a DataFrame to an undirected edge list

Posted 2024-05-16 09:01:16


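Given a DataFrame of edges, I want to build an aggregated DataFrame with a "frequency" column holding the total number of edges between two nodes. I also want the edge list to be undirected, so if there is a row A => B = 1, there should also be a row B => A = 1.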

Original data

import pandas as pd
data = pd.DataFrame({'x': ['jane','jane','jack','bill','jack','terra'],
                     'y': ['jack','jack','jane','terra','terra', 'jack']})

     x      y
0   jane   jack
1   jane   jack
2   jack   jane
3   bill  terra
4   jack  terra
5  terra   jack

Expected output

       x      y  frequency
0   jane   jack          3
1   jack   jane          3
2   bill  terra          1
3   jack  terra          2
4  terra   jack          2

What I tried

## Get size of one direction for edge list
data=data.groupby(['x','y']).size().reset_index() 

## rename column to 'frequency'
data.rename(columns = {0:'frequency'}, inplace = True) 

## copy dataframe to calculate other direction of edgelist 
data2 = data.copy() 
## reverse the names of columns
data2.rename(columns = {'x':'y', 'y':'x'}, inplace = True) 
## merge
data2 = data.merge(data2, left_on=['x','y'],right_on=['x','y'], suffixes = ['1','2']) 
## add the frequency to get total edge strength
data2['frequency'] = data2['frequency1']+data2['frequency2'] 
data3 = data2[['x','y','frequency']]
 
       x      y  frequency
0   jack   jane          3
1   jack  terra          2
2   jane   jack          3
3  terra   jack          2

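The final result is mostly fine, and I don't care about the row order. The problem is that the row for bill and terra is missing. It is lost because of the way I merge: the original data only has bill => terra and no terra => bill, so the inner merge finds no match and drops the row.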

I would like to know how to identify the rows that will be dropped and add them back, or whether there is a better approach.


1 answer
User
#1 · Posted 2024-05-16 09:01:16

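In the end I found a way to get the result I wanted. The approach uses a nested apply(). First I build the DataFrame with the frequency column for the cases that already work (as above); it serves as the base frame onto which the cases that do not work get concatenated.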

## Same steps as before to calculate frequency column

data=data.groupby(['x','y']).size().reset_index() 
data.rename(columns = {0:'frequency'}, inplace = True) 

## Identify which of those cases will not work using a nested apply.
## The inner apply returns every edge in the opposite direction, (y, x).
## The outer apply checks whether each edge (x, y) has a counterpart in
## the other direction. Edges without one (mask is False) are the ones
## that need to be manually created and appended to the final result.

remaining_rows = data.loc[~data.apply(
    lambda r: (r['x'], r['y']) in set(data.apply(lambda s: (s['y'], s['x']), axis = 1)),
    axis = 1)]

remaining_rows

      x      y  frequency
0  bill  terra          1
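
As a hedged alternative, the same one-directional rows can be picked out without apply(): a left merge against the flipped pairs with indicator=True marks each edge by whether its reverse exists. This is a sketch of a swapped-in technique, not the method used in this answer, and the names counts, flipped, marked and one_way are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'x': ['jane', 'jane', 'jack', 'bill', 'jack', 'terra'],
                     'y': ['jack', 'jack', 'jane', 'terra', 'terra', 'jack']})

# Directed counts, one row per distinct (x, y) pair.
counts = data.groupby(['x', 'y']).size().reset_index(name='frequency')

# Every pair with its direction flipped (hypothetical helper frame).
flipped = counts[['x', 'y']].rename(columns={'x': 'y', 'y': 'x'})

# A left merge keeps every edge; the _merge column reports whether the
# reversed pair was found ('both') or not ('left_only').
marked = counts.merge(flipped, on=['x', 'y'], how='left', indicator=True)

# Edges that exist in one direction only.
one_way = marked.loc[marked['_merge'] == 'left_only', ['x', 'y', 'frequency']]
print(one_way)
```

On the sample data this leaves the single bill/terra row, matching the output above.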

##Create the edges for the other direction and concat

remaining_rows2 = remaining_rows.copy()
remaining_rows2.rename(columns = {'x':'y','y':'x'}, inplace = True)
remaining_rows = pd.concat([remaining_rows, remaining_rows2])

remaining_rows

       x      y  frequency
0   bill  terra          1
0  terra   bill          1

## Yes! This is the piece that I can concat onto the other data frame so
## that I have a complete edge list with a frequency column for each edge
## A=>B and B=>A. When concatenating, remember to pass ignore_index=True
## to pd.concat so the repeated index labels are renumbered.
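
The final concatenation is described but not shown, so here is a minimal end-to-end sketch of how the pieces might fit together. It assumes the base frame is the inner-merge result from the question; the names counts, base, one_way, edge_list and check are illustrative and do not come from the original post. The last lines cross-check the result with a different, shorter technique: stack the frame on a column-swapped copy of itself and count once.

```python
import pandas as pd

data = pd.DataFrame({'x': ['jane', 'jane', 'jack', 'bill', 'jack', 'terra'],
                     'y': ['jack', 'jack', 'jane', 'terra', 'terra', 'jack']})

# Directed counts, one row per distinct (x, y) pair.
counts = data.groupby(['x', 'y']).size().reset_index(name='frequency')
flipped = counts.rename(columns={'x': 'y', 'y': 'x'})

# Base frame: edges present in both directions (the inner merge from the question).
base = counts.merge(flipped, on=['x', 'y'], suffixes=('1', '2'))
base['frequency'] = base['frequency1'] + base['frequency2']
base = base[['x', 'y', 'frequency']]

# Edges whose reversed pair never occurs, plus their mirrored copies.
pairs = set(zip(counts['x'], counts['y']))
mask = pd.Series([(y, x) not in pairs for x, y in zip(counts['x'], counts['y'])],
                 index=counts.index)
one_way = counts.loc[mask]
one_way_flipped = one_way.rename(columns={'x': 'y', 'y': 'x'})

# Complete undirected edge list; ignore_index=True renumbers the rows.
edge_list = pd.concat([base, one_way, one_way_flipped], ignore_index=True)
edge_list = edge_list.sort_values(['x', 'y']).reset_index(drop=True)

# Cross-check: stack the data on its column-swapped copy and count once.
both = pd.concat([data, data.rename(columns={'x': 'y', 'y': 'x'})],
                 ignore_index=True)
check = both.groupby(['x', 'y']).size().reset_index(name='frequency')

print(edge_list)
```

Both routes give six rows on the sample data, including terra => bill with frequency 1. Note that a self-loop (x equal to y) would be counted twice by either route, so such rows may need separate handling.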
