Concatenate two DataFrames, inserting rows at specific positions

Posted 2024-04-29 22:24:24


df1:

id    Stu_Name   Class     Fees
1     Jack       primary   2333
2     mack       primary   2363
3     may        primary   2833
3     Mark       primary   1333
3     John       primary   9333
4     Moon       Secondary 6589 
5     daisy      Secondary 6565 
6     shawn      Secondary 6545 
6     roy        Secondary 6596
9     hary       higher    8526
10    Joy        higher    9654
10    nick       higher    7845
10    julie      higher    9633

df2:

id    Stu_Name   Class     Fees
11    eric       primary   2333
21    fick       primary   2363
42    Moon       Secondary 6589 
56    anki       Secondary 6565 
18    menk       higher    7845
17    rock       higher    9633

df_output:

id    Stu_Name   Class     Fees
1     Jack       primary   2333
2     mack       primary   2363
11    eric       primary   2333
21    fick       primary   2363
3     may        primary   2833
3     Mark       primary   1333
3     John       primary   9333
4     Moon       Secondary 6589 
5     daisy      Secondary 6565 
42    Moon       Secondary 6589 
56    anki       Secondary 6565
6     shawn      Secondary 6545 
6     roy        Secondary 6596
9     hary       higher    8526
18    menk       higher    7845
17    rock       higher    9633
10    Joy        higher    9654
10    nick       higher    7845
10    julie      higher    9633

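I need to concatenate the two DataFrames by row position. For example, in df1 the class "primary" has id 3 repeated several times; I need to insert df2's "primary" rows into df1 just before that repeated id. The same applies to the other classes.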


1 answer
Anonymous user
#1 · posted 2024-04-29 22:24:24

I misunderstood the goal, so this is a revised answer. The explanation is long, but the actual work takes three lines of code.

First, create the two data frames and combine them:

from io import StringIO
import pandas as pd

# create the 2 data frames

data = '''id    Stu_Name   Class     Fees
1     Jack       primary   2333
2     mack       primary   2363
3     may        primary   2833
3     Mark       primary   1333
3     John       primary   9333
4     Moon       Secondary 6589 
5     daisy      Secondary 6565 
6     shawn      Secondary 6545 
6     roy        Secondary 6596
9     hary       higher    8526
10    Joy        higher    9654
10    nick       higher    7845
10    julie      higher    9633
'''
df1 = pd.read_csv(StringIO(data), sep=r'\s+', engine='python')


data = '''id    Stu_Name   Class     Fees
11    eric       primary   2333
21    fick       primary   2363
42    Moon       Secondary 6589 
56    anki       Secondary 6565 
18    menk       higher    7845
17    rock       higher    9633
'''
df2 = pd.read_csv(StringIO(data), sep=r'\s+', engine='python')

# combine the 2 data frames
df = pd.concat([df1, df2], ignore_index=True)
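A note on this step: `sep=r'\s+'` treats any run of whitespace as one delimiter, so the column-aligned tables parse cleanly even where a line ends with a trailing space, and `pd.concat` stacks every row of the first frame before every row of the second. That row order matters later, because the sort keeps it for rows with equal keys. A small check, assuming shortened versions of the two tables:

```python
from io import StringIO
import pandas as pd

# Shortened column-aligned tables; the "Moon" line ends with a trailing space
left_text = """id    Stu_Name   Class     Fees
1     Jack       primary   2333
4     Moon       Secondary 6589 
"""
right_text = """id    Stu_Name   Class     Fees
11    eric       primary   2333
"""

# r"\s+" is a raw string, so "\s" reaches the regex instead of being a string escape
left = pd.read_csv(StringIO(left_text), sep=r"\s+", engine="python")
right = pd.read_csv(StringIO(right_text), sep=r"\s+", engine="python")

# All rows of `left` come first, then all rows of `right`;
# ignore_index=True renumbers the index 0..n-1 instead of repeating labels
combined = pd.concat([left, right], ignore_index=True)
print(combined)
```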

Now create two helper columns and sort:

# create the 1st helper column (for sorting at the end):
# factorize() numbers the classes in order of first appearance,
# so primary -> 0, Secondary -> 1, higher -> 2
df['class_num'] = df['Class'].factorize()[0]

# create the 2nd helper column (how many rows share each id)
df['id_count'] = df.groupby('id')['id'].transform('count')

# sort by class, then put ids that occur once ahead of repeated ids;
# once the result looks right, drop 'class_num' and 'id_count'
df = df.sort_values(['class_num', 'id_count']).set_index('id')

The result:

print(df)

   Stu_Name      Class  Fees  class_num  id_count
id                                               
1      Jack    primary  2333          0         1
2      mack    primary  2363          0         1
11     eric    primary  2333          0         1
21     fick    primary  2363          0         1
3       may    primary  2833          0         3
3      Mark    primary  1333          0         3
3      John    primary  9333          0         3
4      Moon  Secondary  6589          1         1
5     daisy  Secondary  6565          1         1
42     Moon  Secondary  6589          1         1
56     anki  Secondary  6565          1         1
6     shawn  Secondary  6545          1         2
6       roy  Secondary  6596          1         2
9      hary     higher  8526          2         1
18     menk     higher  7845          2         1
17     rock     higher  9633          2         1
10      Joy     higher  9654          2         3
10     nick     higher  7845          2         3
10    julie     higher  9633          2         3
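The helper columns stay in the frame until you remove them. A minimal sketch of the full pipeline including that cleanup, assuming the question's rows are typed in directly (instead of parsed from text) and that you want `id` back as an ordinary column, as in df_output:

```python
import pandas as pd

# Same rows as df1 / df2 in the question, built inline
df1 = pd.DataFrame({
    "id": [1, 2, 3, 3, 3, 4, 5, 6, 6, 9, 10, 10, 10],
    "Stu_Name": ["Jack", "mack", "may", "Mark", "John", "Moon", "daisy",
                 "shawn", "roy", "hary", "Joy", "nick", "julie"],
    "Class": ["primary"] * 5 + ["Secondary"] * 4 + ["higher"] * 4,
    "Fees": [2333, 2363, 2833, 1333, 9333, 6589, 6565, 6545, 6596,
             8526, 9654, 7845, 9633],
})
df2 = pd.DataFrame({
    "id": [11, 21, 42, 56, 18, 17],
    "Stu_Name": ["eric", "fick", "Moon", "anki", "menk", "rock"],
    "Class": ["primary"] * 2 + ["Secondary"] * 2 + ["higher"] * 2,
    "Fees": [2333, 2363, 6589, 6565, 7845, 9633],
})

df = pd.concat([df1, df2], ignore_index=True)
df["class_num"] = df["Class"].factorize()[0]                # class code by first appearance
df["id_count"] = df.groupby("id")["id"].transform("count")  # rows sharing each id

# A multi-column sort_values is stable, so ties keep the concat order
out = (
    df.sort_values(["class_num", "id_count"])
      .drop(columns=["class_num", "id_count"])  # remove the helper columns
      .reset_index(drop=True)                   # keep id as a column, renumber rows
)
print(out)
```

This reproduces df_output for the sample data because, within each class of df1, every non-repeated id comes before the repeated ones. If a class had a single id after a repeated one, the `id_count` sort would move it ahead of the repeats too, so check that assumption against the real data.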

Original answer

You can use an ordered categorical type to provide a custom sort order:

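# an ordered categorical dtype defines the custom sort order of 'Class'
class_type = pd.CategoricalDtype(
    categories=['primary', 'Secondary', 'higher'],
    ordered=True)

# a stable sort keeps the concat order (df1 rows, then df2 rows) within each class
df = pd.concat([df1, df2]).astype(
    {'id': 'int',
     'Stu_Name': 'string',
     'Class': class_type,
     'Fees': 'int'}).sort_values('Class', kind='stable')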

print(df.head())

   id Stu_Name    Class  Fees
0   1     Jack  primary  2333
1   2     mack  primary  2363
2   3      may  primary  2833
3   3     Mark  primary  1333
4   3     John  primary  9333
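The categorical sort only orders rows by class: within each class, df2's rows still end up after all of df1's rows, not before the repeated ids. If the rule from the question should be applied literally (insert each class's df2 rows just before the first id that repeats within that class of df1), here is a minimal sketch. `insert_before_repeats` is a hypothetical helper name, and the sketch assumes every class in df2 also appears in df1.

```python
import pandas as pd

# Same rows as df1 / df2 in the question, built inline
df1 = pd.DataFrame({
    "id": [1, 2, 3, 3, 3, 4, 5, 6, 6, 9, 10, 10, 10],
    "Stu_Name": ["Jack", "mack", "may", "Mark", "John", "Moon", "daisy",
                 "shawn", "roy", "hary", "Joy", "nick", "julie"],
    "Class": ["primary"] * 5 + ["Secondary"] * 4 + ["higher"] * 4,
    "Fees": [2333, 2363, 2833, 1333, 9333, 6589, 6565, 6545, 6596,
             8526, 9654, 7845, 9633],
})
df2 = pd.DataFrame({
    "id": [11, 21, 42, 56, 18, 17],
    "Stu_Name": ["eric", "fick", "Moon", "anki", "menk", "rock"],
    "Class": ["primary"] * 2 + ["Secondary"] * 2 + ["higher"] * 2,
    "Fees": [2333, 2363, 6589, 6565, 7845, 9633],
})


def insert_before_repeats(left, right, key="id", group="Class"):
    """Hypothetical helper: for each group of `left` (in order of first appearance),
    place the matching rows of `right` just before the first row whose `key` occurs
    more than once in that group, or at the end of the group if nothing repeats.
    Groups that exist only in `right` are dropped."""
    pieces = []
    for name, block in left.groupby(group, sort=False):
        # True for every row whose id occurs more than once inside this class
        repeated = block[key].duplicated(keep=False).to_numpy()
        # Position of the first repeated row, or the end of the block
        cut = int(repeated.argmax()) if repeated.any() else len(block)
        extra = right[right[group] == name]
        for piece in (block.iloc[:cut], extra, block.iloc[cut:]):
            if not piece.empty:  # skip empty slices before concatenating
                pieces.append(piece)
    return pd.concat(pieces, ignore_index=True)


out = insert_before_repeats(df1, df2)
print(out)
```

Unlike the helper-column sort, this keeps df1's own row order untouched and only decides where df2's rows go.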
