Concatenate rows in a dataframe

Posted 2024-04-24 07:59:51


My data frame is structured as follows:

Column A  Column B
1         A
1         B
1         C
1         D
2         B
2         C
2         D
2         E

I want to concatenate the Column B values of all rows that share the same value in Column A.

I want the final output to look like this:

Column A Column B Column C  
1        A        ABCD    
1        B        ABCD  
1        C        ABCD  
1        D        ABCD  
2        B        BCDE  
2        C        BCDE  
2        D        BCDE  
2        E        BCDE   
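The transformation being asked for is a group-wise string join broadcast back to every row: join the Column B values within each Column A group, then attach that string to each row of the group. A minimal pure-Python sketch of the idea, assuming the example rows are hard-coded as a list of (Column A, Column B) tuples:

```python
# Pure-Python sketch of a group-wise string join (no pandas).
# Rows copied from the example table above as (Column A, Column B) tuples.
rows = [(1, "A"), (1, "B"), (1, "C"), (1, "D"),
        (2, "B"), (2, "C"), (2, "D"), (2, "E")]

# First pass: collect Column B values per Column A key, keeping row order.
groups = {}
for key, value in rows:
    groups.setdefault(key, []).append(value)

# Join each group's values into a single string.
joined = {key: "".join(values) for key, values in groups.items()}

# Second pass: attach the group's joined string to every row (Column C).
result = [(key, value, joined[key]) for key, value in rows]

for row in result:
    print(*row)
```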

How can I do this in R or Python?

Thanks.


2 answers

As @Sotos suggested in the comments, this can be done with a one-liner in base R. For this solution, make sure df$ColumnB is a character vector rather than a factor.

df$ColumnC <- with(df, ave(ColumnB, ColumnA, FUN = function(i) paste(i, collapse = '')))
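ave() suits this task because it returns a vector aligned with the original rows, so the data do not need to be sorted by ColumnA. A rough pandas analogue is groupby(...).transform(...). A minimal sketch, using hypothetical interleaved rows (not the question's data) to show that the result stays aligned with the original row order:

```python
import pandas as pd

# Hypothetical data: ColumnA values are interleaved on purpose.
df = pd.DataFrame({
    "ColumnA": [1, 2, 1, 2, 1, 2, 1, 2],
    "ColumnB": ["A", "B", "B", "C", "C", "D", "D", "E"],
})

# transform() returns a Series indexed like df (one value per original row),
# so it can be assigned straight back as a new column, much like ave().
df["ColumnC"] = df.groupby("ColumnA")["ColumnB"].transform(lambda s: "".join(s))
print(df)
```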

Another base R solution:

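# Assumes df is sorted by ColumnA; times= repeats each group's string once per row of that group
df$ColumnC <- rep(unlist(by(df, INDICES = df$ColumnA,
  function(t) paste(t$ColumnB, collapse = ""), simplify = FALSE)), times = as.vector(table(df$ColumnA)))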

> df
#ColumnA ColumnB ColumnC
#1       1       A    ABCD
#2       1       B    ABCD
#3       1       C    ABCD
#4       1       D    ABCD
#5       2       B    BCDE
#6       2       C    BCDE
#7       2       D    BCDE
#8       2       E    BCDE
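The by() + rep() version relies on the rows being sorted by ColumnA, because rep() lays the joined strings down by position. An order-independent alternative is to aggregate once per group and then look up each row's key. A minimal pandas sketch of that aggregate-then-map pattern (a swapped-in technique, not the answer's code), assuming the question's eight rows:

```python
import pandas as pd

df = pd.DataFrame({
    "ColumnA": [1, 1, 1, 1, 2, 2, 2, 2],
    "ColumnB": ["A", "B", "C", "D", "B", "C", "D", "E"],
})

# One joined string per group: a Series indexed by the ColumnA values.
per_group = df.groupby("ColumnA")["ColumnB"].agg(lambda s: "".join(s))

# Look each row's key up in per_group instead of repeating values by
# position, so neither group sizes nor row order matter.
df["ColumnC"] = df["ColumnA"].map(per_group)
print(df)
```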

In R, we can use dplyr. After grouping by "ColumnA", paste the contents of "ColumnB" together and create the new column with mutate:

library(dplyr)
df1 %>%
     group_by(ColumnA) %>% 
     mutate(ColumnC = paste(ColumnB, collapse=""))
# A tibble: 8 x 3
# Groups:   ColumnA [2]
#  ColumnA ColumnB ColumnC
#    <int>   <chr>   <chr>
#1       1       A    ABCD
#2       1       B    ABCD
#3       1       C    ABCD
#4       1       D    ABCD
#5       2       B    BCDE
#6       2       C    BCDE
#7       2       D    BCDE
#8       2       E    BCDE
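mutate() on a grouped tibble returns a new object rather than modifying df1 in place. A comparable non-mutating chain in pandas uses DataFrame.assign with a callable. A minimal sketch, assuming the same eight rows as the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "ColumnA": [1, 1, 1, 1, 2, 2, 2, 2],
    "ColumnB": ["A", "B", "C", "D", "B", "C", "D", "E"],
})

# assign() returns a new DataFrame and leaves df1 unchanged, similar to
# mutate() returning a new tibble. The lambda receives the frame being built.
out = df1.assign(
    ColumnC=lambda d: d.groupby("ColumnA")["ColumnB"].transform(lambda s: "".join(s))
)
print(out)
```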

Another option is data.table:

library(data.table)
setDT(df1)[,  ColumnC := paste(ColumnB, collapse=""), by = ColumnA]

Data

df1 <- structure(list(ColumnA = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), ColumnB = c("A", 
 "B", "C", "D", "B", "C", "D", "E")), .Names = c("ColumnA", "ColumnB"
 ), class = "data.frame", row.names = c(NA, -8L))
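For Python readers, the same data can be built directly as a pandas DataFrame instead of relying on pd.read_clipboard(). A minimal sketch; the column names and values follow the R data above:

```python
import pandas as pd

# Same values as the R structure() call, built explicitly so the example
# is reproducible without copying anything to the clipboard.
df1 = pd.DataFrame({
    "ColumnA": [1, 1, 1, 1, 2, 2, 2, 2],
    "ColumnB": ["A", "B", "C", "D", "B", "C", "D", "E"],
})
print(df1)
```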

If we need Python, then:

>>> import pandas as pd
>>> df1 = pd.read_clipboard()
>>> df1
#   ColumnA ColumnB
#0        1       A
#1        1       B
#2        1       C
#3        1       D
#4        2       B
#5        2       C
#6        2       D
#7        2       E
>>> df1['ColumnC'] = df1.groupby('ColumnA')['ColumnB'].transform(lambda x: ''.join(x))
>>> df1
#   ColumnA ColumnB ColumnC
#0        1       A    ABCD
#1        1       B    ABCD
#2        1       C    ABCD
#3        1       D    ABCD
#4        2       B    BCDE
#5        2       C    BCDE
#6        2       D    BCDE
#7        2       E    BCDE
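Like the character-versus-factor caveat for the base R answer, the pandas version assumes every ColumnB value is a string: str.join raises TypeError on numbers or NaN. A minimal sketch of one way to guard against that, assuming (hypothetically) that missing values should be skipped and other values converted with str():

```python
import numpy as np
import pandas as pd

# Hypothetical data: ColumnB mixes strings, an integer and a missing value.
df = pd.DataFrame({
    "ColumnA": [1, 1, 1, 2, 2],
    "ColumnB": ["A", 7, np.nan, "B", "C"],
})

def join_group(s: pd.Series) -> str:
    # Drop missing values first, then convert the rest to str, because
    # str.join raises TypeError on non-string items such as 7 or NaN.
    return "".join(s.dropna().astype(str))

df["ColumnC"] = df.groupby("ColumnA")["ColumnB"].transform(join_group)
print(df)
```

Whether missing values should be dropped or rendered (for example as "NA") depends on the data; dropping them is only one reasonable choice.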
