Reshaping a data.frame from wide to long format

Posted 2024-04-19 04:51:47


I can't get my data.frame converted from wide to long format. Currently it looks like this:

Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246

I would like to convert it into a long data.frame, like this:

Code Country        Year    Value
AFG  Afghanistan    1950    20,249
AFG  Afghanistan    1951    21,352
AFG  Afghanistan    1952    22,532
AFG  Afghanistan    1953    23,557
AFG  Afghanistan    1954    24,555
ALB  Albania        1950    8,097
ALB  Albania        1951    8,986
ALB  Albania        1952    10,058
ALB  Albania        1953    11,123
ALB  Albania        1954    12,246

I have looked at and tried the melt() and reshape() functions, as suggested in answers to similar questions. So far, however, I only get messy results.

If possible, I would like to do it with the reshape() function, since it looks a bit easier to handle.


3 answers

Three alternative solutions:

1) With data.table:

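You can use the same melt function as in the reshape2 package (the data.table version is an extended and improved implementation). melt from data.table also has more parameters than the melt function from reshape2. For example, you can also specify the name of the variable column: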

library(data.table)
long <- melt(setDT(wide), id.vars = c("Code","Country"), variable.name = "year")

which gives:

> long
    Code     Country year  value
 1:  AFG Afghanistan 1950 20,249
 2:  ALB     Albania 1950  8,097
 3:  AFG Afghanistan 1951 21,352
 4:  ALB     Albania 1951  8,986
 5:  AFG Afghanistan 1952 22,532
 6:  ALB     Albania 1952 10,058
 7:  AFG Afghanistan 1953 23,557
 8:  ALB     Albania 1953 11,123
 9:  AFG Afghanistan 1954 24,555
10:  ALB     Albania 1954 12,246

Some alternative notations:

melt(setDT(wide), id.vars = 1:2, variable.name = "year")
melt(setDT(wide), measure.vars = 3:7, variable.name = "year")
melt(setDT(wide), measure.vars = as.character(1950:1954), variable.name = "year")
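By default the year column produced by data.table's melt is a factor and the values stay character strings. A minimal sketch, assuming the sample data from the question is read with check.names = FALSE and stringsAsFactors = FALSE, that keeps the year labels as character via variable.factor = FALSE, converts both columns, and sorts the rows into the order shown in the question:

```r
library(data.table)

# Sample data from the question; check.names = FALSE keeps "1950" etc. as column names
wide <- read.table(text = "Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# variable.factor = FALSE returns the year labels as character instead of a factor
long <- melt(setDT(wide),
             id.vars = c("Code", "Country"),
             variable.name = "year",
             value.name = "value",
             variable.factor = FALSE)

# Turn the year labels into integers and strip the thousands separators from the values
long[, `:=`(year = as.integer(year),
            value = as.numeric(gsub(",", "", value)))]

# Sort by country, then year
setorder(long, Code, year)
print(long)
```

setorder() sorts by reference, so no copy of the table is made.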

2) With tidyr:

library(tidyr)
long <- wide %>% gather(year, value, -c(Code, Country))

Some alternative notations:

wide %>% gather(year, value, -Code, -Country)
wide %>% gather(year, value, -1:-2)
wide %>% gather(year, value, -(1:2))
wide %>% gather(year, value, -1, -2)
wide %>% gather(year, value, 3:7)
wide %>% gather(year, value, `1950`:`1954`)
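gather() still works, but in current tidyr releases it is superseded by pivot_longer(). A rough equivalent, assuming tidyr 1.0.0 or later and the same sample data read with check.names = FALSE:

```r
library(tidyr)

wide <- read.table(text = "Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

# Stack every column except the two id columns into year/value pairs
long <- pivot_longer(wide,
                     cols = -c(Code, Country),
                     names_to = "year",
                     values_to = "value")

# Convert the year labels and the comma-formatted values to numbers
long$year  <- as.integer(long$year)
long$value <- as.numeric(gsub(",", "", long$value))
print(long)
```

Unlike gather(), pivot_longer() keeps the rows of each country together, so the result is already in the order shown in the question.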

3) With reshape2:

library(reshape2)
long <- melt(wide, id.vars = c("Code", "Country"))

Some alternative notations that give the same result:

# you can also define the id-variables by column number
melt(wide, id.vars = 1:2)

# as an alternative you can also specify the measure-variables
# all other variables will then be used as id-variables
melt(wide, measure.vars = 3:7)
melt(wide, measure.vars = as.character(1950:1954))
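reshape2's melt also accepts variable.name and value.name, so the column names from the desired output can be set directly. A sketch on the sample data; stringsAsFactors = FALSE is an assumption so that the values are read as character on any R version:

```r
library(reshape2)

wide <- read.table(text = "Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

long <- melt(wide,
             id.vars = c("Code", "Country"),
             variable.name = "Year",
             value.name = "Value")

# Year is a factor of the old column names; go through character to get 1950, 1951, ...
long$Year  <- as.integer(as.character(long$Year))
long$Value <- as.numeric(gsub(",", "", long$Value))
print(long)
```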

Notes:

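  • reshape2 is retired. Only the changes necessary to keep it on CRAN will be made. (source)
  • If you want to exclude NA values, you can add na.rm = TRUE to the melt as well as to the gather functions.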

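Another problem with your data is that R reads the values as character values (a result of the , in the numbers). You can fix that with gsub and as.numeric: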

long$value <- as.numeric(gsub(",", "", long$value))
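To see why the gsub step is needed: as.numeric() does not understand thousands separators. A small illustration on a few values from the question:

```r
values <- c("20,249", "8,097", "12,246")

# The comma makes parsing fail: every element becomes NA (with a coercion warning)
suppressWarnings(as.numeric(values))

# Removing the separator first gives the intended numbers: 20249, 8097, 12246
as.numeric(gsub(",", "", values))
```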

Or do it in one go with data.table or dplyr:

# data.table
long <- melt(setDT(wide),
             id.vars = c("Code","Country"),
             variable.name = "year")[, value := as.numeric(gsub(",", "", value))]

# tidyr and dplyr
library(tidyr)
library(dplyr)  # needed for mutate()
long <- wide %>% gather(year, value, -c(Code, Country)) %>%
  mutate(value = as.numeric(gsub(",", "", value)))

Data:

wide <- read.table(text="Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246", header=TRUE, check.names=FALSE)

reshape() takes a while to get used to, just like melt/cast. Assuming your data frame is called d, here is a solution with reshape():

reshape(d, 
        direction = "long",
        varying = list(names(d)[3:7]),
        v.names = "Value",
        idvar = c("Code", "Country"),
        timevar = "Year",
        times = 1950:1954)
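A self-contained version of the reshape() call above: this sketch reads the sample data (named wide here instead of d), reshapes it, cleans the values and sorts the rows into the order shown in the question. check.names = FALSE and stringsAsFactors = FALSE are assumptions about how the data is read:

```r
wide <- read.table(text = "Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246",
                   header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)

long <- reshape(wide,
                direction = "long",
                varying   = list(names(wide)[3:7]),  # the five year columns
                v.names   = "Value",                 # name of the stacked value column
                idvar     = c("Code", "Country"),    # columns identifying each original row
                timevar   = "Year",
                times     = 1950:1954)

# Strip the thousands separators and convert to numbers
long$Value <- as.numeric(gsub(",", "", long$Value))

# reshape() stacks all countries for one year before the next year; sort by country, then year
long <- long[order(long$Code, long$Year), ]
rownames(long) <- NULL
print(long)
```

Resetting the row names is optional; reshape() otherwise labels each row with its id and time, e.g. AFG.Afghanistan.1950.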

Using the reshape package:

#data
x <- read.table(textConnection(
"Code Country        1950    1951    1952    1953    1954
AFG  Afghanistan    20,249  21,352  22,532  23,557  24,555
ALB  Albania        8,097   8,986   10,058  11,123  12,246"), header=TRUE)

library(reshape)

x2 <- melt(x, id = c("Code", "Country"), variable_name = "Year")
x2[,"Year"] <- as.numeric(gsub("X", "" , x2[,"Year"]))
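The gsub("X", ...) step is needed because read.table() passes the column names through make.names() by default, which prefixes names that start with a digit with an X. A quick check, plus the option of switching that off:

```r
# Syntactically valid R names cannot start with a digit, so make.names() adds an "X"
make.names(c("Code", "Country", "1950"))   # "Code" "Country" "X1950"

# With check.names = FALSE, read.table() keeps the year headers unchanged
x <- read.table(text = "Code Country 1950 1951
AFG Afghanistan 20,249 21,352", header = TRUE, check.names = FALSE)
names(x)   # "Code" "Country" "1950" "1951"
```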
