How do I select multiple elements from strings delimited by ";" and "," in a DataFrame?

Posted 2024-04-20 05:00:56


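For example:

Row 1 of the DataFrame: name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3

Row 2 of the DataFrame: name a, age a, country a; name b, age b, country b; name c, age c, country c

I want to select only the countries from each row of the DataFrame and put them in a new column of the same DataFrame:

country 1, country 2, country 3

country a, country b, country c

I tried the following, but for each row I only get the last country of the last school: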

df["countries"] = df["school_info"].apply(lambda x: str(x).split(",")[-1].strip())

Output:

country 3

country c

Thanks, everyone!


2 Answers

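OK, now I understand what you are asking for.

  1. Build a temporary list of tuples that will become the rows
  2. Use explode() to expand the list into rows
  3. Pick values out of each row's tuple to form the columns. For the sake of the example, I select all of the components and keep the original encoded string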
import pandas as pd

data = """name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3
name a, age a, country a; name b, age b, country b; name c, age c, country c"""

df = pd.DataFrame({"school_info": data.split("\n")})
# The chain is wrapped in parentheses so that comments can sit between the calls;
# a backslash continuation followed by a comment line is a syntax error.
df = (
    df.assign(
        # build a list of tuples: ";" separates records, each tuple holds (name, age, country)
        # strip() removes the spaces left over after each delimiter
        data_tuple=lambda dfa: dfa["school_info"].apply(
            lambda s: [tuple(p.strip() for p in t.split(",")) for t in s.split(";")]
        )
    )
    # explode the list into one row per tuple, then pick out each element of the tuple
    .explode("data_tuple")
    .assign(
        name=lambda dfa: dfa["data_tuple"].apply(lambda t: t[0]),
        age=lambda dfa: dfa["data_tuple"].apply(lambda t: t[1]),
        country=lambda dfa: dfa["data_tuple"].apply(lambda t: t[2]),
    )
    # data_tuple was a temporary construct, so drop it
    .drop("data_tuple", axis=1)
)

print(df.to_string(index=False))

Output:

                                                                  school_info     name     age     country
 name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3   name 1   age 1   country 1
 name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3   name 2   age 2   country 2
 name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3   name 3   age 3   country 3
 name a, age a, country a; name b, age b, country b; name c, age c, country c   name a   age a   country a
 name a, age a, country a; name b, age b, country b; name c, age c, country c   name b   age b   country b
 name a, age a, country a; name b, age b, country b; name c, age c, country c   name c   age c   country c
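
The exploded layout gives one row per school, whereas the question asks for a single comma-joined column per original row. One way to get there is to collapse the exploded values again by the original row index. This is only a minimal sketch: it relies on explode() repeating the original index for each record, and the variable names `exploded` and `country` are illustrative, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({"school_info": [
    "name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3",
    "name a, age a, country a; name b, age b, country b; name c, age c, country c",
]})

# One row per ";"-separated record; the original row index is repeated.
exploded = df["school_info"].str.split(";").explode()

# Last ","-separated field of each record, with surrounding whitespace removed.
country = exploded.str.split(",").str[-1].str.strip()

# Collapse back to one row per original index and join the values with ", ".
df["countries"] = country.groupby(level=0).agg(", ".join)

print(df["countries"].tolist())
```

The assignment aligns on the index, so each joined string lands on the row it came from.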

If your rows are in a column named school_info:

df["school_info"].apply(lambda r: ', '.join([c.split(",")[-1].strip() for c in r.split(";")]))

Input:

data = [["name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3"],
        ["name a, age a, country a; name b, age b, country b; name c, age c, country c"]]
df = pd.DataFrame(data, columns=['school_info'])

Output:

0    country 1, country 2, country 3
1    country a, country b, country c
Name: school_info, dtype: object
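
The one-liner returns a new Series and calls .split() directly on each cell, so it would fail on a missing value. If the column may contain gaps, the same idea can be wrapped in a small helper and assigned to the new column. This is a sketch under assumptions: `last_fields` and its parameters are hypothetical names, and passing missing cells through unchanged is a design choice rather than something the question specifies.

```python
import pandas as pd

def last_fields(cell, record_sep=";", field_sep=","):
    """Return the last field of every record in `cell`, joined by ', '."""
    if pd.isna(cell):
        return cell  # assumption: leave missing values untouched
    # Skip empty records, e.g. the one produced by a trailing ";".
    records = [r for r in str(cell).split(record_sep) if r.strip()]
    return ", ".join(r.split(field_sep)[-1].strip() for r in records)

df = pd.DataFrame({"school_info": [
    "name 1, age 1, country 1; name 2, age 2, country 2; name 3, age 3, country 3",
    None,
]})

# Store the result in a new column of the same DataFrame, as the question asks.
df["countries"] = df["school_info"].apply(last_fields)
print(df["countries"])
```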
