Creating multiple columns from a single column using a custom function and apply

0 votes
1 answer
41 views
Asked 2025-04-14 17:05

I am using this dataset for a small project. From the dataset's "Route" column, I want to create three new columns: source, destination, and layover.

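Currently, I am using a custom function with regular expressions to extract this information and create the new columns. Can you tell me what is wrong with how I am using the apply method, or with how the function is written?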

def travel_route(row):
    # layover_patt = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)\s+via\s+(?P<layover>[^.]+)"
    # direct_patt = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)"
    text = row["Route"]
    # text = row
    if "via" in text:
        pattern = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)\s+via\s+(?P<layover>[^.]+)"
        source = match.group("source")
        destination = match.group("destination")
        layover = match.group("layover")
        return pd.Series({"From": source,
                          "To": destination,
                          "Via": layover})

    else:
        pattern = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)"
        source = match.group("source")
        destination = match.group("destination")

        return pd.Series({"From": source,
                          "To": destination,
                          "Via": np.nan})

# df_with_route.apply(travel_route,axis=1)
df_with_route = df_with_route["Route"].apply(travel_route)
#df_with_route.head()

# travel_route("Tokyo to London Heathrow via Doha") # travel_route("Auckland to Doha")

1 Answer

0

You can use .str.split, which is simpler and more efficient than apply:

import pandas as pd

# example dataframe
df = pd.DataFrame({"Route": ["Dubai to Gatwick via Doha", "Colombo to Doha"]})

# split on ' to ' or ' via ' (the pattern is treated as a regex)
df[["From", "To", "Via"]] = df["Route"].str.split(" to | via ", expand=True)
print(df)

                       Route     From       To   Via
0  Dubai to Gatwick via Doha    Dubai  Gatwick  Doha
1            Colombo to Doha  Colombo     Doha  None
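
If you would rather keep the apply-based approach, there appear to be two problems in the function from the question: match is never assigned (no re.search call is made), and df["Route"].apply passes each cell value as a plain string, so row["Route"] fails. Below is a minimal sketch with both fixed, reusing the question's two patterns. The constant names, the None fallback, and the check for " via " with surrounding spaces (so a city name such as "Bolivia" does not trigger the layover branch) are my own assumptions, not part of the original code.

```python
import re

import numpy as np
import pandas as pd

# Patterns copied from the question; the constant names are illustrative.
LAYOVER_PATT = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)\s+via\s+(?P<layover>[^.]+)"
DIRECT_PATT = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)"


def travel_route(text):
    # Series.apply passes each cell value (a string), so there is no row["Route"] lookup.
    pattern = LAYOVER_PATT if " via " in text else DIRECT_PATT
    match = re.search(pattern, text)  # this call was missing in the question
    if match is None:
        # Assumption: unparseable routes become all-NaN rows instead of raising.
        return pd.Series({"From": np.nan, "To": np.nan, "Via": np.nan})
    groups = match.groupdict()
    return pd.Series({
        "From": groups["source"],
        "To": groups["destination"],
        # The direct pattern has no "layover" group, so fall back to NaN.
        "Via": groups.get("layover", np.nan),
    })


df = pd.DataFrame({"Route": ["Dubai to Gatwick via Doha", "Colombo to Doha"]})
parsed = df["Route"].apply(travel_route)  # one Series per row becomes a DataFrame
result = df.join(parsed)                  # columns: Route, From, To, Via
```

Because the function returns a pd.Series for every row, apply assembles the results into a DataFrame with From, To, and Via columns. Joining it back onto df keeps the original Route column instead of overwriting the frame, which is what the assignment in the question would have done.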
