Creating multiple columns from a single column using a custom function and apply
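I am using this dataset for a small project. From the "Route" column in the dataset, I want to create three new columns: the origin (From), the destination (To), and the layover (Via).
Currently, I am using a custom function with regular expressions to extract this information and build the new columns. Can you tell me what is wrong with the way I am using the apply method, or with the way the function is written?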
def travel_route(row):
    # layover_patt = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)\s+via\s+(?P<layover>[^.]+)"
    # direct_patt = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)"
    text = row["Route"]
    # text = row
    if "via" in text:
        pattern = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)\s+via\s+(?P<layover>[^.]+)"
        source = match.group("source")
        destination = match.group("destination")
        layover = match.group("layover")
        return pd.Series({"From": source,
                          "To": destination,
                          "Via": layover})
    else:
        pattern = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)"
        source = match.group("source")
        destination = match.group("destination")
        return pd.Series({"From": source,
                          "To": destination,
                          "Via": np.nan})

# df_with_route.apply(travel_route, axis=1)
df_with_route = df_with_route["Route"].apply(travel_route)
# df_with_route.head()
# travel_route("Tokyo to London Heathrow via Doha")
# travel_route("Auckland to Doha")
1 Answer
You can use .str.split, which is more concise and more efficient than apply:
import pandas as pd

# example dataframe
df = pd.DataFrame({"Route": ["Dubai to Gatwick via Doha", "Colombo to Doha"]})
# split on ' to ' or ' via '
df[["From", "To", "Via"]] = df["Route"].str.split(" to | via ", expand=True)
                       Route     From       To   Via
0  Dubai to Gatwick via Doha    Dubai  Gatwick  Doha
1            Colombo to Doha  Colombo     Doha  None
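
If you would rather keep the regex and apply approach, the function as posted in the question appears to have two problems. First, pattern is defined but never used, so match does not exist when match.group(...) is called; a call such as re.search(pattern, text) seems to be missing. Second, df_with_route["Route"].apply(travel_route) passes each cell to the function as a plain string, so row["Route"] fails; that indexing only works with df_with_route.apply(travel_route, axis=1). Below is a minimal sketch of one possible fix, assuming every route looks like "A to B" or "A to B via C". The constant names LAYOVER_PATT and DIRECT_PATT, the None guard, and the sample data are my own additions for illustration.

```python
import re

import numpy as np
import pandas as pd

# Same patterns as in the question, stored in module-level constants
# (the constant names are illustrative, not from the original post).
LAYOVER_PATT = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)\s+via\s+(?P<layover>[^.]+)"
DIRECT_PATT = r"(?P<source>[^,]+)\s+to\s+(?P<destination>[^,]+)"


def travel_route(text):
    """Split one Route string into From / To / Via.

    `text` is a single cell of the "Route" column (a string), because the
    function is applied to the column rather than to whole rows.
    """
    # Test for " via " with surrounding spaces so that a city name that
    # merely contains the letters "via" does not select the wrong pattern.
    pattern = LAYOVER_PATT if " via " in text else DIRECT_PATT
    match = re.search(pattern, text)  # the step missing from the posted code
    if match is None:
        # Assumption: routes in an unexpected format become all-NaN rows.
        return pd.Series({"From": np.nan, "To": np.nan, "Via": np.nan})
    groups = match.groupdict()
    return pd.Series({
        "From": groups["source"],
        "To": groups["destination"],
        # DIRECT_PATT has no "layover" group, so fall back to NaN.
        "Via": groups.get("layover", np.nan),
    })


# Sample data standing in for the real dataset.
df_with_route = pd.DataFrame(
    {"Route": ["Dubai to Gatwick via Doha", "Colombo to Doha"]}
)

# Applying to the column passes strings; the result is a DataFrame
# with one column per key of the returned Series.
routes = df_with_route["Route"].apply(travel_route)

# Join the new columns onto the original frame instead of overwriting it.
df_with_route = df_with_route.join(routes)
```

Note that the posted code also assigns the result of apply back to df_with_route, which would replace the original frame with only the three new columns; joining, as above, keeps Route alongside them. For data in this simple format, .str.split is still the shorter option.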