Split the list of strings in a DataFrame column into new columns based on the first word of each sentence

import pandas as pd

df = pd.DataFrame({
    "person": [1, 2, 3],
    "problems": [
        "body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE); mind: stressed, tired",
        "soul: missing; mind: can't think; body: feels great(lifts weights), overweight(always bulking), missing a finger",
        "none",
    ],
})
df

╔═══╦════════╦══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║   ║ person ║ problems                                                                                                         ║
╠═══╬════════╬══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ 0 ║ 1      ║ body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE); mind: stressed, tired                                         ║
║ 1 ║ 2      ║ soul: missing; mind: can't think; body: feels great(lifts weights), overweight(always bulking), missing a finger ║
║ 2 ║ 3      ║ none                                                                                                             ║
╚═══╩════════╩══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

Desired output:

╔═══╦════════╦══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╦════════════════════════════════════════════════════════════════════════════════╦═══════════════════════╦═══════════════╗
║   ║ person ║ problems                                                                                                         ║ body                                                                           ║ mind                  ║ soul          ║
╠═══╬════════╬══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╬════════════════════════════════════════════════════════════════════════════════╬═══════════════════════╬═══════════════╣
║ 0 ║ 1      ║ body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE); mind: stressed, tired                                         ║ body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE)                              ║ mind: stressed, tired ║ NaN           ║
║ 1 ║ 2      ║ soul: missing; mind: can't think; body: feels great(lifts weights), overweight(always bulking), missing a finger ║ body: feels great(lifts weights), overweight(always bulking), missing a finger ║ mind: can't think     ║ soul: missing ║
║ 2 ║ 3      ║ none                                                                                                             ║ NaN                                                                            ║ NaN                   ║ NaN           ║
╚═══╩════════╩══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╩════════════════════════════════════════════════════════════════════════════════╩═══════════════════════╩═══════════════╝

An attempt with str.extractall, which puts the pieces in rows rather than columns:

df.problems.str.extractall(r"(\b(?!(?: \b))[\w\s.()',:/-]+)")

+---+-------+--------------------------------------------------------------------------------+
|   |       | 0                                                                              |
+---+-------+--------------------------------------------------------------------------------+
|   | match |                                                                                |
| 0 | 0     | body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE)                              |
|   | 1     | mind: stressed, tired                                                          |
| 1 | 0     | soul: missing                                                                  |
|   | 1     | mind: can't think                                                              |
|   | 2     | body: feels great(lifts weights), overweight(always bulking), missing a finger |
| 2 | 0     | none                                                                           |
+---+-------+--------------------------------------------------------------------------------+

The same data, but with `energy: tired` following a comma in the first row:

df = pd.DataFrame({
    "person": [1, 2, 3],
    "problems": [
        "body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE); mind: stressed, energy: tired",
        "soul: missing; mind: can't think; body: feels great(lifts weights), overweight(always bulking), missing a finger",
        "none",
    ],
})

╔═══╦════════╦══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╗
║   ║ person ║ problems                                                                                                         ║
╠═══╬════════╬══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╣
║ 0 ║ 1      ║ body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE); mind: stressed, energy: tired                                 ║
║ 1 ║ 2      ║ soul: missing; mind: can't think; body: feels great(lifts weights), overweight(always bulking), missing a finger ║
║ 2 ║ 3      ║ none                                                                                                             ║
╚═══╩════════╩══════════════════════════════════════════════════════════════════════════════════════════════════════════════════╝

1 Answer

User

#1 · Posted 2024-05-23 17:50:50

It isn't elegant, but it gets the job done:

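# Split each row into its "category: ..." pieces. The separator includes the
# space, so the pieces carry no leading blank.
df['split'] = df.problems.str.split('; ')
# For each category, keep the pieces that mention it ('' if there are none).
df['mind'] = df['split'].apply(
    lambda x: ''.join([category for category in x if 'mind' in category]))
df['body'] = df['split'].apply(
    lambda x: ''.join([category for category in x if 'body' in category]))
df['soul'] = df['split'].apply(
    lambda x: ''.join([category for category in x if 'soul' in category]))
# Drop the helper column. columns= is needed: by default drop() looks for a row label.
df.drop(columns='split', inplace=True)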

You can generalize it to

df[cat] = df['split'].apply(lambda x: ''.join([category for category in x if cat in category]))

and run it on the DataFrame for each cat (e.g. cats = ['mind', 'body', 'soul', 'whathaveyou', 'etc.']), before the helper split column is dropped.
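Put together, the loop might look like the sketch below. The sample frame is copied from the question and the list name `cats` follows the answer. Splitting on `'; '` (with the space) is an assumption made here so that the pieces carry no leading blank.

```python
import pandas as pd

df = pd.DataFrame({
    "person": [1, 2, 3],
    "problems": [
        "body: knee hurts(bad-pain), toes hurt(BIG/MIDDLE); mind: stressed, tired",
        "soul: missing; mind: can't think; body: feels great(lifts weights), "
        "overweight(always bulking), missing a finger",
        "none",
    ],
})

cats = ["mind", "body", "soul"]

# Helper column: one list of "category: ..." pieces per row.
df["split"] = df["problems"].str.split("; ")

for cat in cats:
    # Keep the pieces that mention this category; rows without one get ''.
    # cat=cat binds the current value to the lambda.
    df[cat] = df["split"].apply(
        lambda pieces, cat=cat: "".join(p for p in pieces if cat in p)
    )

# Drop the helper column once every category has been extracted.
df = df.drop(columns="split")
```

Rows without a given category end up with an empty string here, not the NaN shown in the desired output; replacing `''` with NaN afterwards would be one way to close that gap.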

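Edit:

As @ifly6 pointed out, a keyword may also occur inside the text of another category in the user-entered strings. To be safe, change the function to match only at the start of each piece, stripping any whitespace left over from the split first:

df[cat] = df['split'].apply(lambda x: ''.join([category for category in x if category.strip().startswith(cat)]))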
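A small sketch of why the prefix check matters. The sample row and the helper name `pick` are hypothetical, made up for illustration. Returning NaN for rows with no match is also an addition here, chosen to mirror the desired output in the question.

```python
import numpy as np
import pandas as pd


def pick(pieces, cat):
    """Join the pieces that start with `cat`; return NaN if there are none."""
    hits = [p.strip() for p in pieces if p.strip().startswith(cat)]
    return "; ".join(hits) if hits else np.nan


# Hypothetical row in which the body text happens to contain "mind".
df = pd.DataFrame({"problems": ["body: absent-minded; mind: tired", "none"]})
pieces = df["problems"].str.split(";")

# Substring check: the body piece also contains "mind", so both pieces are joined.
loose = pieces.apply(lambda x: "".join(c for c in x if "mind" in c))

# Prefix check: only the piece that begins with "mind" is kept.
df["mind"] = pieces.apply(lambda x: pick(x, "mind"))
```

With the substring check the first row picks up the body piece as well; with the prefix check it yields only `mind: tired`, and the `none` row becomes NaN.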
