Converting data points in a DataFrame into columns

    name  sample
1   a     Category 1: qwe, asd (line break) Category 2: sdf, erg
2   b     Category 2: sdf, erg(line break) Category 5: zxc, eru
...
30  p     Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err

    name  qwe  asd  sdf  erg  zxc  eru  2134  EFDgh  Pdr tke  err
1   a     1    1    1    1    0    0    0     0      0        0
2   b     0    0    1    1    1    1    0     0      0        0
...
30  p     0    1    0    0    0    0    0     1      1        0

1 Answer

User

#1 · Posted on 2024-05-16 04:50:56

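IIUC, you can use str.findall with a regex pattern that finds every word of exactly 3 characters, using a negative lookbehind and a negative lookahead to require a non-word character (or the string boundary) on both sides. You can then join the resulting lists with str.join and pass the joined strings to str.get_dummies to get your dummy (0/1 indicator) columns. The extra columns can be dropped afterwards: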

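import pandas as pd
# Collect every token of exactly three word characters in each row
df['new'] = df['sample'].str.findall(r'(?<!\w)\w{3}(?!\w)')
# Join each list with '_' and expand it into one 0/1 column per token
df_dummies = df['new'].str.join('_').str.get_dummies(sep='_')
df = pd.concat([df, df_dummies], axis=1)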

In [215]: df['new']
Out[215]:
1    [qwe, asd, sdf, erg]
2    [sdf, erg, zxc, eru]
Name: new, dtype: object
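
To see what the lookbehind/lookahead pattern keeps and skips, here is a minimal sketch using the standard re module on two of the sample strings from the question. The variable names are illustrative only. It suggests that the pattern fits rows 1 and 2, but for row 30 it would skip 2134 and EFDgh and split "Pdr tke" into two tokens:

```python
import re

# Same pattern as in the answer, written as a raw string.
pattern = re.compile(r'(?<!\w)\w{3}(?!\w)')

# Sample strings copied from the question (rows 1 and 30).
row_1 = "Category 1: qwe, asd (line break) Category 2: sdf, erg"
row_30 = "Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err"

# Only runs of exactly three word characters survive; "Category",
# "line", "break", "2134" and "EFDgh" are all longer or shorter.
tokens_1 = pattern.findall(row_1)
tokens_30 = pattern.findall(row_30)

print(tokens_1)   # ['qwe', 'asd', 'sdf', 'erg']
print(tokens_30)  # ['asd', 'Pdr', 'tke', 'err']
```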

In [216]: df
Out[216]:
  name                                             sample                    new  asd  erg  eru  qwe  sdf  zxc 
1    a  Category 1: qwe, asd (line break) Category 2: ...   [qwe, asd, sdf, erg]    1    1    0    1    1    0
2    b  Category 2: sdf, erg(line break) Category 5: z...   [sdf, erg, zxc, eru]    0    1    1    0    1    1
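
One caveat with joining on '_': \w also matches the underscore, so a token such as a_b would be split again by get_dummies(sep='_'). A possible alternative, not part of the original answer, is to skip the join and build the indicators from the lists directly with Series.explode and pd.get_dummies. The sketch below assumes the 'new' column already holds the token lists shown above:

```python
import pandas as pd

# Small frame mirroring the 'new' column from the answer's output.
df = pd.DataFrame(
    {
        "name": ["a", "b"],
        "new": [["qwe", "asd", "sdf", "erg"], ["sdf", "erg", "zxc", "eru"]],
    },
    index=[1, 2],
)

# One row per token, keeping the original index label on each row.
exploded = df["new"].explode()

# One 0/1 column per distinct token, then collapse back to one row per index.
indicators = pd.get_dummies(exploded, dtype=int).groupby(level=0).max()

result = pd.concat([df[["name"]], indicators], axis=1)
print(result)
```

This avoids choosing a separator character at all, at the cost of an extra groupby step.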

After dropping the extra columns, you get the following result:

df = df.drop(['sample', 'new'], axis=1)

In [218]: df
Out[218]:
  name  asd  erg  eru  qwe  sdf  zxc
1    a    1    1    0    1    1    0
2    b    0    1    1    0    1    1
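
If the values are not all three characters long (row 30 in the question contains 2134, EFDgh and "Pdr tke"), a different extraction step may be needed. The following is only a sketch under stated assumptions: every label looks like "Category <word>:", values are separated by commas, and "(line break)" in the question stands for a real newline character. The helper extract_values and the LABEL pattern are hypothetical names introduced here, not part of the original answer:

```python
import re
import pandas as pd

# Assumed reconstruction of three rows from the question,
# with "(line break)" replaced by an actual newline.
df = pd.DataFrame(
    {
        "name": ["a", "b", "p"],
        "sample": [
            "Category 1: qwe, asd\nCategory 2: sdf, erg",
            "Category 2: sdf, erg\nCategory 5: zxc, eru",
            "Category 1: asd, Category PE: 2134, EFDgh, Pdr tke, err",
        ],
    },
    index=[1, 2, 30],
)

# Hypothetical label pattern: "Category", one word, then a colon.
LABEL = re.compile(r"Category\s+\w+\s*:")


def extract_values(text):
    """Return the comma-separated values of a cell, without the labels."""
    # Turn each label into a comma so it acts as just another separator.
    without_labels = LABEL.sub(",", text)
    parts = re.split(r"[,\n]", without_labels)
    return [part.strip() for part in parts if part.strip()]


values = df["sample"].apply(extract_values)

# '|' is the default separator of str.get_dummies; this assumes that
# no value contains a '|' character.
dummies = values.str.join("|").str.get_dummies()
result = pd.concat([df[["name"]], dummies], axis=1)
print(result)
```

Here "Pdr tke" stays a single column because the values are split on commas and newlines only, never on spaces.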
