Fit on all rows in a DataFrame, then transform based only on d

import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, MinMaxScaler

pda = pd.DataFrame({
    "input": pd.Series(["abc23d,efgh45,jklfj4", "dfer56,efgh45,jklh45", "abc23d,efgh66,jklfj7", "abc23d,efgh45,jklfj4"]),
    "label": pd.Series([1, 2, 3, 1]),
})
label_encoder = LabelEncoder()
pda["encoded_input"] = pda["input"].apply(lambda x: x.split(",")).apply(label_encoder.fit_transform)

                  input  label encoded_input
0  abc23d,efgh45,jklfj4      1     [0, 1, 2]
1  dfer56,efgh45,jklh45      2     [0, 1, 2]
2  abc23d,efgh66,jklfj7      3     [0, 1, 2]
3  abc23d,efgh45,jklfj4      1     [0, 1, 2]

1 Answer

User

#1 · Posted 2024-04-20 15:28:58

I would use:

pda['ecode']=pda.input.str.split(',',expand=True).T.rank().T.values.tolist()
pda
                  input  label            ecode
0  abc23d,efgh45,jklfj4      1  [1.0, 2.0, 3.0]
1  dfer56,efgh45,jklh45      2  [1.0, 2.0, 3.0]
2  abc23d,efgh66,jklfj7      3  [1.0, 2.0, 3.0]
3  abc23d,efgh45,jklfj4      1  [1.0, 2.0, 3.0]

Update

pda['ecode']=pda.input.str.split(',').explode().astype('category').cat.codes.groupby(level=0).apply(list)
pda
                  input  label      ecode
0  abc23d,efgh45,jklfj4      1  [0, 2, 4]
1  dfer56,efgh45,jklh45      2  [1, 2, 6]
2  abc23d,efgh66,jklfj7      3  [0, 3, 5]
3  abc23d,efgh45,jklfj4      1  [0, 2, 4]
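
If you would rather keep the LabelEncoder from the question, here is a minimal sketch of the same idea: fit once on the tokens from all rows, then transform each row with that shared mapping. It assumes scikit-learn is installed and that every token you transform later was seen during fit (LabelEncoder raises an error on unseen labels). The variable name tokens is only for illustration.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

pda = pd.DataFrame({
    "input": ["abc23d,efgh45,jklfj4", "dfer56,efgh45,jklh45",
              "abc23d,efgh66,jklfj7", "abc23d,efgh45,jklfj4"],
    "label": [1, 2, 3, 1],
})

# Split each row into a list of tokens (illustrative helper variable).
tokens = pda["input"].str.split(",")

# Fit once on every token from every row so the mapping is shared.
label_encoder = LabelEncoder()
label_encoder.fit(tokens.explode().to_numpy())

# Transform each row with the already-fitted encoder.
pda["encoded_input"] = tokens.apply(
    lambda row: label_encoder.transform(row).tolist()
)
print(pda)
```

With this sample data the codes should line up with the category codes above, since both sort the unique tokens before numbering them. Keeping the fitted encoder around also lets you map codes back with label_encoder.inverse_transform.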
