Create dummy columns from a column of lists

import pandas as pd

df['genres'] = df['genres'].str.split(pat='|')
df3 = pd.melt(df, id_vars=['id'], value_vars=['genres'], var_name='col_one', value_name='col_two')
df3.head()

       id  col_one                                         col_two
0  135397   genres  [Action, Adventure, Science Fiction, Thriller]
1   76341   genres  [Action, Adventure, Science Fiction, Thriller]
2  262500   genres          [Adventure, Science Fiction, Thriller]
3  140607   genres   [Action, Adventure, Science Fiction, Fantasy]
4  168259   genres                       [Action, Crime, Thriller]

Calling get_dummies on col_two gives column names that still contain the list brackets and quotes:

df4 = df3["col_two"].str.get_dummies(",")
df4.head()

   'Action'  'Action']  'Adventure'  'Adventure']  'Animation'  'Animation']  'Comedy'  'Comedy']  'Crime'  'Crime']  ...  ['Romance']  ['Science Fiction'  ['Science Fiction']  ['TV Movie'  ['Thriller'  ['Thriller']  ['War'  ['War']  ['Western'  ['Western']
0         0          0            1             0            0             0         0          0        0         0  ...            0                   0                    0            0            0             0       0        0           0            0
1         0          0            1             0            0             0         0          0        0         0  ...            0                   0                    0            0            0             0       0        0           0            0
2         0          0            0             0            0             0         0          0        0         0  ...            0                   0                    0            0            0             0       0        0           0            0
3         0          0            1             0            0             0         0          0        0         0  ...            0                   0                    0            0            0             0       0        0           0            0
4         0          0            0             0            0             0         0          0        1         0  ...            0                   0                    0            0            0             0       0        0           0            0

How can I get clean dummy columns (Action, Adventure, ...) instead?
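Where do the brackets and quotes come from? A minimal sketch, assuming pandas falls back to each list's string representation when `str.get_dummies` receives Python lists (the output above is consistent with that): splitting that text on `,` leaves brackets, quotes and leading spaces in every token. The genre lists below are made-up samples.

```python
import pandas as pd

genres = ['Action', 'Crime', 'Thriller']

# The string form of a Python list keeps its brackets, quotes and spaces
as_text = str(genres)
print(as_text)  # ['Action', 'Crime', 'Thriller']

# Splitting that text on "," yields malformed tokens like the df4 column names
tokens = as_text.split(',')
print(tokens)

# The same effect appears when str.get_dummies is applied to a column of lists
s = pd.Series([genres, ['Adventure', 'Thriller']])
print(list(s.str.get_dummies(',').columns))
```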

2 answers

User

Answer 1 · edited 2024-05-15 23:54:59

You can use str.translate together with str.maketrans to delete the bracket characters, then call get_dummies:

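# Assumes col_two holds strings such as "[Action, Crime]", not Python lists
# Delete "[" and "]", then split on "," to build one dummy column per genre
no_bracket = df3['col_two'].str.translate(str.maketrans('', '', '[]'))
no_bracket.str.get_dummies(',')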

This post and the str.translate documentation should give more detail on the arguments.
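The translate approach can be sketched end to end as follows, assuming `col_two` holds bracketed strings rather than lists (for example after the frame has been saved to and reloaded from a CSV file); the sample values are hypothetical. One step is an addition to the answer: `', '` is collapsed to `','` so that a token such as `' Crime'` does not become a column separate from `'Crime'`.

```python
import pandas as pd

# Hypothetical sample: genres stored as bracketed strings, not Python lists
col_two = pd.Series(['[Action, Adventure]', '[Action, Crime]'])

# str.maketrans('', '', '[]') builds a table that deletes '[' and ']'
table = str.maketrans('', '', '[]')
no_bracket = col_two.str.translate(table)

# Collapse ", " to "," so no token keeps a leading space (not in the answer)
no_bracket = no_bracket.str.replace(', ', ',', regex=False)

# Split on "," and one-hot encode each genre
dummies = no_bracket.str.get_dummies(',')
print(dummies)
```

If the column holds real lists, stripping brackets is unnecessary: joining each list into a string first, as in the other answer, yields clean tokens directly.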

User

Answer 2 · edited 2024-05-15 23:54:59

A simple .join of the dummified column should work fine. Try this:

# Join each list with "|", one-hot encode it, prefix the columns, and join back onto id/col_one
df3 = df3[['id', 'col_one']].join(df3['col_two'].str.join('|').str.get_dummies().add_prefix('GENRE_'))
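A self-contained run of this approach on a small frame shaped like the question's `df3` (three rows copied from its `head()` output; the `GENRE_` prefix comes from the answer). Because `str.get_dummies` splits on `'|'` by default, joining each list with `'|'` produces clean tokens, including multi-word genres such as `Science Fiction`.

```python
import pandas as pd

# Small frame shaped like df3 in the question (rows taken from its head())
df3 = pd.DataFrame({
    'id': [135397, 262500, 168259],
    'col_one': ['genres'] * 3,
    'col_two': [
        ['Action', 'Adventure', 'Science Fiction', 'Thriller'],
        ['Adventure', 'Science Fiction', 'Thriller'],
        ['Action', 'Crime', 'Thriller'],
    ],
})

# Join each list into one '|'-separated string, then one-hot encode it;
# str.get_dummies splits on '|' by default
genre_dummies = df3['col_two'].str.join('|').str.get_dummies().add_prefix('GENRE_')

# Attach the dummy columns to id/col_one, aligning on the row index
result = df3[['id', 'col_one']].join(genre_dummies)
print(result)
```

The join aligns on the index, so this relies on `genre_dummies` keeping the same index as `df3`, which it does because it is derived row by row from `df3['col_two']`.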

Let me know whether this works for you.
