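I'm building a movie recommendation system. I need Python code that converts data imported from an Excel sheet into a nested dictionary in the format shown below.
Code to import the data from the Excel sheet: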
import pandas as pd

# Current pandas spells this argument sheet_name; the old sheetname spelling is no longer accepted
df = pd.read_excel('project.xlsx', sheet_name='Sheet1')
df.head(40)
The output I get:
    USER            MOVIE                     RATINGS
0   Julia Roberts   Shrek                     2.5
1   NaN             V for Vendetta            3.5
2   NaN             Pretty Woman              3.0
3   NaN             Star Wars                 3.5
4   NaN             While You Were Sleeping   2.5
5   NaN             Phone Booth               3.0
6   Drew Barrymore  Shrek                     3.0
7   NaN             V for Vendetta            3.5
8   NaN             Pretty Woman              1.5
9   NaN             Star Wars                 5.0
10  NaN             Phone Booth               3.0
11  NaN             While You Were Sleeping   3.5
12  Kate Winslet    Shrek                     2.5
13  NaN             V for Vendetta            3.0
14  NaN             Star Wars                 3.5
15  NaN             Phone Booth               4.0
16  Tom Hanks       While You Were Sleeping   2.5
17  NaN             V for Vendetta            3.5
18  NaN             Pretty Woman              3.0
19  NaN             Star Wars                 4.0
20  NaN             Phone Booth               4.5
...
From this, I need output like the following:
dataset = {
    'Julia Roberts': {
        'Shrek': 2.5,
        'I am Legend': 3.0,
        'V for Vendetta': 3.5,
        'Pretty Woman': 0,
        "My Sister's Keeper": 5.0,
        'Star Wars': 3.5,
        'Me Before You': 3.0,
        'While You Were Sleeping': 2.5,
        'Phone Booth': 3.0},
    'Drew Barrymore': {
        'Shrek': 3.0,
        'V for Vendetta': 3.5,
        'Pretty Woman': 1.5,
        "My Sister's Keeper": 4.0,
        'Star Wars': 5.0,
        'Phone Booth': 3.0,
        'While You Were Sleeping': 3.5},
    'Tom Hanks': {
        'V for Vendetta': 3.5,
        'Pretty Woman': 3.0,
        'Phone Booth': 4.5,
        'Star Wars': 4.0,
        'While You Were Sleeping': 2.5,
        'I am Legend': 3.5},
    'Sandra Bullock': {
        'Shrek': 3.0,
        'V for Vendetta': 4.0,
        'Pretty Woman': 2.0,
        'Star Wars': 3.0,
        'I am Legend': 4.5,
        "My Sister's Keeper": 3.5,
        'Phone Booth': 3.0,
        'While You Were Sleeping': 2.0}
}
The code I'm using (but it raises an error):
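max_nb_row = 0
# Fails with AttributeError: a pandas DataFrame has no sheets() method.
# sheets(), nrows and row() belong to xlrd's Book/Sheet objects, not to pandas.
for sheet in df.sheets():
    max_nb_row = max(max_nb_row, sheet.nrows)
for row in range(max_nb_row):
    for sheet in df.sheets():
        if row < sheet.nrows:
            print(sheet.row(row))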
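Since read_excel has already loaded Sheet1 into df, there is no need to loop over sheets at all; the rows can be walked directly. A minimal sketch of that idea, assuming the columns are named USER, MOVIE and RATINGS as in the printed output, and using a few hard-coded rows in place of project.xlsx. Instead of forward-filling, it simply remembers the last non-blank user name while iterating with DataFrame.itertuples:

```python
import pandas as pd

# A few rows copied from the question's output; blank USER cells come back
# from Excel as NaN, and None stands in for them here
df = pd.DataFrame({
    "USER": ["Julia Roberts", None, None, "Drew Barrymore", None],
    "MOVIE": ["Shrek", "V for Vendetta", "Pretty Woman", "Shrek", "V for Vendetta"],
    "RATINGS": [2.5, 3.5, 3.0, 3.0, 3.5],
})

dataset = {}
current_user = None
for row in df.itertuples(index=False):
    # A filled-in USER cell starts a new user; a blank one means "same user as above"
    if pd.notna(row.USER):
        current_user = row.USER
    # Create the user's inner dict the first time we see them, then record the rating
    dataset.setdefault(current_user, {})[row.MOVIE] = row.RATINGS

print(dataset)
```

To run it on the real sheet, drop the hard-coded DataFrame and use the df returned by read_excel. If the very first USER cell were blank, the ratings would be filed under the key None.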
You could use this (admittedly hard-to-read) one-liner:
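The one-liner itself did not survive on this page. Judging from the step-by-step description below, it was most likely something along these lines; treat this as a reconstruction, not the author's exact code. It assumes lowercase column names user, movie and ratings (as used in that description) and hard-codes a few sample rows:

```python
import pandas as pd

# Sample rows in the same shape as the Excel data (lowercase column names assumed)
df = pd.DataFrame({
    "user": ["Julia Roberts", None, "Drew Barrymore", None],
    "movie": ["Shrek", "V for Vendetta", "Shrek", "Pretty Woman"],
    "ratings": [2.5, 3.5, 3.0, 1.5],
})

dataset = (
    df.ffill()                                # fill blank user cells from the row above
    .groupby("user")[["movie", "ratings"]]    # select columns so apply never sees "user"
    .apply(lambda x: dict(zip(x["movie"], x["ratings"])))  # one {movie: rating} per user
    .to_dict()                                # Series of dicts -> plain nested dict
)
print(dataset)
```

Selecting the movie and ratings columns before apply avoids the deprecation warning recent pandas versions emit when apply operates on the grouping column. For the sheet in the question, substitute USER, MOVIE and RATINGS for the lowercase names.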
To visualize what is happening, we'll use the following smaller dataframe:
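The smaller dataframe itself is missing here. A hypothetical stand-in, built from the first rows of the question's data, with blank user cells as NaN and the lowercase column names used in the steps below:

```python
import numpy as np
import pandas as pd

# Hypothetical small dataframe: first rows of the question's data, blanks as NaN
df = pd.DataFrame({
    "user": ["Julia Roberts", np.nan, np.nan, "Drew Barrymore", np.nan, np.nan],
    "movie": ["Shrek", "V for Vendetta", "Pretty Woman",
              "Shrek", "V for Vendetta", "Pretty Woman"],
    "ratings": [2.5, 3.5, 3.0, 3.0, 3.5, 1.5],
})
print(df)
```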
Step by step, this is what happens:
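1. Use ffill() to replace the NaN values in the user column with the name from the row above.
2. Use groupby('user') to group the data by user.
3. Use apply(lambda x: dict(zip(x['movie'], x['ratings']))) to build a dict of {movie: rating} pairs for each user.
4. Call to_dict() on the resulting Series to get the desired result.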
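The steps above can be run one at a time to inspect each intermediate result. A sketch using a few hypothetical rows in the same shape, with the lowercase column names from the steps:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user": ["Julia Roberts", np.nan, "Drew Barrymore", np.nan],
    "movie": ["Shrek", "V for Vendetta", "Shrek", "Pretty Woman"],
    "ratings": [2.5, 3.5, 3.0, 1.5],
})

# Step 1: forward-fill so every row carries its user's name
filled = df.ffill()
print(filled)

# Step 2: group rows by user (a lazy GroupBy object; nothing is computed yet)
grouped = filled.groupby("user")

# Step 3: build one {movie: rating} dict per user, giving a Series of dicts
per_user = grouped[["movie", "ratings"]].apply(
    lambda x: dict(zip(x["movie"], x["ratings"]))
)
print(per_user)

# Step 4: turn the Series into a plain nested dict keyed by user
dataset = per_user.to_dict()
print(dataset)
```

Because groupby sorts its keys by default, the users come out in alphabetical order; plain dict equality does not depend on that order.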