How do I get data from an Excel sheet and output it in a set format?

Posted 2024-04-26 18:34:23


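I am building a movie recommendation system. I need Python code that converts the data imported from an Excel sheet into the set format shown below (a nested dictionary).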


Code that imports the data from the Excel sheet:

import pandas as pd

# The keyword is `sheet_name`; the older spelling `sheetname` no longer
# works in later pandas releases.
df = pd.read_excel('project.xlsx', sheet_name='Sheet1')
df.head(40)

The output I get:

              USER                    MOVIE  RATINGS
0    Julia Roberts                    Shrek      2.5
1              NaN           V for Vendetta      3.5
2              NaN             Pretty Woman      3.0
3              NaN                Star Wars      3.5
4              NaN  While You Were Sleeping      2.5
5              NaN              Phone Booth      3.0
6   Drew Barrymore                    Shrek      3.0
7              NaN           V for Vendetta      3.5
8              NaN             Pretty Woman      1.5
9              NaN                Star Wars      5.0
10             NaN              Phone Booth      3.0
11             NaN  While You Were Sleeping      3.5
12    Kate Winslet                    Shrek      2.5
13             NaN           V for Vendetta      3.0
14             NaN                Star Wars      3.5
15             NaN              Phone Booth      4.0
16       Tom Hanks  While You Were Sleeping      2.5
17             NaN           V for Vendetta      3.5
18             NaN             Pretty Woman      3.0
19             NaN                Star Wars      4.0
20             NaN              Phone Booth      4.5
...


From this, I need output like the following:

dataset={
 'Julia Roberts': {
 'Shrek': 2.5,
 'I am Legend':3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 0,
 "My Sister's Keeper":5.0,
 'Star Wars': 3.5,
 'Me Before You': 3.0,
 'While You Were Sleeping': 2.5,
 'Phone Booth': 3.0},

 'Drew Barrymore': {'Shrek': 3.0,
 'V for Vendetta': 3.5,
 'Pretty Woman': 1.5,
 "My Sister's Keeper":4.0,
 'Star Wars': 5.0,
 'Phone Booth': 3.0,
 'While You Were Sleeping': 3.5},


 'Tom Hanks': {'V for Vendetta': 3.5,
 'Pretty Woman': 3.0,
 'Phone Booth': 4.5,
 'Star Wars': 4.0,
 'While You Were Sleeping': 2.5,
 'I am Legend':3.5},

 'Sandra Bullock': {'Shrek': 3.0,
 'V for Vendetta': 4.0,
 'Pretty Woman': 2.0,
 'Star Wars': 3.0,
 'I am Legend':4.5,
 "My Sister's Keeper":3.5, 
 'Phone Booth': 3.0,
 'While You Were Sleeping': 2.0}
}

The code I am using (it raises an error):

max_nb_row = 0
for sheet in df.sheets():
  max_nb_row = max(max_nb_row, sheet.nrows)

for row in range(max_nb_row) :
  for sheet in df.sheets() :
    if row < sheet.nrows :
      print (sheet.row(row))

1 answer
Anonymous user
#1 · Posted 2024-04-26 18:34:23

You can use this hard-to-read one-liner:

df.ffill().groupby('user').apply(lambda x: dict(zip(x['movie'], x['ratings']))).to_dict()

To visualize what is happening, we will use the following smaller DataFrame:

>>> df
             user           movie  ratings
0   Julia Roberts           Shrek      2.5
1             NaN  V for Vendetta      3.5
2             NaN    Pretty Woman      3.0
3  Drew Barrymore           Shrek      3.0
4             NaN  V for Vendetta      3.5
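
To follow along without the Excel file, the small frame above can be rebuilt by hand. This is only a minimal sketch: the names `small` and `result` are mine, and selecting `[['movie', 'ratings']]` before `apply` is an addition that is not in the one-liner above. As far as I know, it keeps recent pandas versions from warning about applying over the grouping column.

```python
import numpy as np
import pandas as pd

# Hand-built copy of the small frame shown above (no Excel file needed).
small = pd.DataFrame({
    'user': ['Julia Roberts', np.nan, np.nan, 'Drew Barrymore', np.nan],
    'movie': ['Shrek', 'V for Vendetta', 'Pretty Woman', 'Shrek', 'V for Vendetta'],
    'ratings': [2.5, 3.5, 3.0, 3.0, 3.5],
})

# Same chain as the one-liner, split across lines for readability.
result = (
    small.ffill()                          # fill each NaN user with the name above it
    .groupby('user')[['movie', 'ratings']]  # explicit column selection (my addition)
    .apply(lambda x: dict(zip(x['movie'], x['ratings'])))
    .to_dict()
)
print(result)
```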

Step by step, this is what happens:

  1. ffill() replaces the NaN values in the user column with the name above them.

                 user           movie  ratings
    0   Julia Roberts           Shrek      2.5
    1   Julia Roberts  V for Vendetta      3.5
    2   Julia Roberts    Pretty Woman      3.0
    3  Drew Barrymore           Shrek      3.0
    4  Drew Barrymore  V for Vendetta      3.5
    
  2. groupby('user') groups the data by user.

  3. apply(lambda x: dict(zip(x['movie'], x['ratings']))) builds a dict of {movie: rating} pairs for each user.

    user
    Drew Barrymore    {'Shrek': 3.0, 'V for Vendetta': 3.5}
    Julia Roberts     {'Shrek': 2.5, 'V for Vendetta': 3.5, 'Pretty ...
    dtype: object
    
  4. Calling to_dict() on the resulting Series gives the desired result.

    {'Drew Barrymore': {'Shrek': 3.0, 'V for Vendetta': 3.5},
     'Julia Roberts': {'Pretty Woman': 3.0, 'Shrek': 2.5, 'V for Vendetta': 3.5}}
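
The four steps can also be written without `apply`, which some readers may find easier to follow. The sketch below is not the answer's code. `ratings_to_dataset` is a hypothetical helper, and its column-name parameters default to the uppercase headers from the question's output (`USER`, `MOVIE`, `RATINGS`) rather than the lowercase names used in this answer. It also forward-fills only the user column, on the assumption that a genuinely missing movie or rating should not be copied from the row above.

```python
import numpy as np
import pandas as pd


def ratings_to_dataset(df, user_col='USER', movie_col='MOVIE', rating_col='RATINGS'):
    """Return {user: {movie: rating}} for a frame whose user column is only
    filled on the first row of each user's block (hypothetical helper)."""
    filled = df.copy()
    # Step 1: carry each user name down over the NaN rows beneath it.
    filled[user_col] = filled[user_col].ffill()
    # Steps 2-4: group by user and build one {movie: rating} dict per group.
    return {
        user: dict(zip(group[movie_col], group[rating_col]))
        for user, group in filled.groupby(user_col)
    }


# A few rows picked from the question's output, typed in by hand for illustration.
sample = pd.DataFrame({
    'USER': ['Julia Roberts', np.nan, np.nan, 'Drew Barrymore', np.nan],
    'MOVIE': ['Shrek', 'V for Vendetta', 'Pretty Woman', 'Shrek', 'V for Vendetta'],
    'RATINGS': [2.5, 3.5, 3.0, 3.0, 3.5],
})

dataset = ratings_to_dataset(sample)
print(dataset)
```

With the real file, the same call would be made on the frame returned by `pd.read_excel`, passing different column names if the sheet's headers differ.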
    
