How can I create a function that takes the first row of a DataFrame and produces a list of 2D tuples?

Posted 2024-05-16 18:40:56


df100 = a[['genres', 'imdb_score']]
df100
      genres                           imdb_score
0     Action|Adventure|Fantasy|Sci-Fi  7.9
1     Action|Adventure|Fantasy         7.1
2     Action|Adventure|Thriller        6.8
3     Action|Thriller                  8.5
4     Documentary                      7.1
...   ...                              ...
5038  Comedy|Drama                     7.7
5039  Crime|Drama|Mystery|Thriller     7.5
5040  Drama|Horror|Thriller            6.3
5041  Comedy|Drama|Romance             6.3
5042  Documentary                      6.6

def tuples(p):
    t = [(p[0], p[1]) for p[0], p[1] in zip(df100.genres, df100.imdb_score) for p[0] in p[0].split('|')]
    return t

tuples(df100.loc[0, ['genres', 'imdb_score']])

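So I created the DataFrame above, whose columns are genres and imdb_score. Then I wrote a function, tuples(), that splits the genres column into individual genres and pairs each one with its imdb_score. I called it as tuples(df100.loc[0, ['genres', 'imdb_score']]), hoping to get the list of 2D tuples for the first row of the DataFrame shown below. Instead, I got one long list covering every row of the DataFrame, not just the first. Can anyone help me modify this function so that I can use it on the first row, and then apply it separately to the whole DataFrame?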

[('Action',7.9),('Adventure',7.9),('Fantasy',7.9),('Sci-Fi',7.9)]

3 answers

You want a list of tuples for each row; you can get that with zip and itertools.product. You should be able to adapt it as needed.

from itertools import product

# Split each genres string on '|', then cross join (product) the resulting
# genre list with a one-element list holding that row's score.
# The score must be wrapped in a list: product() iterates over both arguments,
# so a string score would be split into characters and a float score would
# raise TypeError.
df['pairing'] = [list(product(genre, [score]))
                 for genre, score in
                 zip([ent.split('|') for ent in df.genres], df.imdb_score)]
df.head()
   genres                           imdb_score  pairing
0  Action|Adventure|Fantasy|Sci-Fi  7.9         [(Action, 7.9), (Adventure, 7.9), (Fantasy, 7....
1  Action|Adventure|Fantasy         7.1         [(Action, 7.1), (Adventure, 7.1), (Fantasy, 7.1)]
2  Action|Adventure|Thriller        6.8         [(Action, 6.8), (Adventure, 6.8), (Thriller, 6...
3  Action|Thriller                  8.5         [(Action, 8.5), (Thriller, 8.5)]
4  Documentary                      7.1         [(Documentary, 7.1)]
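To check this answer end to end, here is a self-contained sketch. It assumes the first five rows of the question's data rebuilt as a small `df` (the real `df100` comes from the asker's `a`, which is not shown), and reads the first row's pairs back with `df.loc[0, 'pairing']`.

```python
from itertools import product

import pandas as pd

# Assumed sample data: the first five rows shown in the question.
df = pd.DataFrame({
    'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy',
               'Action|Adventure|Thriller', 'Action|Thriller', 'Documentary'],
    'imdb_score': [7.9, 7.1, 6.8, 8.5, 7.1],
})

# Cross join each row's genre list with a one-element list holding that row's score.
df['pairing'] = [list(product(genres, [score]))
                 for genres, score in zip(df['genres'].str.split('|'), df['imdb_score'])]

# The first row's list of (genre, score) tuples.
first_row = df.loc[0, 'pairing']
print(first_row)
# [('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)]
```

Because every row gets its own list, the same column answers both parts of the question: index it for one row, or read the whole column for all rows.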

IIUC (if I understand correctly), you can build the tuples from the DataFrame with explode and itertuples:

s = df['genres'].str.split('|').explode().to_frame()

s['score'] = s.index.map(df['imdb_score'])

t = list(s.itertuples(index=False,name=None))

print(t)

[('Action', 7.9),
 ('Adventure', 7.9),
 ('Fantasy', 7.9),
 ('Sci-Fi', 7.9),
 ('Action', 7.1),
 ('Adventure', 7.1),
 ('Fantasy', 7.1),
 ('Action', 6.8),
 ('Adventure', 6.8),
 ('Thriller', 6.8),
 ('Action', 8.5),
 ('Thriller', 8.5),
 ('Documentary', 7.1),
 ('Comedy', 7.7),
 ('Drama', 7.7),
 ('Crime', 7.5),
 ('Drama', 7.5),
 ('Mystery', 7.5),
 ('Thriller', 7.5),
 ('Drama', 6.3),
 ('Horror', 6.3),
 ('Thriller', 6.3),
 ('Comedy', 6.3),
 ('Drama', 6.3),
 ('Romance', 6.3)]
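A possible alternative to exploding the Series and mapping the score back by index is `DataFrame.explode` on the genres column (available since pandas 0.25), which repeats `imdb_score` on every new row automatically. This is a sketch on assumed three-row sample data, not the answer author's code; `long_df` is just an illustrative name.

```python
import pandas as pd

# Assumed sample data: three rows in the question's format.
df = pd.DataFrame({
    'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Thriller', 'Documentary'],
    'imdb_score': [7.9, 8.5, 7.1],
})

# Turn each genres string into a list, then give every list element its own row.
# imdb_score is repeated on each new row and the original index label is kept.
long_df = df.assign(genres=df['genres'].str.split('|')).explode('genres')

# All rows: one (genre, score) tuple per exploded row.
t = list(long_df.itertuples(index=False, name=None))

# First row only: .loc with a list of labels returns every exploded row labelled 0.
first = list(long_df.loc[[0]].itertuples(index=False, name=None))
print(first)
```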

If you need to target a specific row, this function uses isin to do it:

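def tuple_row(frame, row_num):
    # One row per genre; each keeps its original row label as the index
    s = frame['genres'].str.split('|').explode().to_frame()
    # Look up each genre's score through that index label
    s['score'] = s.index.map(frame['imdb_score'])
    # Keep only the exploded rows whose label is row_num
    return list(s[s.index.isin([row_num])].itertuples(index=False, name=None))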


tuple_row(df,5)
[('Comedy', 7.7), ('Drama', 7.7)]
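`tuple_row` re-explodes the whole frame on every call. If you need several rows, one possible variant is to explode once and then select by label with `.loc`. This is a sketch on assumed sample data; `tuple_row_from` and `exploded` are hypothetical names, not part of the answer above.

```python
import pandas as pd

# Assumed sample data in the question's format.
df = pd.DataFrame({
    'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Comedy|Drama', 'Documentary'],
    'imdb_score': [7.9, 7.7, 6.6],
})

# Explode once: one row per genre, each keeping its original row label as the index.
exploded = df['genres'].str.split('|').explode().to_frame()
exploded['score'] = exploded.index.map(df['imdb_score'])


def tuple_row_from(exploded, row_num):
    """Hypothetical helper: the (genre, score) tuples for one original row label."""
    # A list of labels makes .loc return a DataFrame, even for single-genre rows.
    return list(exploded.loc[[row_num]].itertuples(index=False, name=None))


print(tuple_row_from(exploded, 1))
```

Note that this looks rows up by index label, not by position; after filtering or sorting the original DataFrame the two can differ.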

If you want each row's tuples in a nested list (one inner list per row, in index order):

l = [list(i.itertuples(name=None,index=False)) for _,i in s.groupby(level=0)]

[[('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)],
 [('Action', 7.1), ('Adventure', 7.1), ('Fantasy', 7.1)],
 [('Action', 6.8), ('Adventure', 6.8), ('Thriller', 6.8)],
 [('Action', 8.5), ('Thriller', 8.5)],
 [('Documentary', 7.1)],
 [('Comedy', 7.7), ('Drama', 7.7)],
 [('Crime', 7.5), ('Drama', 7.5), ('Mystery', 7.5), ('Thriller', 7.5)],
 [('Drama', 6.3), ('Horror', 6.3), ('Thriller', 6.3)],
 [('Comedy', 6.3), ('Drama', 6.3), ('Romance', 6.3)],
 [('Documentary', 6.6)]]
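The same nested structure can also be built without explode or groupby, using a plain list comprehension over the two columns; `itertools.chain.from_iterable` turns it back into the flat list from the previous output. A sketch on assumed sample data:

```python
from itertools import chain

import pandas as pd

# Assumed sample data in the question's format.
df = pd.DataFrame({
    'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Thriller', 'Documentary'],
    'imdb_score': [7.9, 8.5, 7.1],
})

# One inner list per row: split that row's genres and pair each with its score.
nested = [[(genre, score) for genre in genres.split('|')]
          for genres, score in zip(df['genres'], df['imdb_score'])]

# Flatten into a single list of tuples if that is what is needed instead.
flat = list(chain.from_iterable(nested))
```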

import pandas as pd


def get_tuples(p):
    # p is a single row (a Series): split its genres and pair each one with the row's score
    t = [(k, p['imdb_score']) for k in p['genres'].split('|')]

    return t


df100 = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy', 'Action|Adventure|Thriller'],
                   'imdb_score': [7.9, 7.1, 6.8]})

x = get_tuples(df100.loc[0, ['genres','imdb_score']])

print(x)

Output:

[('Action', 7.9), ('Adventure', 7.9), ('Fantasy', 7.9), ('Sci-Fi', 7.9)]
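The question also asks how to apply the function to the whole DataFrame afterwards. One way, sketched here under the assumption that a per-row Python function is fast enough for your data, is `DataFrame.apply` with `axis=1`, which passes each row to `get_tuples` as a Series:

```python
import pandas as pd


def get_tuples(p):
    # p is one row (a Series) with 'genres' and 'imdb_score' entries.
    return [(k, p['imdb_score']) for k in p['genres'].split('|')]


df100 = pd.DataFrame({'genres': ['Action|Adventure|Fantasy|Sci-Fi', 'Action|Adventure|Fantasy', 'Action|Adventure|Thriller'],
                      'imdb_score': [7.9, 7.1, 6.8]})

# First row only, as in the answer above.
first = get_tuples(df100.loc[0])

# Whole DataFrame: with axis=1, apply calls get_tuples once per row.
# Because get_tuples returns a list, the result is a Series holding one list per row.
per_row = df100.apply(get_tuples, axis=1)

# Plain nested list, one inner list per row.
all_rows = per_row.tolist()
```

The per-row function stays identical in both cases, which is what the question asked for; for large frames the vectorized explode-based answers are likely to be faster.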
