How to create multiple DataFrames from an Excel data sheet

Published 2024-04-29 03:18:11


I extracted this DataFrame from an Excel spreadsheet using the pandas library. After selecting the columns I need, the table looks like this:

    REF PLAYERS
0   103368  Andrés Posada Sanmiguel
1   300552  Diego Posada Sanmiguel
2   103304  Roberto Motta Stanziola
3   NaN NaN
4   REF PLAYERS
5   1047012 ANABELLA EISMANN DE AMAYA
6   104701  FERNANDO ENRIQUE AMAYA CASTRO
7   103451  AUGUSTO ANTONIO ALVARADO AZCARRAGA
8   103484  Kevin Adrian Villarreal Kam
9   REF PLAYERS
10  NaN NaN
11  NaN NaN
12  NaN NaN
13  NaN NaN
14  REF PLAYERS
15  NaN NaN
16  NaN NaN
17  NaN NaN
18  NaN NaN
19  REF PLAYERS

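I want to create multiple DataFrames, with each [['REF', 'PLAYERS']] row becoming the column header of a new DataFrame. Suggestions are welcome. I also need to keep the blank rows. I'm a beginner.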


Tags: data, ref, pandas, format, nan, excel, table, spreadsheet
2 Answers

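For this to work, you first have to read the DataFrame from the file differently: set the parameter header=None in pd.read_excel(). Right now your columns are named "REF" and "PLAYERS", but we want to group by those values, so they have to appear as ordinary rows in the data.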

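The first column will then most likely be labelled 0 (an integer), and the first step looks like this, where df is the name of the DataFrame: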

# Set unique index for each group
df["group_id"] = (df[0] == "REF").cumsum()
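To see why the cumulative sum works as a group label, here is a minimal sketch. The small frame below is a made-up stand-in for what pd.read_excel(..., header=None) would return (integer column labels 0 and 1); the rows are abbreviated from the question and are only illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None):
# the "REF"/"PLAYERS" header rows are ordinary data rows here.
df = pd.DataFrame(
    [
        ["REF", "PLAYERS"],
        [103368, "Andrés Posada Sanmiguel"],
        [np.nan, np.nan],
        ["REF", "PLAYERS"],
        [104701, "FERNANDO ENRIQUE AMAYA CASTRO"],
        ["REF", "PLAYERS"],
        [np.nan, np.nan],
    ]
)

# True on every repeated header row, False elsewhere (NaN compares as False)
is_header = df[0] == "REF"

# The running count of header rows increases by 1 at each header,
# so every row below a header shares that header's number.
group_id = is_header.cumsum()
print(group_id.tolist())  # [1, 1, 1, 2, 2, 3, 3]
```

Note that the blank (NaN) rows get the id of the header above them, so they stay with their block.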

Solution:

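# Set unique index for each group
# (replace "name_of_first_column" with your first column's label; it is 0 when header=None)
df["group_id"] = (df["name_of_first_column"] == "REF").cumsum()

# Iterate over groups
dataframes = []
for name, group in df.groupby("group_id"):
    # work on a copy so the original df is left untouched
    df_ = group.copy()
    # promote 1st row to column names
    df_.columns = df_.iloc[0]
    # and drop it
    df_ = df_.iloc[1:]
    # keep only the data columns (this drops the group_id helper column)
    df_ = df_[["REF", "PLAYERS"]]
    # append to the list of dataframes
    dataframes.append(df_)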

All the DataFrames are now stored in the list dataframes.
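As a self-contained check of this approach, the sketch below runs the same idea end to end on a small made-up frame. It is a variant, not the exact code above: it groups by the cumulative-sum Series directly instead of adding a helper column, and it stores the results in a dict named tables (a hypothetical name) keyed by group number. The sample rows are abbreviated from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for pd.read_excel(..., header=None)
raw = pd.DataFrame(
    [
        ["REF", "PLAYERS"],
        [103368, "Andrés Posada Sanmiguel"],
        [np.nan, np.nan],
        ["REF", "PLAYERS"],
        [104701, "FERNANDO ENRIQUE AMAYA CASTRO"],
        ["REF", "PLAYERS"],
        [np.nan, np.nan],
    ]
)

# One group number per block, increasing at every "REF" header row
group_id = (raw[0] == "REF").cumsum()

tables = {}
for gid, block in raw.groupby(group_id):
    # everything below the header row is data; copy to avoid touching raw
    body = block.iloc[1:].copy()
    # the first row of the block supplies the column names
    body.columns = block.iloc[0].tolist()
    tables[gid] = body.reset_index(drop=True)

for gid, table in tables.items():
    print(f"--- table {gid} ---")
    print(table)
```

The blank rows are kept, because the grouping key is the running header count rather than the (possibly NaN) cell values.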

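You can use np.split to split the DataFrame into pieces of equal length (in this case, 4 rows per df).

Since each DataFrame needs 4 rows, you can split the 20 rows into 5 different dfs: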

import numpy as np
dfs = [df.loc[idx] for idx in np.split(df.index,5)]

Then create the individual DataFrames:

df1 = dfs[1]
df1

                                  REF PLAYERS
4                                 REF PLAYERS
5           1047012 ANABELLA EISMANN DE AMAYA
6       104701  FERNANDO ENRIQUE AMAYA CASTRO
7  103451  AUGUSTO ANTONIO ALVARADO AZCARRAGA


df2 = dfs[2]
df2
                            REF PLAYERS
8   103484  Kevin Adrian Villarreal Kam
9                           REF PLAYERS
10                              NaN NaN
11                              NaN NaN
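One caveat with this approach: np.split only works when the number of rows divides evenly by the number of pieces, and it cuts at fixed positions rather than at the REF/PLAYERS header rows (which is why df2 above mixes a data row with a header row). A minimal sketch of the first point, using a made-up 10-row frame and np.array_split as one possible alternative that tolerates uneven sizes:

```python
import numpy as np
import pandas as pd

# Made-up frame with 10 rows, which does not divide evenly into 4 pieces
df = pd.DataFrame({"REF": range(10), "PLAYERS": list("abcdefghij")})
positions = np.arange(len(df))

# np.split insists on equal-sized pieces and raises otherwise
try:
    np.split(positions, 4)
except ValueError as err:
    print("np.split failed:", err)

# np.array_split allows the pieces to differ in length by at most one row
chunks = [df.iloc[idx] for idx in np.array_split(positions, 4)]
print([len(chunk) for chunk in chunks])  # [3, 3, 2, 2]
```

If the blocks in the sheet are not all the same length, grouping on the header rows is likely the more robust choice.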
