如何在xlsx中选择一个表，其中包含多个带Pandas的表

Nomes ID Endereço Pedro 1 Rua X Joao 1 Rua Y Maria 2 Rua X Modelo ID Tipo A 20 XYZ B 30 XYS C 40 XQZ Cor Modelo Classe Local Azul A 4 Casa Vermelho A 4 Casa Roxo A 5 Casa Lilas A X Casa

2条回答

网友

1楼 · 编辑于 2024-05-23 21:48:30

这是一个非常简单的方法。由于您需要第二个表，并且每个表中都有空格，因此我们可以分步骤解决此问题：

1-读入整个数据帧

< p > 2查找空白行

3 -在空白行

之间读取

import pandas as pd

# read in dataframe
# using skip_blank_lines=False makes sure to read all the nulls
# using usecols ensures we only use the first three columns
df = pd.read_excel(r'path_to_your.xlsx'
                   , skip_blank_lines=False, usecols='A:C')
      A    B    C
0     1    4    7
1     2    5    8
2     3    6    9
3   NaN  NaN  NaN
4     E    F    G
5     1    2    3
6     4    5    6
7   NaN  NaN  NaN
8     H    I    J
9     a    d    g
10    b    e    h
11    c    f    i
# get index of first entire NULL row, add 2 to correct skip these rows
first_val = df.loc[df.isna().all(axis=1)].index[0]+2

# re-read the file. Skiping the rows we just found the nulls for
df2 = pd.read_excel(r'path_to_your.xlsx', skiprows=first_val)

    E   F   G
0   1   2   3
1   4   5   6
2   NaN NaN NaN
3   H   I   J
4   a   d   g
5   b   e   h
6   c   f   i

# now just read up to the blank line
# using iloc, we use the first row (0) and go up until the first blank line
# since the first blank line is the end of the dataframe it will return our answer
df2.iloc[0:df2.loc[df2.isna().all(axis=1)].index[0]]

    E   F   G
0   1   2   3
1   4   5   6

网友

2楼 · 编辑于 2024-05-23 21:48:30

以下脚本将在单个excel工作表中为每个表创建单独的数据框：

import pandas as pd
import numpy as np    

# read data from excel:
data = pd.read_excel('path/to/my/file.xlsx', header=None)

# split on rows with NaN (blank rows):
df_list = np.split(data, data[data.isnull().all(1)].index)

# iterate over each dataframe:
for df in df_list:

    # remove NaNs:
    df = df.dropna()

    # set first row as column headers:
    new_header = df.iloc[0] 
    df = df[1:] 
    df.columns = new_header 

    # clean up dataframe:
    df = df.reset_index()
    df = df.drop(['index'], axis=1)
    df.columns.name = ''

    # print results:
    print(df)
    print()

将输出：

   Nomes                   ID Endereço
0  Pedro  1900-01-01 00:00:00    Rua X
1   Joao  1900-01-01 00:00:00    Rua Y
2  Maria  1900-01-02 00:00:00    Rua X

  Modelo  ID Tipo
0      A  20  XYZ
1      B  30  XYS
2      C  40  XQZ

   Cor Modelo Classe Local
0      Azul A      4  Casa
1  Vermelho A      4  Casa
2      Roxo A      5  Casa
3      Lilas       X  Casa

相关问题更多 >

编程相关推荐

热门问题

热门文章

如何在xlsx中选择一个表，其中包含多个带Pandas的表

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >