在Python/dataframe中创建列,显示Excel源中相应列中的数据示例

2024-05-16 05:40:43 发布

您现在位置:Python中文网/ 问答频道 /正文

目标:创建一列:1。显示三个最常见的条目,或2:在源excel文件中显示列的三个随机离散条目。我知道那是一大堆话,让我来示范一下。在下面的示例中,我很难编写代码来生成下面第二个表(示例值列)最右边的列中的内容

给定此源excel文件(NBA.xlsx):

^{tb1}$

我想:

^{tb2}$

我将在下面提供代码。很长,但我想让你知道我在做什么。我想我只需要在最后两部分中添加一些代码片段,用#符号表示:

### Setting up
import pandas as pd
import os
import glob

###Setting working directory
path = os.getcws()
files = os.listdir(path)

### Prep to get all files
from os import listdir
from os.path import isfile, join

### Reading only excel files in folder
FileList_xlsx = [f for f in files if f[-4:] == "xlsx"]

# Initializing empty data frame
df = pd.DataFrame()

# Initializing Aggregate List for Column Names
Agg_ColumnNames = []

# Initializing Aggregate List for File Names associated with Column Names
FileNames = []

# Initializing empty dataframe for Data Type
datatype_df = pd.DataFrame()

# Read excel into Python and prepping the dataframe for DATA TYPE ONLY
n = 1
for f in FileList_xlsx:
    test_df = pd.read_excel(f)
    test_df = test_df.add_prefix(f'File{n}.')
    datatype_df = datatype_df.append(test_df, ignore_index=True, sort = False)
    n+=1

# Getting the data type of the column

dataTypeSeries = datatype_df.dtypes

# Loop over list of Excel files

for f in FileList_xlsx:
    ReadXlsx = pd.read_excel(f)
    ColumnNames = list(ReadXlsx.columns.values)

    for a in ColumnNames:
        Agg_ColumnNames.append(a)
    
    for a in ColumnNames:
        FileNames.append(f)

# Create final dataframe
final = {'Column_Name': Agg_ColumnNames, 'File_Name': FileNames, 'Data_Type': dataTypeSeries}

Tags: path代码intestimportdfforos