Grab the files whose names contain the current year or one of the past 5 years, and concatenate them into one DataFrame

import pandas as pd
import datetime
import os
import glob

qms = os.path.join('X:', 'JY', 'Analyst', 'Data')
today = datetime.datetime.today()

#Pulling all files and concatenating, needs to pull only last 5 + current
warranty_files = glob.glob(os.path.join(qms, '*.csv'))
warranty_list = []
for file_ in warranty_files:
    df = pd.read_csv(file_, index_col=None, header=0)
    warranty_list.append(df)
warranty = pd.concat(warranty_list)

# def get_warranty(years): #want this to be the start of function

1 answer

User

#1 · posted 2024-05-16 21:31:38

If you need a more specific selection, glob lets you do that too.

I made a folder containing 3 text files named Data2010, Data2011, and Data2013, and I can select all files after 2010 like this:

import glob

files = glob.glob("/path/to/folder/" + "Data201[1-9].txt")
for file in files:
    print(file)

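In other words, you should be able to customize the file selection further with glob's wildcard patterns (these are shell-style wildcards, not full regular expressions). Once the right files are selected, you can concatenate them into a pd.DataFrame.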

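In my example above, grabbing the current and past five years (taking 2018 as the current year) looks like this: "Data201[3-8].txt". If that part of the file name is preceded by some text, add an asterisk *: "*Data201[3-8].txt". Let me know if anything is unclear.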
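One caveat with the bracket approach: a character class such as [3-8] matches a single digit only, so a range of years that crosses a decade needs more than one pattern. Here is a small sketch of that behaviour using fnmatch, the standard-library module that implements the same wildcard rules glob uses. The file names and the year ranges are made up for illustration.

```python
import fnmatch

# Hypothetical file names, one per year from 2010 through 2021.
names = ["Data%d.txt" % year for year in range(2010, 2022)]

# A single character class covers 2013 through 2018.
within_decade = [n for n in names if fnmatch.fnmatchcase(n, "Data201[3-8].txt")]

# 2016 through 2021 crosses a decade, so it takes two patterns.
patterns = ["Data201[6-9].txt", "Data202[0-1].txt"]
across_decades = [
    n for n in names
    if any(fnmatch.fnmatchcase(n, p) for p in patterns)
]

print(within_decade)
print(across_decades)
```

Hard-coded patterns like these also have to be edited every year, which is why building the list of years in code tends to be easier to maintain.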

Edit: the OP asked for their files to be selected automatically based on the current year. Here is one way to do that. Give it a try.

import datetime
import glob
import os

path = "C:\\Users\\David\\Desktop\\test\\"

def get_files(path, n=5):
    files = []  # list of matching file paths
    current_year = datetime.datetime.today().year  # current year
    last_n_years = [str(current_year - i) for i in range(0, n + 1)]  # current year plus the previous n
    for year in last_n_years:
        # grab the csv files for this year; extend() adds nothing when there is no match
        files.extend(glob.glob(os.path.join(path, "*Data%s.csv" % year)))
    return files

files = get_files(path,n=5)
print(files)
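To finish what the question asked for, the selected files still need to be read and concatenated. Below is a minimal sketch, not a definitive implementation. The name get_warranty comes from the commented-out line in the question; the folder argument and the pattern "*<year>*.csv" are assumptions, since the question does not show the real file names.

```python
import datetime
import glob
import os

import pandas as pd


def get_warranty(folder, years=5):
    """Read the CSV files for the current year and the previous `years`
    years from `folder` and return them as one DataFrame.

    Assumption: the four-digit year appears somewhere in each file name.
    """
    current_year = datetime.datetime.today().year
    paths = set()  # a set, so a file matching two years is read only once
    for year in range(current_year - years, current_year + 1):
        paths.update(glob.glob(os.path.join(folder, "*%d*.csv" % year)))

    frames = [pd.read_csv(p, index_col=None, header=0) for p in sorted(paths)]
    if not frames:
        # pd.concat raises ValueError on an empty list, so return an empty frame
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

With the folder from the question, the call would look like warranty = get_warranty(qms, years=5). Passing ignore_index=True gives the combined frame a fresh 0..n-1 index instead of repeating each file's own row numbers.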
