How can we loop through the text files in a folder, copy the first two lines of each file, and transpose the result?

Posted 2024-03-29 13:13:20


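I have about 3,000 text files in a folder. I want to loop through each file, get its file name, copy the first two lines, transpose them, and then append each result below the previous one.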

The fields of one file look like this:

IDRSSD   RIAD0497                           RIAD4042                                RIAD4136            RIAD4141                    RIAD4146                               RIAD4461
         ADVERTISING & MARKETING EXPENSES   RENT & OTHER INCOME FR OTHR REAL EST    DIRECTORS FEES      LEGAL FEES & EXPENSES       FDIC DEPOSIT INSURANCE ASSESSMENTS     1ST ITEMIZED AMT OV25% OF ITEM 4078

I want to turn it into this:

file                                                                code        field
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   IDRSSD  
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   RIAD0497    ADVERTISING & MARKETING EXPENSES
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   RIAD4042    RENT & OTHER INCOME FR OTHR REAL EST
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   RIAD4136    DIRECTORS FEES
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   RIAD4141    LEGAL FEES & EXPENSES
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   RIAD4146    FDIC DEPOSIT INSURANCE ASSESSMENTS
C:\Users\ryans\Downloads\FFIEC CDR Call Schedule RIE 03312001.txt   RIAD4461    1ST ITEMIZED AMT OV25% OF ITEM 4078

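I have a code sample that copies the first two lines from each file, but it does not transpose them. I think the final version of the code should look something like this: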

### mapping table for regulatory line items
import pandas as pd
import csv
import glob
import os

# Use a list here rather than a dataframe
results=[]
filelist = glob.glob("C:\\Users\\ryans\\Downloads\\*.txt")
number_of_lines = 2

for filename in filelist:
    with open(filename) as myfile:
        lines = myfile.readlines() # you can add strip() or other methods here
        file_lines = []
        print(file_lines)
        for line in lines[:2]:
            df = pd.DataFrame(lines[:2])
            transposed = df.T
            file_lines.append(transposed)
        results.append([filename, *file_lines])
        
# You can build a dataframe from that list at the end if you desire
results_df = pd.DataFrame.from_records(results, columns=['filename', 'file_lines_1', 'file_lines_2'])

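But something is wrong here. It seems to produce a bunch of empty lists, and I am not sure what is happening. Any thoughts on how to get the result I want? Thanks.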


1 Answer
Anonymous user
#1 · Posted 2024-03-29 13:13:20
for filename in filelist:
    with open(filename) as myfile:
        lines = myfile.readlines() # you can add strip() or other methods here
        file_lines = []
        for line in lines[:2]:
            file_lines.append(line)
        results.append([filename, *file_lines])

number_of_lines = 2  <- this variable is not needed
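Two details in the question's code may explain the symptoms. `print(file_lines)` runs immediately after `file_lines = []`, so it prints an empty list for every file. And `pd.DataFrame(lines[:2])` builds a frame with two rows and one column of unsplit text, so its transpose is a single row rather than one row per field. The loop above fixes the first point but still does not transpose. The following is only a sketch of one way to get the file/code/field layout, assuming the files are tab-delimited (the sample in the question does not show the delimiter, so adjust `delimiter` if needed). The function names `first_two_lines_long` and `collect_rows` are hypothetical helpers, not part of any library.

```python
import glob
import os


def first_two_lines_long(path, delimiter="\t"):
    """Return one {file, code, field} dict per column of the first two lines.

    Assumption: the file is delimited by `delimiter` (tab by default).
    """
    with open(path) as handle:
        # Only the first two lines are read, so large files stay cheap.
        codes = handle.readline().rstrip("\r\n").split(delimiter)
        fields = handle.readline().rstrip("\r\n").split(delimiter)
    # Pad the shorter line so every code keeps a (possibly empty) field.
    width = max(len(codes), len(fields))
    codes += [""] * (width - len(codes))
    fields += [""] * (width - len(fields))
    # Pairing code i with field i is the "transpose" step; rows where both
    # values are blank (for example from an empty file) are skipped.
    return [
        {"file": path, "code": code.strip(), "field": field.strip()}
        for code, field in zip(codes, fields)
        if code.strip() or field.strip()
    ]


def collect_rows(folder, pattern="*.txt"):
    """Stack the per-file rows for every matching file in `folder`."""
    rows = []
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        rows.extend(first_two_lines_long(path))
    return rows
```

Under those assumptions, `pd.DataFrame(collect_rows(r"C:\Users\ryans\Downloads"))` should give a dataframe with `file`, `code`, and `field` columns, with each file's rows placed below the previous file's rows.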
