Sorting the lines of a text file numerically when the file has an odd format

3 Answers

User

Answer 1 · edited 2024-06-07 09:37:22

You can do it like this:

Code

import re

def order_month(month_of_entries):
    '''
        Order lines for a Month of entries
    '''
    # Sort key based upon number in line
    # First line in Month does not have a number, 
    # so key function returns 0 for it so it stays first
    month_of_entries.sort(key=lambda x: int(p.group(0)) if (p:=re.search(r'\d+', x)) else 0)
            
# Process input file
with open('input.txt', 'r') as file:
    results = []
    months_data = []
    for line in file:
        line = line.rstrip()
        if line:
            months_data.append(line)
        else:
            # blank line
            # Order lines for this month
            order_month(months_data)
            results.append(months_data)
            
            # Setup for next month
            months_data = []
    else:
        # Reached end of file
        # Order lines for last month
        if months_data:
            order_month(months_data)
            results.append(months_data)
               
# Write to output file
with open('output.txt', 'w') as file:
    for i, months_data in enumerate(results):
        # Looping over each month
        for line in months_data:
            file.write(line + '\n')
        # Add blank line if not last month
        if i < len(results) - 1:
            file.write('\n')

Output

**January birthdays:**
**4** - !@Jan
**15** - !@Ralph
**17** - !@Mark

**February birthdays:**
**19** - !@Bill
**27** - !@Steve
**29** - !@Bob
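
For trying the same sort key without touching any files, here is a minimal sketch that works on a string in memory. The function name sort_birthday_blocks and the unsorted sample are assumptions for illustration (the sample is inferred from the output above), and it assumes months are separated by exactly one blank line with '\n' line endings.

```python
import re

def sort_birthday_blocks(text):
    """Sort the day lines inside each blank-line-separated block of text."""
    def day_key(line):
        # Header lines contain no digits, so they get key 0 and stay first
        match = re.search(r'\d+', line)
        return int(match.group(0)) if match else 0

    # One block per month; assumes a single blank line between months
    blocks = [block.splitlines() for block in text.strip().split('\n\n')]
    return '\n\n'.join('\n'.join(sorted(block, key=day_key)) for block in blocks) + '\n'

# Hypothetical unsorted input
sample = (
    "**January birthdays:**\n"
    "**17** - !@Mark\n"
    "**4** - !@Jan\n"
    "**15** - !@Ralph\n"
    "\n"
    "**February birthdays:**\n"
    "**27** - !@Steve\n"
    "**19** - !@Bill\n"
    "**29** - !@Bob\n"
)
print(sort_birthday_blocks(sample), end='')
```

Because sorted() is stable, two people with the same day keep their original relative order.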

Optionally, the months can be sorted as well if needed:

import re
from itertools import accumulate
from datetime import date

# Month names must exist before find_month is defined, because its default
# pattern argument is built from them when the def statement runs
months_of_year = [date(2021, i, 1).strftime('%B') for i in range(1, 13)]

def find_day(s, pattern=re.compile(r'\d+')):
    ' Day in line: 99 for a blank line (sorts last), 0 for a month header (sorts first) '
    return 99 if not s.strip() else int(p.group(0)) if (p:=pattern.search(s)) else 0

def find_month(previous, s, pattern=re.compile(fr"^\*\*({'|'.join(months_of_year)})")):
    ' Index of month in year (0-11); lines that are not headers keep the previous month '
    return months_of_year.index(p.group(1)) if (p:=pattern.search(s)) else previous

with open('test.txt') as infile:
    lines = infile.readlines()

# initial=-1 keeps every key an int, even for lines before the first month header
months = list(accumulate(lines, func=find_month, initial=-1))[1:]  # Month index for each line
days = (find_day(line) for line in lines)                           # Day for each line

# Sort lines based upon their month and day
result = (x[-1] for x in sorted(zip(months, days, lines), key=lambda x: x[:2]))
    
with open('output.txt', 'w') as outfile:
    outfile.writelines(result)
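
To see how accumulate carries the month forward to every line, here is a small in-memory sketch of the same idea. The names MONTHS, carry_month and day_of are hypothetical, the month names are written out rather than taken from strftime, and the sample lines are made up with February deliberately placed before January.

```python
import re
from itertools import accumulate

MONTHS = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December']
HEADER = re.compile(r'^\*\*(' + '|'.join(MONTHS) + ')')
DAY = re.compile(r'\d+')

def carry_month(previous, line):
    # A header line sets a new month index; any other line keeps the last one
    match = HEADER.search(line)
    return MONTHS.index(match.group(1)) if match else previous

def day_of(line):
    # Blank lines sort last (99), headers first (0), day lines by their number
    if not line.strip():
        return 99
    match = DAY.search(line)
    return int(match.group(0)) if match else 0

# Hypothetical input with the months out of order
lines = [
    "**February birthdays:**\n",
    "**27** - !@Steve\n",
    "**19** - !@Bill\n",
    "\n",
    "**January birthdays:**\n",
    "**17** - !@Mark\n",
    "**4** - !@Jan\n",
    "\n",
]

# One month index per line; [1:] drops the initial value itself
months = list(accumulate(lines, carry_month, initial=-1))[1:]
keys = [(month, day_of(line)) for month, line in zip(months, lines)]
ordered = [line for _, line in sorted(zip(keys, lines), key=lambda pair: pair[0])]

print(months)
print(''.join(ordered), end='')
```

Both accumulate's initial argument and the := operator used in the answer need Python 3.8 or later.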

User

Answer 2 · edited 2024-06-07 09:37:22

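This program runs on Windows or Linux, both of which have a sort program. It works by reading each line of the input file and prefixing it with 4 characters: a 2-digit month number and a 2-digit day number (the blank line between months gets "99" as its day number so that it follows that month's birthdays). It then pipes these modified lines to the sort program and processes the piped output to strip the first 4 characters, rewriting the file in place. That means you may want to back up the file before running this, in case the computer goes down partway through processing. Modifying the code to write the output to a separate file should not be too difficult.

This technique is used because it makes no assumptions about the size of the file; a given month could have millions of birthdays. As long as the sort program can handle the input, so can this program.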

from subprocess import Popen, PIPE
import re

p = Popen('sort', stdin=PIPE, stdout=PIPE, shell=True, text=True)
month_no = 0
with open('test.txt', 'r+') as f:
    for line in f:
        if " birthdays:**" in line:
            month_no += 1
            p.stdin.write("%02d00" % month_no)
        else:
            m = re.match(r'\*\*(\d+)\*\*', line)
            if m:
                p.stdin.write("%02d%02d" % (month_no, int(m[1])))
            else:
                # blank line?
                p.stdin.write("%02d99" % month_no)
        p.stdin.write(line)
    p.stdin.close()
    f.seek(0, 0) # reposition back to beginning
    for line in p.stdout:
        f.write(line[4:]) # skip over the 4-character sort prefix
    f.truncate() # this really shouldn't be necessary
p.wait()
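
The 4-character prefix scheme can also be tried without an external program. The sketch below swaps the sort command for Python's built-in sorted(), so it only suits data that fits in memory and does not have the scalability the answer is after; the function names decorate and sort_lines and the sample lines are assumptions for illustration.

```python
import re

def decorate(lines):
    """Yield each line prefixed with a 4-character MMDD sort key."""
    month_no = 0
    for line in lines:
        if " birthdays:**" in line:
            month_no += 1
            prefix = "%02d00" % month_no        # header sorts first in its month
        else:
            m = re.match(r'\*\*(\d+)\*\*', line)
            if m:
                prefix = "%02d%02d" % (month_no, int(m[1]))
            else:
                prefix = "%02d99" % month_no    # blank line sorts last in its month
        yield prefix + line

def sort_lines(lines):
    # Plain string comparison stands in for the external sort program here
    return [line[4:] for line in sorted(decorate(lines))]

# Hypothetical unsorted input
lines = [
    "**January birthdays:**\n",
    "**17** - !@Mark\n",
    "**4** - !@Jan\n",
    "**15** - !@Ralph\n",
    "\n",
    "**February birthdays:**\n",
    "**27** - !@Steve\n",
    "**19** - !@Bill\n",
    "**29** - !@Bob\n",
]
print(''.join(sort_lines(lines)), end='')
```

As in the answer, the month number is just the order in which the headers appear, so the months themselves are not reordered.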

User

Answer 3 · edited 2024-06-07 09:37:22

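collections.defaultdict is very handy here, because you can just add data without having to do any checks first. Basically, you read through the file while keeping the current month in a variable, then check whether each line starts a new month. If it does, you update the variable; if it is a day line, you extract the day and append the string. (This allows several people to share the same birthday.)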

from collections import defaultdict

data = defaultdict(lambda: defaultdict(list))

with open('filename.txt') as infile:
    month = next(infile).strip()
    for line in infile:
        if not line.strip(): continue
        if line[2].isalpha():
            month = line.strip()
        else:
            data[month][int(line.split('**')[1])].append(line.strip())

Based on your example, this puts the data neatly into a dict like this:

{'**January birthdays:**': {17: ['**17** - !@Mark'], 4: ['**4** - !@Jan'], 15: ['**15** - !@Ralph']},
 '**February birthdays:**': {27: ['**27** - !@Steve'], 19: ['**19** - !@Bill'], 29: ['**29** - !@Bob']}}

From here, you just loop over the data, sorting the days as you go and writing to the file:

with open('filename.txt', 'w') as outfile:
    for month, days in data.items():
        outfile.write(month + '\n')
        for day in sorted(days):
            for day_text in days[day]:
                outfile.write(day_text + '\n')
        outfile.write('\n')
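
To check the shared-birthday case mentioned above, here is a small sketch that runs the same grouping on a list of lines instead of a file. The function name group_birthdays and the extra entry "!@Ann" are made up for illustration.

```python
from collections import defaultdict

def group_birthdays(lines):
    """Group day lines under their month header, keyed by day number."""
    data = defaultdict(lambda: defaultdict(list))
    month = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # A header has a letter right after the leading '**'; a day line has a digit
        if line[2:3].isalpha():
            month = line
        else:
            data[month][int(line.split('**')[1])].append(line)
    return data

# Hypothetical input where two people share 4 January
lines = [
    "**January birthdays:**",
    "**17** - !@Mark",
    "**4** - !@Jan",
    "**4** - !@Ann",
    "",
    "**February birthdays:**",
    "**19** - !@Bill",
]
data = group_birthdays(lines)
for month, days in data.items():
    print(month)
    for day in sorted(days):
        for entry in days[day]:
            print(entry)
    print()
```

Months come out in the order they were first seen, since dicts preserve insertion order; only the days inside each month are sorted.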
