Processing part of a file every x lines with Python

2 votes

4 answers

1818 views

Asked 2025-04-16 21:51

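What I want to do is read z lines after every y lines of a file called DATA.txt, and then run a function called find on that block of lines. In other words: skip the first y lines; read the next z lines; run find on the lines just read; skip the next y lines; and repeat until the end of the file (the file's line count is passed in as sys.argv[1]).

My current code leaves the variable lines full of empty lines, and I don't understand why. I can post the code for find if needed, but I thought it was simpler this way.

If someone wants to suggest a completely different approach, that's fine too, as long as I can understand what's going on.

Edit: I had left out some parentheses, but adding them didn't fix the problem.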

import sys
import operator
import linecache

def find(arg):
    ...

# y (lines to skip) and z (lines to read) are defined elsewhere (not shown)
x = 0
while x < int(sys.argv[1]):   # sys.argv[1]: number of lines in DATA.txt
    x += 1
    if operator.mod(x, y) == 0:
        for i in range(x, x + z):
            block = linecache.getline('DATA.txt', i)
            g = open('tmp', 'a+')
            g.write(block)
            linecache.clearcache()
            lines = g.read()
            find(lines)
            g.close()
    else:
        pass
g.close()

Tags: function calls, data processing, file handling, text parsing, line reading, argument passing, loops, empty lines

4 Answers

In the "completely different approach" category, I offer this:

"""
Read lines from DATA.txt: first skip 3 lines, then print 2 lines,
then skip 3 more lines, etc.
"""

def my_print(l):
    if my_print.skip_counter > 0:
        my_print.skip_counter -= 1
    else:
        if my_print.print_counter > 0:
            my_print.print_counter -= 1
            print(l, end='')
        else:
            # Start a new cycle; this line becomes its first skipped line
            my_print.skip_counter = my_print.skip_size
            my_print.print_counter = my_print.print_size
            my_print(l)

my_print.skip_size = 3
my_print.skip_counter = my_print.skip_size

my_print.print_size = 2
my_print.print_counter = my_print.print_size

with open('DATA.txt') as data:
    for line in data:
        my_print(line)

The first way to improve this would be to put my_print() in a class (with your x and y as member variables). If you want something truly "Pythonic", you could make it fancier with a generator.
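A minimal Python 3 sketch of the generator idea (the name `skip_then_take` is hypothetical and not part of the answer above). Instead of keeping counters as function attributes, it keeps a line when its 0-based position modulo `skip + take` is at least `skip`:

```python
from typing import Iterable, Iterator


def skip_then_take(lines: Iterable[str], skip: int, take: int) -> Iterator[str]:
    """Yield lines following a repeating pattern: skip `skip` lines, then keep `take` lines."""
    period = skip + take
    for index, line in enumerate(lines):
        # index % period is the line's 0-based position inside the current cycle;
        # positions 0..skip-1 are skipped, positions skip..period-1 are kept.
        if index % period >= skip:
            yield line


# Same pattern as my_print above (skip 3, print 2), on 12 made-up lines.
sample = [f"line {n}\n" for n in range(1, 13)]
for line in skip_then_take(sample, skip=3, take=2):
    print(line, end="")
# prints: line 4, line 5, line 9, line 10 (one per line)
```

With a real file you would pass the open file object, e.g. loop over `skip_then_take(data, 3, 2)` inside `with open('DATA.txt') as data:`; lines are then read lazily, one at a time.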

Answered 2025-04-16 by Python大师


Edit: Try the following; I think I now have a better understanding of what you're trying to do.

g = open('tmp', 'a+')
while x < int(sys.argv[1]):
    x += 1
    if operator.mod(x, y) == 0:
        curr = g.tell()          # remember where this block starts
        for i in range(x, x + z):
            block = linecache.getline('DATA.txt', i)
            g.write(block)
            linecache.clearcache()
        g.seek(curr)             # rewind to the start of the block
        lines = g.read()         # read back only this block
        find(lines)
    else:
        pass
g.close()
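Why lines came out empty in the question: after `g.write(...)` the file position sits just past the text that was written (and in `'a+'` mode writes always go to the end), so an immediate `g.read()` returns an empty string. The `tell()`/`seek()` pair above rewinds to the start of the block first. A minimal Python 3 demonstration, assuming a throwaway file from `tempfile` as a stand-in for the question's `'tmp'`:

```python
import os
import tempfile

# Throwaway file standing in for the question's 'tmp' (assumption for the demo).
fd, path = tempfile.mkstemp(text=True)
os.close(fd)

with open(path, "a+") as g:
    start = g.tell()                   # position before writing (end of the empty file)
    g.write("first line\nsecond line\n")
    after_write = g.read()             # position is past the written text: nothing to read
    g.seek(start)                      # rewind to where the block starts
    after_seek = g.read()              # now the block is read back

os.remove(path)
print(repr(after_write))  # ''
print(repr(after_seek))   # 'first line\nsecond line\n'
```

A temporary file is not really needed for this task; collecting the block's lines in a list and passing `''.join(block_lines)` to find avoids the file position issue entirely.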

Answered 2025-04-16 by Python大师


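Maimon, your code gets the indexing wrong. Andrew's code has the same problem, because he built it on yours.

Here is what Andrew's code produces once I remove the lines involving g: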

import sys
import operator
import linecache

x = 0
y = 7  # to skip
z = 3  # to print

#g = open('tmp','a+')
while x < 25:
    x += 1
    print('x==', x)
    if operator.mod(x, y) == 0:
        #curr = g.tell()
        for i in range(x, x + z):
            block = linecache.getline('poem.txt', i)
            print('block==', repr(block))
            #g.write(block)
            linecache.clearcache()
            #g.seek(curr)
            #lines = g.read()
            #find(lines)

    else:
        pass

#g.close()

I ran this code on a file named 'poem.txt', which has 24 lines:

1 In such a night, when every louder wind
2 Is to its distant cavern safe confined;
3 And only gentle Zephyr fans his wings,
4 And lonely Philomel, still waking, sings;
5 Or from some tree, famed for the owl's delight,
6 She, hollowing clear, directs the wand'rer right:
7 In such a night, when passing clouds give place,
8 Or thinly veil the heav'ns' mysterious face;
9 When in some river, overhung with green,
10 The waving moon and trembling leaves are seen;
11 When freshened grass now bears itself upright,
12 And makes cool banks to pleasing rest invite,
13 Whence springs the woodbind, and the bramble-rose,
14 And where the sleepy cowslip sheltered grows;
15 Whilst now a paler hue the foxglove takes,
16 Yet checkers still with red the dusky brakes
17 When scattered glow-worms, but in twilight fine,
18 Shew trivial beauties watch their hour to shine;
19 Whilst Salisb'ry stands the test of every light,
20 In perfect charms, and perfect virtue bright:
21 When odors, which declined repelling day,
22 Through temp'rate air uninterrupted stray;
23 When darkened groves their softest shadows wear,
24 And falling waters we distinctly hear;

The result:

x== 1
x== 2
x== 3
x== 4
x== 5
x== 6
x== 7
block== '7 In such a night, when passing clouds give place,\n'
block== "8 Or thinly veil the heav'ns' mysterious face;\n"
block== '9 When in some river, overhung with green,\n'
x== 8
x== 9
x== 10
x== 11
x== 12
x== 13
x== 14
block== '14 And where the sleepy cowslip sheltered grows;\n'
block== '15 Whilst now a paler hue the foxglove takes,\n'
block== '16 Yet checkers still with red the dusky brakes\n'
x== 15
x== 16
x== 17
x== 18
x== 19
x== 20
x== 21
block== '21 When odors, which declined repelling day,\n'
block== "22 Through temp'rate air uninterrupted stray;\n"
block== '23 When darkened groves their softest shadows wear,\n'
x== 22
x== 23
x== 24
x== 25

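I chose y=7 to skip 7 lines, but line 7 is still printed.

Moreover, after printing lines 7, 8, 9 (I chose z=3), the count continues with 8, 9, 10... instead of 10, 11, 12.... The next 3 lines printed are 14, 15, 16, whereas they should be the 3 lines after 7 lines, that is, 11, 12, 13.

In fact, if you want to skip 7 lines and then print 3 lines, the printed lines should be:
8-9-10
18-19-20
28-29-30
etc.

Am I right?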
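To check the expected numbering independently of any file I/O, here is a small helper (the name `printed_line_numbers` is hypothetical, not from this answer) that lists which 1-based line numbers a "skip x once, then repeatedly skip y and keep z" pattern should keep:

```python
from typing import List


def printed_line_numbers(total_lines: int, skip: int, take: int, offset: int = 0) -> List[int]:
    """1-based numbers of the lines kept: skip `offset` lines once, then repeat (skip `skip`, keep `take`)."""
    kept = []
    for n in range(offset + 1, total_lines + 1):
        # 0-based position of line n inside the repeating skip+take cycle
        position = (n - 1 - offset) % (skip + take)
        if position >= skip:
            kept.append(n)
    return kept


print(printed_line_numbers(30, skip=7, take=3))
# [8, 9, 10, 18, 19, 20, 28, 29, 30]
```

For the 24-line poem.txt it gives [8, 9, 10, 18, 19, 20], and with an initial offset of 5 it gives [13, 14, 15, 23, 24].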

Edit 1

My solution:

def chunk_reading(filepath, y, z, x=0):
    # x : number of lines to skip before the periodic treatment
    # y : number of lines to periodically skip
    # z : number of lines to periodically print
    with open(filepath) as f:
        try:
            for sk in range(x):
                next(f)
            while True:
                try:
                    for i in range(y):
                        print('i==', i)
                        next(f)
                    for j in range(z):
                        print('j==', j)
                        print(repr(next(f)))
                except StopIteration:
                    break
        except StopIteration:
            print('Not enough lines before the lines to print')


chunk_reading('poem.txt', 7, 3)

Which produces:

i== 0
i== 1
i== 2
i== 3
i== 4
i== 5
i== 6
j== 0
"8 Or thinly veil the heav'ns' mysterious face;\n"
j== 1
'9 When in some river, overhung with green,\n'
j== 2
'10 The waving moon and trembling leaves are seen;\n'
i== 0
i== 1
i== 2
i== 3
i== 4
i== 5
i== 6
j== 0
'18 Shew trivial beauties watch their hour to shine;\n'
j== 1
"19 Whilst Salisb'ry stands the test of every light,\n"
j== 2
'20 In perfect charms, and perfect virtue bright:\n'
i== 0
i== 1
i== 2
i== 3
i== 4
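The same streaming idea can be written as a Python 3 generator with `itertools.islice` that returns the blocks instead of printing them, so each one can be handed to find. A sketch; the name `chunk_blocks` is hypothetical, and it takes any iterable of lines (an open file works), holding only one block in memory at a time:

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunk_blocks(lines: Iterable[str], y: int, z: int, x: int = 0) -> Iterator[List[str]]:
    """Skip x lines once, then repeatedly skip y lines and yield the next z lines as a list."""
    it = iter(lines)  # a single iterator, so each islice continues where the last one stopped
    for _ in islice(it, x):       # discard the x leading lines (stops early if too few)
        pass
    while True:
        for _ in islice(it, y):   # discard the next y lines
            pass
        block = list(islice(it, z))  # keep the next z lines (may be shorter at the end)
        if not block:
            return
        yield block


sample = [f"{n}\n" for n in range(1, 25)]  # stand-in for the 24-line poem.txt
print(list(chunk_blocks(sample, 7, 3)))
# [['8\n', '9\n', '10\n'], ['18\n', '19\n', '20\n']]
```

With the question's names this would be used as `for block in chunk_blocks(f, y, z): find(''.join(block))` inside `with open('DATA.txt') as f:`.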

Edit 2

The solution above works even for large files that don't fit entirely in memory.

The following solution is for files of limited size:

def slice_reading(filepath, y, z, x=0):
    # x : number of lines to skip before the periodic treatment
    # y : number of lines to periodically skip
    # z : number of lines to periodically print
    with open(filepath) as f:
        lines = f.readlines()
    lgth = len(lines)

    if lgth > x + y:
        for a in range(x + y, lgth, y + z):
            print(lines[a:a + z])
    else:
        print('Not enough lines before lines to print')


slice_reading('poem.txt', 7, 3, 5)

Result:

['13 Whence springs the woodbind, and the bramble-rose,\n', '14 And where the sleepy cowslip sheltered grows;\n', '15 Whilst now a paler hue the foxglove takes,\n']
['23 When darkened groves their softest shadows wear,\n', '24 And falling waters we distinctly hear;']
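If you want the blocks themselves (for example to pass each one to the question's find) rather than printing them, the slicing idea of slice_reading can return a list. A sketch with the same x/y/z meaning; the name `slice_blocks` is hypothetical, and it returns an empty list instead of printing a message when there are too few lines:

```python
from typing import List, Sequence


def slice_blocks(lines: Sequence[str], y: int, z: int, x: int = 0) -> List[List[str]]:
    """Every block of z lines, the first starting after x + y lines, one block per y + z lines."""
    return [list(lines[a:a + z]) for a in range(x + y, len(lines), y + z)]


sample = [f"{n}\n" for n in range(1, 25)]  # stand-in for the 24-line poem.txt
print(slice_blocks(sample, 7, 3, 5))
# [['13\n', '14\n', '15\n'], ['23\n', '24\n']]
```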

Answered 2025-04-16 by Python大师
