Processing part of a file every x lines with Python

2 votes
4 answers
1818 views
Asked 2025-04-16 21:51

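What I'm trying to do is read z lines out of every y lines from a file called DATA.txt and then run a function called find on that part. In other words: skip the first y lines; read the next z lines; run find on the lines just read; skip the next y lines; and repeat until the end of the file (the file's length in lines is passed in as sys.argv[1]).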

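With my current code, the variable lines ends up full of empty lines, and I don't understand why. I can post the code for find if needed, but I think the question is simpler without it.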

If anyone wants to suggest a completely different approach, that's fine too, as long as I can understand what is going on.

Edit: I had left out some parentheses, but adding them did not fix the problem.

import sys
import operator
import linecache

def find(arg):
    ...

y = 7  # lines to skip (example value; not given in the question)
z = 3  # lines to read (example value; not given in the question)

x = 0
while x < int(sys.argv[1]):
    x += 1
    if operator.mod(x, y) == 0:
        for i in range(x, x + z):
            block = linecache.getline('DATA.txt', i)
            g = open('tmp', 'a+')
            g.write(block)
            linecache.clearcache()
            lines = g.read()
            find(lines)
            g.close()
    else:
        pass
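A likely reason for the empty lines (my reading of the code, not something stated in the thread): in 'a+' mode, g.write() leaves the file position at the end of the file, so the g.read() that follows has nothing left to return. The sketch below reproduces this with a throwaway file from tempfile (a stand-in for 'tmp'); seeking back to the start before reading returns the text that was written.

```python
import os
import tempfile

# Throwaway file path, standing in for 'tmp' in the question.
fd, path = tempfile.mkstemp()
os.close(fd)

try:
    with open(path, 'a+') as g:
        g.write('line 1\nline 2\n')
        # The position is now at the end of the file, so read() returns ''.
        after_write = g.read()
        # Rewind to the beginning, then read back what was written.
        g.seek(0)
        after_seek = g.read()
finally:
    os.remove(path)

print(repr(after_write))
print(repr(after_seek))
```

The second answer below relies on the same idea: it remembers the position with tell() before writing and seek()s back to it before reading.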

4 Answers

0

In the "different approach" category, I offer this:

"""
Read lines from DATA.txt: first skip 3 lines, then print 2 lines,
then skip 3 more lines, and so on.
"""

def my_print(l):
    if my_print.skip_counter > 0:
        # Still inside a skipped block: count this line and drop it.
        my_print.skip_counter -= 1
    else:
        if my_print.print_counter > 0:
            # Inside a printed block: print the line (it already ends in '\n').
            my_print.print_counter -= 1
            print(l, end='')
        else:
            # Both blocks used up: reset the counters and handle this line again.
            my_print.skip_counter = my_print.skip_size
            my_print.print_counter = my_print.print_size
            my_print(l)

# The counters are stored as attributes of the function object.
my_print.skip_size = 3
my_print.skip_counter = my_print.skip_size

my_print.print_size = 2
my_print.print_counter = my_print.print_size

with open('DATA.txt') as data:
    for line in data:
        my_print(line)

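A first way to improve this would be to put my_print() in a class (with your skip and read sizes, y and z, as member variables). If you want something truly "Pythonic", you could make it fancier with a generator.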
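As a hedged illustration of the generator idea (my own sketch, not code from the thread): a generator that takes any iterable of lines, skips skip items, yields the next take items as a list, and repeats. The names every_chunk, skip and take are hypothetical. It uses itertools.islice, so it also works lazily on an open file.

```python
from itertools import islice


def every_chunk(lines, skip, take):
    """Repeatedly skip `skip` items, then yield the next `take` items as a list."""
    it = iter(lines)
    while True:
        # Discard the next `skip` items; fewer than `skip` means the input ran out.
        if sum(1 for _ in islice(it, skip)) < skip:
            return
        chunk = list(islice(it, take))
        if not chunk:
            return
        yield chunk  # the last chunk may be shorter than `take`


# Same pattern as my_print above (skip 3, keep 2), on the numbers 1..12.
print(list(every_chunk(range(1, 13), 3, 2)))
```

With a file, the loop would look like `for chunk in every_chunk(data, 3, 2): find(''.join(chunk))`, where data is the open DATA.txt.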

2

Edit: try the following; I think I now understand better what you're trying to do.

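g = open('tmp', 'a+')
while x < int(sys.argv[1]):
    x += 1
    if operator.mod(x, y) == 0:
        # Remember where this block starts in the temp file.
        curr = g.tell()
        for i in range(x, x + z):
            block = linecache.getline('DATA.txt', i)
            g.write(block)
            linecache.clearcache()
        # Rewind to the start of the block so read() returns what was just written.
        g.seek(curr)
        lines = g.read()
        find(lines)
    else:
        pass
g.close()

1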

Maimon, your code gets the indexing wrong, and Andrew's code has the same problem because it is based on yours.

Here is what Andrew's code does once I comment out the lines involving g:

import sys
import operator
import linecache

x = 0
y = 7  # to skip
z = 3  # to print

#g = open('tmp','a+')
while x < 25:
    x += 1
    print('x==', x)
    if operator.mod(x, y) == 0:
        #curr = g.tell()
        for i in range(x, x + z):
            block = linecache.getline('poem.txt', i)
            print('block==', repr(block))
            #g.write(block)
            linecache.clearcache()
            #g.seek(curr)
            #lines = g.read()
            #find(lines)

    else:
        pass

#g.close()

I ran it on a file named 'poem.txt' that has 24 lines:

1 In such a night, when every louder wind
2 Is to its distant cavern safe confined;
3 And only gentle Zephyr fans his wings,
4 And lonely Philomel, still waking, sings;
5 Or from some tree, famed for the owl's delight,
6 She, hollowing clear, directs the wand'rer right:
7 In such a night, when passing clouds give place,
8 Or thinly veil the heav'ns' mysterious face;
9 When in some river, overhung with green,
10 The waving moon and trembling leaves are seen;
11 When freshened grass now bears itself upright,
12 And makes cool banks to pleasing rest invite,
13 Whence springs the woodbind, and the bramble-rose,
14 And where the sleepy cowslip sheltered grows;
15 Whilst now a paler hue the foxglove takes,
16 Yet checkers still with red the dusky brakes
17 When scattered glow-worms, but in twilight fine,
18 Shew trivial beauties watch their hour to shine;
19 Whilst Salisb'ry stands the test of every light,
20 In perfect charms, and perfect virtue bright:
21 When odors, which declined repelling day,
22 Through temp'rate air uninterrupted stray;
23 When darkened groves their softest shadows wear,
24 And falling waters we distinctly hear;

The result is:

x== 1
x== 2
x== 3
x== 4
x== 5
x== 6
x== 7
block== '7 In such a night, when passing clouds give place,\n'
block== "8 Or thinly veil the heav'ns' mysterious face;\n"
block== '9 When in some river, overhung with green,\n'
x== 8
x== 9
x== 10
x== 11
x== 12
x== 13
x== 14
block== '14 And where the sleepy cowslip sheltered grows;\n'
block== '15 Whilst now a paler hue the foxglove takes,\n'
block== '16 Yet checkers still with red the dusky brakes\n'
x== 15
x== 16
x== 17
x== 18
x== 19
x== 20
x== 21
block== '21 When odors, which declined repelling day,\n'
block== "22 Through temp'rate air uninterrupted stray;\n"
block== '23 When darkened groves their softest shadows wear,\n'
x== 22
x== 23
x== 24
x== 25

I chose y=7 to skip 7 lines, yet line 7 is printed.

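Also, after lines 7, 8, 9 are printed (I chose z=3), the counter goes on with 8, 9, 10... instead of 10, 11, 12.... The next 3 lines printed are 14, 15, 16, whereas they should be the 3 lines that follow the next 7 skipped lines (10 to 16), i.e. 17, 18, 19.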

In fact, if you want to skip 7 lines and then print 3, the printed lines should be:
8-9-10
18-19-20
28-29-30
and so on.
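The expected line numbers follow a simple pattern: with 1-based line numbers, block k (k = 0, 1, 2, ...) starts at line y + 1 + k*(y + z) and covers z lines. A quick check of that arithmetic for y=7, z=3 (the helper name expected_lines is mine, for illustration only):

```python
def expected_lines(n_lines, y, z):
    """Return the 1-based line numbers printed when repeatedly skipping y and printing z."""
    result = []
    start = y + 1  # first printed line
    while start <= n_lines:
        # Each block has z lines, truncated at the end of the file.
        result.append(list(range(start, min(start + z, n_lines + 1))))
        start += y + z  # jump over the printed block and the next y skipped lines
    return result


print(expected_lines(24, 7, 3))  # the 24-line poem
print(expected_lines(30, 7, 3))  # a 30-line file
```

For the 24-line poem this gives 8-10 and 18-20; for a 30-line file it adds 28-30.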

Am I right?

Edit 1

My solution:

def chunk_reading(filepath, y, z, x=0):
    # x : number of lines to skip before the periodic treatment
    # y : number of lines to periodically skip
    # z : number of lines to periodically print
    with open(filepath) as f:
        try:
            for sk in range(x):
                next(f)
            while True:
                try:
                    for i in range(y):
                        print('i==', i)
                        next(f)
                    for j in range(z):
                        print('j==', j)
                        print(repr(next(f)))
                except StopIteration:
                    break
        except StopIteration:
            print('Not enough lines before the lines to print')


chunk_reading('poem.txt', 7, 3)

which gives:

i== 0
i== 1
i== 2
i== 3
i== 4
i== 5
i== 6
j== 0
"8 Or thinly veil the heav'ns' mysterious face;\n"
j== 1
'9 When in some river, overhung with green,\n'
j== 2
'10 The waving moon and trembling leaves are seen;\n'
i== 0
i== 1
i== 2
i== 3
i== 4
i== 5
i== 6
j== 0
'18 Shew trivial beauties watch their hour to shine;\n'
j== 1
"19 Whilst Salisb'ry stands the test of every light,\n"
j== 2
'20 In perfect charms, and perfect virtue bright:\n'
i== 0
i== 1
i== 2
i== 3
i== 4
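A variation on chunk_reading, sketched under assumptions: instead of printing, it joins each block into one string and passes it to a callback, which is where the question's find function would go. It uses itertools.islice to skip and take lines lazily, so, like chunk_reading, it never loads the whole file. process_chunks and its callback parameter are hypothetical names; the demo writes a small throwaway file so the example runs on its own.

```python
import os
import tempfile
from itertools import islice


def process_chunks(filepath, y, z, callback, x=0):
    """Skip x lines once, then repeatedly skip y lines and pass the next z lines to callback."""
    with open(filepath) as f:
        # Discard the initial x lines (islice just stops early if the file is shorter).
        for _ in islice(f, x):
            pass
        while True:
            for _ in islice(f, y):
                pass
            block = list(islice(f, z))
            if not block:
                break
            callback(''.join(block))  # the last block may have fewer than z lines


# Demo on a throwaway 24-line file, standing in for poem.txt / DATA.txt.
fd, path = tempfile.mkstemp(text=True)
with os.fdopen(fd, 'w') as out:
    for n in range(1, 25):
        out.write('line %d\n' % n)

found = []
try:
    process_chunks(path, 7, 3, found.append)
finally:
    os.remove(path)

print(found)
```

Passing the joined string keeps the same shape as lines = g.read() in the question, so find should not need to change.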

Edit 2

The solution above works even for large files that don't fit in memory, because it reads the file one line at a time.

The following solution is for files of limited size, since it reads the whole file into a list:

def slice_reading(filepath, y, z, x=0):
    # x : number of lines to skip before the periodic treatment
    # y : number of lines to periodically skip
    # z : number of lines to periodically print
    with open(filepath) as f:
        lines = f.readlines()
    lgth = len(lines)

    if lgth > x + y:
        # The first block starts at index x+y; later blocks are y+z lines apart.
        for a in range(x + y, lgth, y + z):
            print(lines[a:a + z])
    else:
        print('Not enough lines before lines to print')


slice_reading('poem.txt', 7, 3, 5)

Result:

['13 Whence springs the woodbind, and the bramble-rose,\n', '14 And where the sleepy cowslip sheltered grows;\n', '15 Whilst now a paler hue the foxglove takes,\n']
['23 When darkened groves their softest shadows wear,\n', '24 And falling waters we distinctly hear;']
