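Using Python to process part of a file every x lines
What I want to do is read z lines after every y lines of a file called DATA.txt, and then run a function called find on that group of lines. In other words, I want to skip the first y lines; read the next z lines; run find on the lines just read; skip the next y lines; and repeat until the end of the file (the file name is passed in via sys.argv[1]).
My current code leaves the lines variable full of empty lines, and I don't understand why. I can post the code of find if needed, but I thought it would be simpler this way.
If anyone wants to suggest a completely different approach, I'm happy with that too, as long as I can understand what is going on.
Edit: I had left out some parentheses, but adding them did not fix the problem.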
import sys
import operator
import linecache

def find(arg)
    ...

x=0
while x<int(sys.argv[1]):
    x+=1
    if mod(x, y)==0:
        for i in range(x,x+z):
            block=linecache.getline('DATA.txt', i)
            g = open('tmp','a+')
            g.write(block)
            linecache.clearcache()
        lines=g.read()
        find(lines)
        g.close()
    else:
        pass
g.close()
f.close()
4 Answers
In the "different approach" category, I offer this:
"""
Reading lines from DATA.txt, first skip 3 lines, then print 2 lines,
then skip 3 more lines, etc.
"""

def my_print(l):
    if (my_print.skip_counter > 0):
        my_print.skip_counter -= 1
    else:
        if (my_print.print_counter > 0):
            my_print.print_counter -= 1
            print l,
        else:
            my_print.skip_counter = my_print.skip_size
            my_print.print_counter = my_print.print_size
            my_print(l)

my_print.skip_size = 3
my_print.skip_counter = my_print.skip_size

my_print.print_size = 2
my_print.print_counter = my_print.print_size

data = open('DATA.txt')
for line in data:
    my_print(line)
A first improvement would be to put my_print() in a class (with your x and y as member variables). If you want something truly "Pythonic", you could use a generator to make it fancier.
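The generator idea could look roughly like the sketch below. It is written in Python 3 syntax (unlike the Python 2 snippet above), the name skip_then_take is made up for illustration, and it assumes skip + take is greater than zero. The sample list stands in for DATA.txt so the example runs on its own.

```python
def skip_then_take(lines, skip, take):
    """Yield only the lines that fall inside a 'take' window.

    The input is consumed in repeating cycles of `skip` ignored lines
    followed by `take` yielded lines. Assumes skip + take > 0.
    """
    cycle = skip + take
    for index, line in enumerate(lines):
        # index % cycle is the position inside the current cycle (0 .. cycle-1);
        # the first `skip` positions are dropped, the rest are kept.
        if index % cycle >= skip:
            yield line


# Hypothetical stand-in for DATA.txt: ten numbered lines.
sample = ["line %d\n" % n for n in range(1, 11)]
selected = list(skip_then_take(sample, skip=3, take=2))
print("".join(selected), end="")
```

Because an open file is itself an iterator of lines, passing open('DATA.txt') instead of sample should work the same way without loading the whole file.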
Edit: Try the following; I think I now have a better understanding of what you are trying to do.
g = open('tmp','a+')
while x<int(sys.argv[1]):
    x+=1
    if operator.mod(x, y)==0:
        curr = g.tell()  # remember where this block starts in tmp
        for i in range(x,x+z):
            block=linecache.getline('DATA.txt', i)
            g.write(block)
            linecache.clearcache()
        g.seek(curr)  # rewind to the start of the block before reading it back
        lines = g.read()
        find(lines)
    else:
        pass
g.close()
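To see why the tell()/seek() pair matters, here is a small self-contained sketch in Python 3. It uses a temporary file as a stand-in for 'tmp' (an assumption for the demo). Reading straight after writing returns nothing because the file position is already at the end, which is one likely reason lines came back empty in the question.

```python
import os
import tempfile

# Hypothetical scratch file standing in for 'tmp'.
fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)

with open(path, "a+") as g:
    start = g.tell()            # remember where this block begins
    g.write("first\nsecond\n")
    at_end = g.read()           # position is at end of file: nothing left to read
    g.seek(start)               # rewind to the start of the block
    rewound = g.read()          # now the block just written comes back

os.remove(path)
print(repr(at_end))
print(repr(rewound))
```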
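Maimon, the indexing in your code is wrong. Andrew's code has the same problem, because it is based on yours.
Here is the output of Andrew's code after I commented out the lines involving g: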
import sys
import operator
import linecache

x=0
y=7 # to skip
z=3 # to print
#g = open('tmp','a+')
while x<23:
    x+=1
    print 'x==',x
    if operator.mod(x, y)==0:
        #curr = g.tell()
        for i in range(x,x+z):
            block=linecache.getline('poem.txt', i)
            print 'block==',repr(block)
            #g.write(block)
            linecache.clearcache()
        #g.seek(curr)
        #lines = g.read()
        #find(lines)
    else:
        pass
#g.close()
This code was run on a file named 'poem.txt' that has 24 lines:
1 In such a night, when every louder wind
2 Is to its distant cavern safe confined;
3 And only gentle Zephyr fans his wings,
4 And lonely Philomel, still waking, sings;
5 Or from some tree, famed for the owl's delight,
6 She, hollowing clear, directs the wand'rer right:
7 In such a night, when passing clouds give place,
8 Or thinly veil the heav'ns' mysterious face;
9 When in some river, overhung with green,
10 The waving moon and trembling leaves are seen;
11 When freshened grass now bears itself upright,
12 And makes cool banks to pleasing rest invite,
13 Whence springs the woodbind, and the bramble-rose,
14 And where the sleepy cowslip sheltered grows;
15 Whilst now a paler hue the foxglove takes,
16 Yet checkers still with red the dusky brakes
17 When scattered glow-worms, but in twilight fine,
18 Shew trivial beauties watch their hour to shine;
19 Whilst Salisb'ry stands the test of every light,
20 In perfect charms, and perfect virtue bright:
21 When odors, which declined repelling day,
22 Through temp'rate air uninterrupted stray;
23 When darkened groves their softest shadows wear,
24 And falling waters we distinctly hear;
The result is:
x== 1
x== 2
x== 3
x== 4
x== 5
x== 6
x== 7
block== '7 In such a night, when passing clouds give place,\n'
block== "8 Or thinly veil the heav'ns' mysterious face;\n"
block== '9 When in some river, overhung with green,\n'
x== 8
x== 9
x== 10
x== 11
x== 12
x== 13
x== 14
block== '14 And where the sleepy cowslip sheltered grows;\n'
block== '15 Whilst now a paler hue the foxglove takes,\n'
block== '16 Yet checkers still with red the dusky brakes\n'
x== 15
x== 16
x== 17
x== 18
x== 19
x== 20
x== 21
block== '21 When odors, which declined repelling day,\n'
block== "22 Through temp'rate air uninterrupted stray;\n"
block== '23 When darkened groves their softest shadows wear,\n'
x== 22
x== 23
x== 24
x== 25
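I chose y=7 to skip 7 lines, but line 7 is printed anyway.
Moreover, after printing lines 7, 8, 9 (I chose z=3), the count carries on with 8, 9, 10... instead of continuing with 10, 11, 12... The next 3 lines printed are 14, 15, 16, whereas they should be the 3 lines after 7 lines, i.e. 11, 12, 13.
In fact, if you want to skip 7 lines and then print 3 lines, the lines printed should be:
8-9-10
18-19-20
28-29-30
and so on.
Am I right?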
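The expected line numbers can be checked with a few lines of arithmetic. This is only an illustrative Python 3 sketch: the helper name expected_line_numbers is made up, and it assumes take is at least 1.

```python
def expected_line_numbers(total, skip, take):
    """Return the 1-based line numbers kept when repeatedly skipping
    `skip` lines and then keeping `take` lines, for a file of `total` lines.

    Assumes take >= 1 so that every cycle makes progress.
    """
    kept = []
    start = skip + 1                      # first kept line of the first cycle
    while start <= total:
        stop = min(start + take - 1, total)
        kept.append(list(range(start, stop + 1)))
        start += skip + take              # jump to the first kept line of the next cycle
    return kept


print(expected_line_numbers(30, skip=7, take=3))
# [[8, 9, 10], [18, 19, 20], [28, 29, 30]]
```

For the 24-line poem.txt the same call with total=24 gives only the first two groups, since line 28 does not exist.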
Edit 1
My solution is:
def chunk_reading(filepath,y,z,x=0):
    # x : number of lines to skip before the periodical treatment
    # y : number of lines to periodically skip
    # z : number of lines to periodically print
    with open(filepath) as f:
        try:
            for sk in xrange(x):
                f.next()
            while True:
                try:
                    for i in xrange(y):
                        print 'i==',i
                        f.next()
                    for j in xrange(z):
                        print 'j==',j
                        print repr(f.next())
                except StopIteration:
                    break
        except StopIteration:
            print 'Not enough lines before the lines to print'

chunk_reading('poem.txt',7,3)
which produces:
i== 0
i== 1
i== 2
i== 3
i== 4
i== 5
i== 6
j== 0
"8 Or thinly veil the heav'ns' mysterious face;\n"
j== 1
'9 When in some river, overhung with green,\n'
j== 2
'10 The waving moon and trembling leaves are seen;\n'
i== 0
i== 1
i== 2
i== 3
i== 4
i== 5
i== 6
j== 0
'18 Shew trivial beauties watch their hour to shine;\n'
j== 1
"19 Whilst Salisb'ry stands the test of every light,\n"
j== 2
'20 In perfect charms, and perfect virtue bright:\n'
i== 0
i== 1
i== 2
i== 3
i== 4
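For readers on Python 3, the same streaming idea can be sketched with itertools.islice, handing each group of lines to a callback instead of printing it. The names read_chunks and the toy find below are placeholders, not the asker's real function, and an in-memory io.StringIO stands in for the file so the example is self-contained.

```python
import io
from itertools import islice


def read_chunks(lines, skip, take, offset=0):
    """Yield lists of up to `take` lines, skipping `skip` lines before each.

    `lines` is any iterator of lines (for example an open file object),
    so only one chunk is held in memory at a time.
    """
    lines = iter(lines)
    for _ in islice(lines, offset):       # one-off initial offset (x above)
        pass
    while True:
        for _ in islice(lines, skip):     # periodic skip (y above)
            pass
        chunk = list(islice(lines, take))  # periodic read (z above)
        if not chunk:                     # input exhausted
            return
        yield chunk


def find(chunk):
    # Placeholder for the real find(): returns the first word of each line.
    return [line.split()[0] for line in chunk]


# Hypothetical 24-line input, numbered like poem.txt.
data = io.StringIO("".join("%d text\n" % n for n in range(1, 25)))
results = [find(chunk) for chunk in read_chunks(data, skip=7, take=3)]
print(results)
```

With a real file, something like `with open(path) as f:` followed by a loop over `read_chunks(f, y, z)` should behave the same way. A final chunk may be shorter than `take` if the file ends in the middle of it.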
Edit 2
The solution above works even for large files that cannot fit entirely in memory.
The solution below is suitable for files of limited size:
def slice_reading(filepath,y,z,x=0):
    # x : number of lines to skip before the periodical treatment
    # y : number of lines to periodically skip
    # z : number of lines to periodically print
    with open(filepath) as f:
        lines = f.readlines()
    lgth = len(lines)
    if lgth > x+y:
        for a in xrange(x+y,lgth,y+z):
            print lines[a:a+z]
    else:
        print 'Not enough lines before lines to print'

slice_reading('poem.txt',7,3,5)
Result:
['13 Whence springs the woodbind, and the bramble-rose,\n', '14 And where the sleepy cowslip sheltered grows;\n', '15 Whilst now a paler hue the foxglove takes,\n']
['23 When darkened groves their softest shadows wear,\n', '24 And falling waters we distinctly hear;']
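The index arithmetic behind the slicing version can be tried without a file. The sketch below is Python 3, uses a made-up helper name slice_chunks, and returns the chunks instead of printing them. It assumes skip + take is greater than zero, since range() rejects a zero step.

```python
def slice_chunks(lines, skip, take, offset=0):
    """Return the chunks of `take` items found after each run of `skip` items.

    The first chunk starts at index offset + skip; later chunks start
    every skip + take items. Assumes skip + take > 0.
    """
    first = offset + skip
    return [lines[a:a + take] for a in range(first, len(lines), skip + take)]


# Line numbers 1..24 stand in for the 24 lines of poem.txt.
numbers = list(range(1, 25))
chunks = slice_chunks(numbers, skip=7, take=3, offset=5)
print(chunks)
# [[13, 14, 15], [23, 24]]
```

The last chunk is shorter than take because the list runs out, which matches the two-line final result shown above.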