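How to temporarily hold on to a variable's value in Python and compare against it...
Hi everyone, I'm sure an indentation error has broken the logic, but I can't figure out how to fix it.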
#
# analyzeNano.py - analyzes XYZ files for "sanity"
#
import csv
import sys
import os
import getopt

def main():
    '''
    analyzeNano.py -d input-directory
    analyzeNano.py analyzes a list of XYZ files inside input-directory. It counts the number of consecutive DNA samples with an identical ID; if the count is between 96 and 110 the file is treated as 'good', otherwise as 'bad'.
    input-directory an input directory where XYZ files are located
    -d flag for input-directory
    At the end it creates 2 files: goodNano.csv and badNano.csv
    Note: files that are not in goodNano.csv and badNano.csv have no DNA ID and are therefore not listed
    '''
    try:
        opts, args = getopt.getopt(sys.argv[1:],'d:')
    except getopt.GetoptError, err:
        print str(err)
        help(main)
        sys.exit(2)
    if len(opts) != 1:
        help(main)
        sys.exit(2)
    if not os.path.isdir( sys.argv[2] ):
        print "Error, ", sys.argv[2], " is not a valid directory"
        help(main)
        sys.exit(2)
    prefix = 'dna'
    goodFiles = []
    badFiles = []
    fileList = os.listdir(sys.argv[2])
    for f in fileList:
        absFile = os.path.join(os.path.abspath(sys.argv[2]), f )
        with open(absFile, 'rb') as csvfile:
            # use csv to separate the fields, making it easier to deal with the
            # first value without hard-coding its size
            reader = csv.reader(csvfile, delimiter='\t')
            match = None
            count = 0
            for row in reader:
                # matching rows
                if row[0].lower().startswith(prefix):
                    if match is None:
                        # first line with prefix..
                        match = row[0]
                    if row[0] == match:
                        # found a match, so increment
                        count += 1
                    if row[0] != match:
                        # row prefix has changed
                        if 96 <= count < 110:
                            # counted enough, so start counting the next
                            match = row[0] # match on this now
                            count = 0 # reset the count
                            goodFiles.append(csvfile.name)
                        else:
                            # didn't count enough, so stop working through this file
                            badFiles.append(csvfile.name)
                            break
                # non-matching rows
                else:
                    if match is None:
                        # ignore preceding lines in file
                        continue
                    else:
                        # found non-matching line when expecting a match
                        break
            else:
                if not 96 <= count < 110:
                    # there was at least one successful run of lines
                    goodFiles.remove(csvfile.name)
    # Create output files
    createFile(goodFiles, 'goodNano')
    createFile(badFiles, 'badNano')

def createFile(files, fName):
    fileName = open( fName + ".csv", "w" )
    for f in files:
        fileName.write( os.path.basename(f) )
        fileName.write("\n")

if __name__ == '__main__':
    main()
Could someone take a look and tell me where I went wrong?
4 Answers
0
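Please disregard my earlier request to check the code. I went through it myself and found that the problem was the formatting. The code now appears to run correctly and analyzes every file in the directory. Thanks again to Metthew, whose help was enormous. I'm still somewhat concerned about the accuracy of the count, because it gave wrong results in a few cases, but I'll keep investigating. All in all, thank you all for the great help.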
0
From your description, the lines you're interested in match this regular expression:
^DNA[0-9]{10}
That is, I'm assuming your xyz is actually ten digits.
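As a quick illustration of what that pattern captures, here is a minimal sketch in Python 3. The sample lines are invented for the example; only the regex itself comes from above. Note that the pattern is case-sensitive, so if your files use a lowercase `dna` prefix you would probably want `re.IGNORECASE`.

```python
import re

# The pattern from above: "DNA" followed by exactly ten digits at line start.
regex = re.compile(r'^DNA[0-9]{10}')

# Hypothetical sample lines, made up for this illustration.
samples = [
    "DNA0123456789\t1.0\t2.0",   # matches; the ID is the first 13 characters
    "dna0123456789\t1.0\t2.0",   # no match: the pattern is case-sensitive
    "DNA12345\t1.0\t2.0",        # no match: fewer than ten digits
    "header line",               # no match
]

ids = []
for line in samples:
    m = regex.match(line)
    # group(0) is the whole matched text, i.e. the 13-character ID
    ids.append(m.group(0) if m else None)

print(ids)
```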
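The strategy here is to match a 13-character string. If there is no match and we haven't matched anything yet, we simply move on. Once we get a match, we save the string and increment the counter. As long as lines keep matching both the regex and the saved string, we keep incrementing. As soon as we hit a different regex match, or no match at all, the run of identical matches is over. If the run was valid, we reset the count and the saved match and carry on. If not, we bail out.
I hasten to add that the following is untested.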
import re
import sys

# Input file with DNA lines to match:
infile = "z:/file.txt"
# This is the regex for the lines of interest:
regex = re.compile('^DNA[0-9]{10}')
# This will keep count of the number of matches in sequence:
n_seq = 0
# This is the previous match (if any):
lastmatch = ''

# Subroutine to check given sequence count and bail if bad:
def bail_on_bad_sequence(count, match):
    if 96 <= count < 100:
        return
    sys.stderr.write("Bad count (%d) for '%s'\n" % (count, match))
    sys.exit(1)

with open(infile) as file:
    for line in file:
        # Try to match the line to the compiled regex:
        match = regex.match(line)
        if match:
            if match.group(0) == lastmatch:
                n_seq += 1
            else:
                # A different ID starts: check the run that just ended, if any
                if lastmatch:
                    bail_on_bad_sequence(n_seq, lastmatch)
                n_seq = 1  # the current line is the first of the new run
                lastmatch = match.group(0)
        else:
            if n_seq != 0:
                bail_on_bad_sequence(n_seq, lastmatch)
            n_seq = 0
            lastmatch = ''

# Check a run that extends to the end of the file:
if n_seq != 0:
    bail_on_bad_sequence(n_seq, lastmatch)
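If you would rather not track the counter by hand, the same run counting can be sketched with `itertools.groupby`, which groups consecutive items that share a key. This is a different technique from the loop above, not a rewrite of it. The names `line_id`, `run_lengths` and `all_runs_good` and the sample data are my own inventions for the example (Python 3), and the `96 <= count < 100` bounds are simply copied from above.

```python
import re
from itertools import groupby

regex = re.compile(r'^DNA[0-9]{10}')

def line_id(line):
    """Return the DNA ID at the start of the line, or None if there is none."""
    m = regex.match(line)
    return m.group(0) if m else None

def run_lengths(lines):
    """Return (id, count) for each run of consecutive lines sharing an ID."""
    runs = []
    for key, group in groupby(lines, key=line_id):
        if key is not None:              # skip stretches of non-DNA lines
            runs.append((key, sum(1 for _ in group)))
    return runs

def all_runs_good(lines, low=96, high=100):
    """True if there is at least one run and every run has low <= length < high."""
    runs = run_lengths(lines)
    return bool(runs) and all(low <= n < high for _, n in runs)

# Hypothetical data: one header line, two valid runs, one run that is too short.
lines = (["header"]
         + ["DNA0000000001\tx"] * 96
         + ["DNA0000000002\tx"] * 97
         + ["DNA0000000003\tx"] * 5)

print(run_lengths(lines))
print(all_runs_good(lines))
```

Printing the run lengths alongside the verdict can also help when a file is classified unexpectedly, since it shows exactly which ID had the wrong count.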
0
All variables are held in memory. You want to keep the most recent match, compare each new row against it, and count how many times it matched:
import csv

prefix = 'DNA'
with open('file.txt','rb') as csvfile:
    # use csv to separate the fields, making it easier to deal with the
    # first value without hard-coding its size
    reader = csv.reader(csvfile, delimiter='\t')
    match = None
    count = 0
    is_good = False
    for row in reader:
        # matching rows
        if row[0].startswith(prefix):
            if match is None:
                # first line with prefix..
                match = row[0]
            if row[0] == match:
                # found a match, so increment
                count += 1
            if row[0] != match:
                # row prefix has changed
                if 96 <= count < 100:
                    # counted enough, so start counting the next
                    match = row[0] # match on this now
                    count = 1 # restart the count; this row is the first of the new run
                else:
                    # didn't count enough, so stop working through this file
                    break
        # non-matching rows
        else:
            if match is None:
                # ignore preceding lines in file
                continue
            else:
                # found non-matching line when expecting a match
                break
    else:
        # the loop finished without a break: check the final run
        if 96 <= count < 100:
            # there was at least one successful run of lines
            is_good = True
if is_good:
    print 'File was good'
else:
    print 'File was bad'
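The snippet above is Python 2. For anyone on Python 3, here is a rough port wrapped in a function so it can be tried on in-memory data. Treat it as a sketch under my own assumptions: the name `file_is_good`, the `low`/`high` parameters and the sample data are invented for the example, blank rows are treated as non-matching, and the row that starts a new run is counted as the first line of that run.

```python
import csv
import io

def file_is_good(csvfile, prefix='DNA', low=96, high=100):
    """Return True if every run of identical IDs has low <= length < high."""
    reader = csv.reader(csvfile, delimiter='\t')
    match = None
    count = 0
    for row in reader:
        first = row[0] if row else ''     # assumption: blank rows never match
        if first.startswith(prefix):
            if match is None:
                match = first             # first line with the prefix
            if first == match:
                count += 1
            elif low <= count < high:
                match = first             # previous run was fine; start the next
                count = 1                 # this row is the first of the new run
            else:
                return False              # previous run had the wrong length
        elif match is not None:
            return False                  # non-matching line after the data began
    # End of file: the last run has to be valid as well.
    return low <= count < high

# Hypothetical tab-separated contents, made up for this example.
good = "header\n" + "DNA0000000001\t1\n" * 96 + "DNA0000000002\t2\n" * 99
bad = "header\n" + "DNA0000000001\t1\n" * 96 + "DNA0000000002\t2\n" * 3

print(file_is_good(io.StringIO(good)))
print(file_is_good(io.StringIO(bad)))
```

With a real file on Python 3 you would open it in text mode, for example `open(path, newline='')`, instead of `'rb'`, because the `csv` module there expects strings rather than bytes.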
0
Here is how I would restructure your code:
with open("z:/file.txt", "rU") as file:  # U flag means Universal Newline Mode,
                                         # if error, try switching back to b
    print(file.name)
    counter = 0
    for line in file:  # iterate over a file object itself line by line
        if line.lower().startswith('dna'):  # look for your desired condition
            # process the data
            counter += 1