How to temporarily hold a variable's value in Python and compare it...

0 votes
4 answers
547 views
Asked 2025-04-17 22:14

Hi all, I'm sure an indentation error has broken the logic, but now I don't know how to fix it.
#
# analyzeNano.py - analyzes XYZ files for "sanity"
#

import csv
import sys
import os
import getopt

def main():
    '''
analyzeNano.py -d input-directory

analyzeNano.py analyzes a list of XYZ files inside input-directory. It counts the number of consecutive DNA samples with an identical ID; if the count is between 96 and 110 the file is treated as 'good', otherwise 'bad'.
    input-directory    an input directory where XYZ files are located
    -d    flag for input-directory
At the end it creates 2 files: goodNano.csv and badNano.csv
Note: files that are not in goodNano.csv or badNano.csv have no DNA ID and are therefore not listed
'''
    try:
        opts, args = getopt.getopt(sys.argv[1:],'d:')
    except getopt.GetoptError, err:
        print str(err)
        help(main)
        sys.exit(2)

    if len(opts) != 1:
        help(main)
        sys.exit(2)

    if not os.path.isdir( sys.argv[2] ):
        print "Error, ", sys.argv[2], " is not a valid directory"
        help(main)
        sys.exit(2)


    prefix = 'dna'
    goodFiles = []
    badFiles = []

    fileList = os.listdir(sys.argv[2])
    for f in fileList:
        absFile = os.path.join(os.path.abspath(sys.argv[2]), f )
        with open(absFile, 'rb') as csvfile:
            # use csv to separate the fields, making it easier to deal with the
            # first value without hard-coding its size
            reader = csv.reader(csvfile, delimiter='\t')
            match = None
            count = 0

            for row in reader:
                # matching rows
                if row[0].lower().startswith(prefix):

                    if match is None:
                        # first line with prefix..
                        match = row[0]

                    if row[0] == match:
                        # found a match, so increment
                        count += 1

                    if row[0] != match:
                        # row prefix has changed
                        if 96 <= count < 110:
                            # counted enough, so start counting the next
                            match = row[0] # match on this now
                            count = 0 # reset the count
                            goodFiles.append(csvfile.name)
                        else:
                            # didn't count enough, so stop working through this file
                            badFiles.append(csvfile.name)
                            break

                # non-matching rows
                else:
                    if match is None:
                        # ignore preceding lines in file
                        continue
                    else:
                        # found non-matching line when expecting a match
                        break
    else:
        if not 96 <= count < 110:
                    #there was at least successful run of lines
            goodFiles.remove(csvfile.name)

    # Create output files
    createFile(goodFiles, 'goodNano')
    createFile(badFiles, 'badNano')

def createFile(files, fName):
    fileName = open( fName + ".csv", "w" )
    for f in files:
        fileName.write( os.path.basename(f) )
        fileName.write("\n")


if __name__ == '__main__':
    main()

Could someone take a look and tell me where I went wrong?

4 Answers

0

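Please disregard my earlier request to have you check the code. I went through it myself and found that the problem was the formatting. The code now seems to run correctly and analyzes every file in the directory. Thanks again to Metthew, whose help was enormous. I'm still a little worried about the accuracy of the counts, because it got a few cases wrong, but I'll keep investigating. All in all, thank you all very much for the great help.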
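
For anyone who runs into the same thing: in the script in the question, the final `else:` appears to line up with `for f in fileList:` rather than with `for row in reader:`. Python attaches a loop's `else` to whichever `for` it is indented under, so that block would run only once, after the last file, instead of once per file. Here is a minimal sketch of the difference. The function names and the sample data are made up for illustration and do not come from the original script.

```python
def else_on_outer_loop(files):
    """The `else` lines up with the outer loop, so it runs once, after the last file."""
    flagged = []
    for name, rows in files:
        for row in rows:
            if row == "bad":
                break  # leaves only the inner loop
    else:
        flagged.append(name)  # `name` is whatever the last file was
    return flagged


def else_on_inner_loop(files):
    """The `else` lines up with the inner loop, so it runs for every file that never hit `break`."""
    flagged = []
    for name, rows in files:
        for row in rows:
            if row == "bad":
                break
        else:
            flagged.append(name)
    return flagged


# Hypothetical stand-in data: (file name, rows) pairs instead of real files.
files = [("a.xyz", ["ok", "ok"]), ("b.xyz", ["ok", "bad"]), ("c.xyz", ["ok"])]

print(else_on_outer_loop(files))  # ['c.xyz']
print(else_on_inner_loop(files))  # ['a.xyz', 'c.xyz']
```

Moving the `else` four levels of nesting deeper is all it takes to switch from the first behaviour to the second, which is why this kind of bug is easy to introduce when re-indenting.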

0

Based on your description, the lines you're interested in match this regular expression:

^DNA[0-9]{10}

That is, I'm assuming your xyz is actually ten digits.
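
To make that assumption concrete, here is a small sketch of what the pattern accepts and rejects. The sample lines are invented for illustration (real XYZ files may look different), and `matched_id` is just a helper name used here.

```python
import re

regex = re.compile(r'^DNA[0-9]{10}')

# Hypothetical sample lines, not taken from a real XYZ file.
samples = [
    "DNA0123456789\t1.0\t2.0",  # ID followed by tab-separated fields
    "DNA012345\t1.0\t2.0",      # only six digits, so no match
    "header line",              # does not start with DNA
]


def matched_id(line):
    """Return the matched ID at the start of the line, or None."""
    m = regex.match(line)
    return m.group(0) if m else None


ids = [matched_id(line) for line in samples]
print(ids)          # ['DNA0123456789', None, None]
print(len(ids[0]))  # 13
```

If the real IDs have a different number of digits, only the `{10}` in the pattern needs to change.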

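The strategy is to match a 13-character string. If a line does not match and nothing has matched before it, we simply move on. Once a line matches, we save the matched string and increment the counter. As long as each following line matches both the regex and the saved string, we keep incrementing. As soon as we hit a different regex match, or no match at all, the run of identical matches has ended. If that run is valid, we reset the count and the saved match and carry on; if it is not valid, we bail out.

I should hasten to add that the following is untested.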

import re
import sys

# Input file with DNA lines to match:
infile = "z:/file.txt"

# This is the regex for the lines of interest:
regex = re.compile(r'^DNA[0-9]{10}')

# This will keep count of the number of matches in sequence:
n_seq = 0

# This is the previous match (if any):
lastmatch = ''

# Subroutine to check given sequence count and bail if bad:
def bail_on_bad_sequence(count, match):
    if 96 <= count < 100:
        return
    sys.stderr.write("Bad count (%d) for '%s'\n" % (count, match))
    sys.exit(1)


with open(infile) as file:
    for line in file:
        # Try to match the line to the regex:
        match = regex.match(line)

        if match:
            if match.group(0) == lastmatch:
                n_seq += 1
            else:
                # The ID changed: check the run that just ended (if any)
                if lastmatch:
                    bail_on_bad_sequence(n_seq, lastmatch)
                # This line is the first member of the new run
                n_seq = 1
                lastmatch = match.group(0)
        else:
            if n_seq != 0:
                bail_on_bad_sequence(n_seq, lastmatch)
                n_seq = 0
                lastmatch = ''

# Check the run that was still open when the file ended:
if lastmatch:
    bail_on_bad_sequence(n_seq, lastmatch)
0

All variables are kept in memory. You want to save the most recent match, compare each new row against it, and count the matches:

import csv

prefix = 'DNA'

# Python 3: the csv module expects a text-mode file opened with newline=''
with open('file.txt', newline='') as csvfile:
    # use csv to separate the fields, making it easier to deal with the
    # first value without hard-coding its size
    reader = csv.reader(csvfile, delimiter='\t')
    match = None
    count = 0
    is_good = False
    for row in reader:
        # matching rows (the `row and` guard keeps blank lines from raising IndexError)
        if row and row[0].startswith(prefix):

            if match is None:
                # first line with prefix..
                match = row[0]

            if row[0] == match:
                # found a match, so increment
                count += 1
            else:
                # row prefix has changed
                if 96 <= count < 100:
                    # counted enough, so start counting the next
                    match = row[0] # match on this now
                    count = 1 # this row is the first of the new run
                else:
                    # didn't count enough, so stop working through this file
                    break

        # non-matching rows
        else:
            if match is None:
                # ignore preceding lines in file
                continue
            else:
                # found non-matching line when expecting a match
                break
    else:
        # the loop finished without a break, so only the last run is left to check
        if 96 <= count < 100:
            is_good = True

if is_good:
    print('File was good')
else:
    print('File was bad')
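
As an alternative to holding the previous value by hand, `itertools.groupby` from the standard library groups consecutive equal items, so the length of each run falls out directly. This is a different technique from the loop above, sketched under the assumption that the IDs have already been pulled out into a list. The names `run_lengths` and `all_runs_good`, the default limits, and the sample IDs are hypothetical.

```python
from itertools import groupby


def run_lengths(ids):
    """Return [(id, length)] for each run of consecutive identical IDs."""
    return [(key, sum(1 for _ in group)) for key, group in groupby(ids)]


def all_runs_good(ids, low=96, high=100):
    """True when there is at least one run and every run length is in [low, high)."""
    runs = run_lengths(ids)
    return bool(runs) and all(low <= n < high for _, n in runs)


# Hypothetical IDs: one run of 96 followed by one run of 97.
ids = ["DNA0000000001"] * 96 + ["DNA0000000002"] * 97

print(run_lengths(ids))    # [('DNA0000000001', 96), ('DNA0000000002', 97)]
print(all_runs_good(ids))  # True
print(all_runs_good(ids + ["DNA0000000003"] * 5))  # False
```

Because each run is counted in full, there is no reset step to get wrong. In the question's script the counter is set back to 0 when the ID changes, even though the row that triggered the change already belongs to the new run. That may be one source of the occasional miscounts mentioned earlier in the thread, though this is only a guess.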
0

Here is how I would restyle your code:

with open("z:/file.txt", "r") as file:  # text mode; Python 3 handles all newline styles by default
    print(file.name)
    counter = 0
    for line in file:  # iterate over the file object itself, line by line
        if line.lower().startswith('dna'):  # look for your desired condition
            # process the data
            counter += 1
