How do I compare a string in one line with a string in the next line?

3 Answers

User

#1 · Edited 2024-05-12 20:32:16

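Since your data appears to be laid out as fixed-width fields in fixed-width records, you can use the struct module to quickly break each line up into individual fields.

Parsing every field of each line may be overkill when you only need to process one of them, but the approach below shows how it is done in case you need other processing as well, and the struct module is relatively fast in any case.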
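As a minimal sketch of the idea (not part of the original answer), the snippet below unpacks a single fixed-width record with struct. The shortened sample line and the hand-written format string are chosen for illustration only. Note that in Python 3 struct operates on bytes, so the line is a bytes literal and each field is decoded afterwards.

```python
import struct

# One sample record, as bytes (struct operates on bytes in Python 3).
line = b"ATOM    139  C1  DPPC   18      17.250"

# 'Ns' keeps N bytes as one field, 'Nx' skips N pad bytes.
# Only the first five columns are described here; unpack_from simply
# ignores whatever follows them in the buffer.
record = struct.Struct("4s 4x 3s 2x 2s 2x 4s 3x 2s")

fields = [f.decode() for f in record.unpack_from(line)]
print(fields)  # ['ATOM', '139', 'C1', 'DPPC', '18']
```

The fifth field, `fields[4]`, is the value the rest of this answer compares from line to line.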

Assume the input file contains only the following data lines:

ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   20      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   20      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   20      23.050  20.800  11.000  1.00  0.00

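All you need to do is remember the value of the field from the previous line so that you can compare it with the current one. To get the process started, the first line has to be read and parsed separately, so that there is a prev value to compare the following lines against. Also note that the fifth field is the one indexed by [4], because the first field is at [0].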

import struct

# negative widths represent ignored padding fields
fieldwidths = 4, -4, 3, -2, 2, -2, 4, -3, 2, -6, 6, -2, 6, -2, 6, -2, 4, -2, 4
fmtstring = ' '.join('{}{}'.format(abs(fw), 'x' if fw < 0 else 's')
                     for fw in fieldwidths)
fieldstruct = struct.Struct(fmtstring)
parse = fieldstruct.unpack_from  # a function to split line up into fields

# In Python 3, struct only accepts bytes, so the file is opened in binary
# mode and the field is decoded to str before it is compared or printed.
with open('test_file.pdb', 'rb') as f1:
    prev = parse(next(f1))[4].decode()  # remember value of fifth field
    cnt = 1
    for line in f1:
        curr = parse(line)[4].decode()  # get value of fifth field
        if curr == prev:  # same as last one?
            cnt += 1
        else:
            print('{} occurred {} times'.format(prev, cnt))
            prev = curr
            cnt = 1
    print('{} occurred {} times'.format(prev, cnt))  # for last line

Output:

18 occurred 9 times
19 occurred 7 times
20 occurred 3 times
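When only one column matters, the same run-length count can also be sketched with itertools.groupby, which merges consecutive equal keys. This is an alternative to the loop above, not the answer's own method. The `count_runs` helper name and the use of whitespace splitting instead of fixed-width offsets are assumptions made for illustration.

```python
from itertools import groupby


def count_runs(lines, field=4):
    """Return [(value, run_length), ...] for consecutive equal values of a column."""
    # Skip blank lines, then pull out the whitespace-separated column.
    values = (ln.split()[field] for ln in lines if ln.strip())
    # groupby only merges *adjacent* equal keys, which is exactly one run.
    return [(key, sum(1 for _ in group)) for key, group in groupby(values)]


sample = [
    "ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00",
    "ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00",
    "ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00",
]
for value, n in count_runs(sample):
    print('{} occurred {} times'.format(value, n))
```

Because an open file is itself an iterable of lines, `count_runs(open_file)` would work the same way on a file opened in text mode.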

User

#2 · Edited 2024-05-12 20:32:16

You can also solve this easily with parallel lists:

data = []
with open('data.txt', 'r') as datafile:
    for line in datafile:
        line = line.strip()
        if line:  # skip blank lines
            data.append(line)


# distinct values of the fifth column, in order of first appearance
keywordList = []
for line in data:
    fields = line.split()
    if fields[4] not in keywordList:
        keywordList.append(fields[4])


# counterList[i] is the number of lines whose fifth column is keywordList[i]
counterList = []
for item in keywordList:
    counter = 0
    for line in data:
        if line.split()[4] == item:
            counter += 1
    counterList.append(counter)


for keyword, counter in zip(keywordList, counterList):
    print("%s: %d" % (keyword, counter))

But if you are familiar with dict, I would go with Lutz's answer.
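Lutz's answer is not reproduced on this page, so the following is only a sketch of what a dict-based count could look like, not that answer's code. It uses collections.Counter, a dict subclass from the standard library; the `count_column` helper name is hypothetical. Like the parallel-list version, it counts the total occurrences of each value rather than consecutive runs.

```python
from collections import Counter


def count_column(lines, field=4):
    """Count how often each value appears in a whitespace-separated column."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if len(parts) > field:  # skip blank or too-short lines
            counts[parts[field]] += 1
    return counts


sample = [
    "ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00",
    "ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00",
    "ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00",
]
for key, n in count_column(sample).items():
    print("%s: %d" % (key, n))
```

This makes a single pass over the data, whereas the parallel lists rescan every line once per distinct value.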

User

#3 · Edited 2024-05-12 20:32:16

Try this (the explanation is in the comments).

data = """ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    139  C1  DPPC   18      17.250  58.420  10.850  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00
ATOM    189  C1  DPPC   19      23.050  20.800  11.000  1.00  0.00"""

# The last code seen in the 5th column.
code = None

# The count of lines of the current code.
count = 0

for line in data.split("\n"):
    # Get the 5th column.
    c = line.split()[4]

    # The code in the 5th column changed.
    if c != code:
        # If we aren't at the start of the file, print the count
        # for the code that just ended.
        if code is not None:
            print("{}: {}".format(code, count))

        # Remember the new code and start counting it from zero.
        code = c
        count = 0

    # Count the line
    count = count + 1

# Print the count for the last code.
print("{}: {}".format(code, count))

Output:

18: 9
19: 10
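To address the question's title more literally, adjacent lines can also be compared pairwise. The sketch below is an illustration and not part of the answer above: it pairs each line with its successor using zip and reports where the fifth column changes. The `find_changes` name is hypothetical.

```python
def find_changes(lines, field=4):
    """Return (line_number, old, new) wherever a column differs from the previous line."""
    values = [ln.split()[field] for ln in lines]
    changes = []
    # zip(values, values[1:]) yields (current, next) pairs of neighbours.
    for i, (cur, nxt) in enumerate(zip(values, values[1:]), start=2):
        if cur != nxt:
            changes.append((i, cur, nxt))  # i is the 1-based number of the later line
    return changes


sample = [
    "ATOM    139  C1  DPPC   18      17.250",
    "ATOM    139  C1  DPPC   18      17.250",
    "ATOM    189  C1  DPPC   19      23.050",
]
print(find_changes(sample))  # [(3, '18', '19')]
```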
