In Python, how do I extract two specific numbers from multiple lines of a text file?

08:59:07.603 08:59:05.798816 PAL_PARR_INTF TraceModule GET int@HISR :82 drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src) 525
08:59:07.603 08:59:05.798816 PAL_PARR_INTF TraceModule xdma is not running drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src) 316
08:59:07.603 08:59:05.798847 PAL_PARR_INTF TraceModule DMA is activated drv_Shm.c (../../../PALCommon/Platform_EV/HAL/Common/driver/Shm/src) 461
08:59:10.847 08:59:09.588001 UHAL_SRCH TraceFlow : LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1596
08:59:11.440 08:59:10.876819 UHAL_COMMON TraceWarning cellRtgSlot=0 cellRtgChip=1500 CELLK_ACTIVE=1 boundary RSN 232482 current RSN 232482 boundarySFN 508 currentSFN 508 uhal_Hmcp.c (../../../HEDGE/UL1/UHAL_3XX/platform/Code/Src) 2224
08:59:11.440 08:59:10.877277 UHAL_SRCH TraceWarning uhal_HmcpSearcherS1LISR: status_reg(0xf0100000) uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1497
08:59:11.440 08:59:10.877307 UHAL_COMMON TraceWarning uhal_HmcpSearcherSCDLISR is called. uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1512
08:59:11.440 08:59:10.877338 UHAL_SRCH TraceFlow : LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c (../../../HEDGE/UL1/UHAL_3XX/Searcher/Code/Src) 1596

import re  # Importing 're' for using regular expressions

file_dir = raw_input('Enter the complete Directory of the file (eg c:\\abc.txt):')  # Providing the user with a choice to open their file in .txt format

with open(file_dir, 'r') as f:
    lat_lines = f.read()  # storing the data in a variable

# Declaring the two lists to hold the numbers
raw_lat1 = []
raw_lat2 = []

start_1 = 'LAT #1 MEAS='
end_1 = '[de'
start_2 = 'LAT #2 MEAS='
end_2 = '[de'

x = re.findall(r'start_1(.*?)end_1',lat_lines,re.DOTALL)
raw_lat1.append(x)
y = re.findall(r'start_2(.*?)end_2',lat_lines,re.DOTALL)
raw_lat2.append(y)

2 answers

User

Answer #1 · edited 2024-05-19 02:50:36

This should do it (it doesn't use regular expressions, but it still works):

answer = []
with open('file.txt') as infile:
    for line in infile:
        if "LAT #1 MEAS=" not in line: continue
        if "LAT #2 MEAS=" not in line: continue
        splits = line.split('=')
        temp = [0,0]
        for i,part in enumerate(splits):
            if part.endswith("LAT #1 MEAS"): temp[0] = int(splits[i+1].split(None,1)[0].split('[',1)[0])
            elif part.endswith("LAT #2 MEAS"): temp[1] = int(splits[i+1].split(None,1)[0].split('[',1)[0])
        answer.append(temp)

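User

Answer #2 · edited 2024-05-19 02:50:36

I can see several problems with the regular expression here. In the re.findall calls you use start_1 and end_1 as though they were variables, but the regex treats them as the literal characters "start_1", "end_1", and so on. To use variables inside a regex string you have to build it with a format string. For example: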

r'%s(.*?)%s' % (start_1, end_1)
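
As a minimal sketch of that substitution, the snippet below builds the pattern from the question's start_1 and end_1 values. Wrapping each delimiter in re.escape is an added assumption here, so that the "[" in '[de' is matched literally; the sample string is a shortened, made-up stand-in for one log line.

```python
import re

start_1 = 'LAT #1 MEAS='
end_1 = '[de'

# Hypothetical sample standing in for one line of the log.
sample = 'TraceFlow : LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] file.c'

# re.escape turns regex metacharacters in the delimiters into literals.
pattern = '%s(.*?)%s' % (re.escape(start_1), re.escape(end_1))
values = re.findall(pattern, sample)
print(values)  # ['-80']
```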

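Also, if you write .*end_1, the .* is greedy and matches any character, so it runs all the way to the last occurrence of end_1 on the line. LAT #1 and LAT #2 both end the same way, so even if everything else in the string were right, it would actually match "-80[deg], LAT #2 MEAS=-110[de". The non-greedy form .*? stops at the first occurrence of end_1 instead.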
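
The difference between the greedy and non-greedy forms can be checked with a short comparison. This is only an illustration; the sample string is a trimmed, hypothetical version of the log line.

```python
import re

# Hypothetical sample containing both measurements on one line.
sample = 'LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg]'

# Greedy: (.*) extends to the LAST "[de" on the line.
greedy = re.findall(r'LAT #1 MEAS=(.*)\[de', sample)

# Non-greedy: (.*?) stops at the FIRST "[de".
lazy = re.findall(r'LAT #1 MEAS=(.*?)\[de', sample)

print(greedy)  # ['-80[deg], LAT #2 MEAS=-110']
print(lazy)    # ['-80']
```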

In addition, square brackets must be escaped when they are meant literally in a regular expression; unescaped brackets define a character set.
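
A small sketch of what the brackets do with and without escaping. The sample string is hypothetical, and the last part shows that an unclosed bracket such as the question's '[de' is rejected when used unescaped as a pattern.

```python
import re

sample = 'MEAS=-80[deg]'

# Unescaped: [deg] is a character set matching ONE of 'd', 'e' or 'g'.
set_matches = re.findall(r'[deg]', sample)
print(set_matches)      # ['d', 'e', 'g']

# Escaped: matches the literal text "[deg]".
literal_matches = re.findall(r'\[deg\]', sample)
print(literal_matches)  # ['[deg]']

# An unterminated set is not a valid pattern at all.
try:
    re.compile('[de')
    compiled_ok = True
except re.error as exc:
    compiled_ok = False
    print('invalid pattern:', exc)
```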

Here is an example. I assume the variable line holds the sample string "12:34:56.789 78:90:12.123123123 BLAH_BLAH blahblah : LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] blah_BlHaBKBjFkjsa.c". You will probably need to adapt this snippet to process the whole file.

import re

prefix = r'LAT %s MEAS=(-?\d+)\[deg\]'  # includes a format placeholder for the variable part of the expression
p1 = r'#1'
p2 = r'#2'
x = re.findall(prefix % p1, line)  # ['-80'] for the sample line
y = re.findall(prefix % p2, line)  # ['-110'] for the sample line
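
One possible way to adapt this to the whole file is sketched below. It is not part of the original answer: the function name extract_pairs and the single combined pattern are illustrative choices, and the sketch assumes both values always appear together in the "LAT #1 ..., LAT #2 ..." form seen in the sample log.

```python
import re

# Captures both values in one pass; assumes the two fields are always
# separated by ", " as in the log excerpt from the question.
PAIR = re.compile(r'LAT #1 MEAS=(-?\d+)\[deg\], LAT #2 MEAS=(-?\d+)\[deg\]')

def extract_pairs(text):
    """Return a list of (lat1, lat2) integer tuples found in text."""
    return [(int(a), int(b)) for a, b in PAIR.findall(text)]

# Three records shortened from the question's log (source paths trimmed).
log_text = (
    '08:59:10.847 08:59:09.588001 UHAL_SRCH TraceFlow : '
    'LAT #1 MEAS=-80[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c 1596\n'
    '08:59:11.440 08:59:10.877307 UHAL_COMMON TraceWarning '
    'uhal_HmcpSearcherSCDLISR is called. uhal_CHmcpPschMultiPath.c 1512\n'
    '08:59:11.440 08:59:10.877338 UHAL_SRCH TraceFlow : '
    'LAT #1 MEAS=-78[deg], LAT #2 MEAS=-110[deg] uhal_CHmcpPschMultiPath.c 1596\n'
)

pairs = extract_pairs(log_text)
print(pairs)  # [(-80, -110), (-78, -110)]
```

For a real file, the text would come from something like f.read() inside a with open(...) block. Because the pattern is applied to the whole text at once, it works the same whether the records sit on separate lines or run together.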
