Date regular expression does not match in Python's re module

Posted 2024-05-17 20:01:02

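I'm currently writing code to extract data from a list of observation stations (a sample is given below). I have a list of regular expressions that should remove any line that does not contain the data I'm looking for. All of them correctly flag the metadata lines except the one that searches for the date. When I test the expression on regexr.com it works fine, but when I run the code the line is not removed. What am I missing to remove the line containing the date?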

Sample data

!   CD = 2 letter state (province) abbreviation
!   STATION = 16 character station long name
!   ICAO = 4-character international id
!   IATA = 3-character (FAA) id
!   SYNOP = 5-digit international synoptic number
!   LAT = Latitude (degrees minutes)
!   LON = Longitude (degree minutes)
!   ELEV = Station elevation (meters)
!   M = METAR reporting station.   Also Z=obsolete? site
!   N = NEXRAD (WSR-88D) Radar site
!   V = Aviation-specific flag (V=AIRMET/SIGMET end point, A=ARTCC T=TAF U=T+V)
!   U = Upper air (rawinsonde=X) or Wind Profiler (W) site
!   A = Auto (A=ASOS, W=AWOS, M=Meso, H=Human, G=Augmented) (H/G not yet impl.)
!   C = Office type F=WFO/R=RFC/C=NCEP Center
!   Digit that follows is a priority for plotting (0=highest)
!   Country code (2-char) is last column
!
!2345678901234567890123456789012345678901234567890123456789012345678901234567890                                                                                                  1234567890
!

ALASKA             16-DEC-13
CD  STATION         ICAO  IATA  SYNOP   LAT     LONG   ELEV   M  N  V  U  A  C
AK ADAK NAS         PADK  ADK   70454  51 53N  176 39W    4   X     T          7                                                                                                   US
AK AKHIOK           PAKH  AKK          56 56N  154 11W   14   X                8                                                                                                   US
AK AMBLER           PAFM  AFM          67 06N  157 51W   88   X                7                                                                                                   US
AK ANAKTUVUK PASS   PAKP  AKP          68 08N  151 44W  642   X                7                                                                                                   US
AK ANCHORAGE INTL   PANC  ANC   70273  61 10N  150 01W   38   X     T  X  A    5                                                                                                   US
AK ANCHORAGE/WFO    PAFC  AFC          61 10N  150 02W   48                  F 8                                                                                                   US
AK ANCHORAG/NIKISKI PAHG  AHG          60 44N  151 21W   74      X             8                                                                                                   US
AK ANCHORAGE/LAKE H PALH  LHD          61 11N  149 58W   22   X           A    7                                                                                                   US
AK ANCHORAGE/ARTCC  PZAN  ZAN          61 10N  149 59W   22         A          8                                                                                                   US
AK ANCHORAGE/MERRIL PAMR  MRI          61 13N  149 51W   41   X           A    7                                                                                                   US
AK ANGOON SEAPLANE  PAGN               57 30N  134 35W    2   X                8                                                                                                   US
AK ANIAK            PANI  ANI   70232  61 35N  159 32W   26   X                7                                                                                                   US
AK ANNETTE ISLAND   PANT  ANN   70398  55 02N  131 34W   36   X        X  A    5                                                                                                   US
AK ANVIK            PANV  ANV          62 39N  160 11W   99   X                7                                                                                                   US
AK ARCTIC VILLAGE   PARC  ARC          68 07N  145 35W  636   X                7                                                                                                   US
AK ATQASUK BURNELL  PATQ  ATK          70 28N  157 26W   29   X                7                                                                                                   US
AK ATKA             PAAK  AKA          52 13N  174 12W   17   X                7                                                                                                   US
AK BARROW           PABR  BRW   70026  71 17N  156 48W    7   X     T  X  A    5                                                                                                   US
AK BARROW ARM-NSA               70027  71 19N  156 37W    7            X       8                                                                                                   US
AK BARTER ISLAND    PABA  BTI   70086  70 08N  143 35W    2   X           W    7                                                                                                   US
AK BETHEL           PABE  BET   70219  60 47N  161 51W   41   X     T  X  A    5                                                                                                   US
AK BETHEL/88D       PABC  ABC          60 48N  161 53W   49      X             8                                                                                                   US
AK BETTLES          PABT  BTT   70174  66 55N  151 31W  195   X     T     A    6                                                                                                   US
AK BIG RIVER LAKES  PALV  LVR          60 49N  152 18W   12   X                7                                                                                                   US
AK BIRCHWOOD        PABV  BCV          61 25N  149 31W   29   X                7                                                                                                   US
AK BREVIG_MISSION   PFKT               65 20N  166 28W    9   X                7                                                                                                   US
AK BUCKLAND         PABL  BVK          65 59N  161 09W    7   X                7                                                                                                   US
AK CANTWELL         PATW  TTW          63 23N  148 57W  668   X                7                                                                                                   US
AK CAPE LISBURNE    PALU  LUR   70104  68 53N  166 08W    3   X     T     W    6                                                                                                   US
AK CAPE NEWENHAM    PAEH  EHM   70305  58 39N  162 04W  161   X     T          6                                                                                                   US
AK CAPE ROMANZOF    PACZ  CZF   70212  61 47N  166 02W  146   X     T          6                                                                                                   US
AK CENTRAL          PARL               65 34N  144 47W  284   X                7                                                                                                   US
AK CENTRAL          PACE               65 34N  144 47W  286   X                7                                                                                                   US
AK CENTRAL AK PROF        CEN   70197  65 30N  144 41W  259            W       8                                                                                                   US
AK CHANDALAR LAKE   PALR  WCR          67 30N  148 29W  585   X                7                                                                                                   US
AK CHEVAK           PAVA               61 32N  165 36W   23   X                7                                                                                                   US
AK CHIGNIK BAY      PAJC  AJC          56 19N  158 22W   15   X                7                                                                                                   US
AK CIRCLE/PAFC RFC  PACR  CRC          65 50N  144 04W  182   X              R 7                                                                                                   US
AK COLD BAY         PACD  CDB   70316  55 12N  162 43W   30   X     T  X  A    5                                                                                                   US
AK CORDOVA          PACV  CDV   70296  60 30N  145 30W   12   X     T     A    6                                                                                                   US
AK DEADHORSE        PASC  SCC          70 12N  148 28W   15   X     T     A    6                                                                                                   US
AK DEERING          PADE  DEE          66 04N  162 46W    5   X           A    7                                                                                                   US
AK DELTA JUNCTION   PABI  BIG   70267  64 00N  145 44W  386   X     T     A    6                                                                                                   US

My code

import re

station_file = open('../DATA/stations.txt', 'r')
data = station_file.read()

skip_res = ['^$', '^.*d{2}\-[A-Z]{3}\-\d{2}','^!'] #List of regular expressions which only appear in lines of metadata (not actual data)

data = data.split('\n')

for loop in data:
    breakcheck = False # In the event a regular expression matches, this will turn to true and skip that line
    for check in skip_res:
        current = re.compile(check)
        if current.search(loop) == None:
            continue
        else:
            breakcheck = True
            break
    if breakcheck:
        continue
    else:
        print(loop) # Should only print out lines containing actual data.

2 Answers

Your date regular expression is missing a backslash before the first "d".

'^.*d{2}\-[A-Z]{3}\-\d{2}'

should be

'^.*\d{2}\-[A-Z]{3}\-\d{2}'
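As a quick check of this diagnosis, the sketch below runs both versions of the pattern against the date line from the sample data. Without the backslash, d{2} asks for the literal text "dd", which that line does not contain. The variable names are made up for this illustration.

```python
import re

date_line = "ALASKA             16-DEC-13"

broken = re.compile(r'^.*d{2}\-[A-Z]{3}\-\d{2}')  # d{2} means two literal "d" characters
fixed = re.compile(r'^.*\d{2}\-[A-Z]{3}\-\d{2}')  # \d{2} means two digits

broken_match = broken.search(date_line)
fixed_match = fixed.search(date_line)

print(broken_match)           # None: there is no "dd" in the line
print(fixed_match.group(0))   # the whole line, because of the leading ^.*

# The broken pattern only matches text that really contains "dd".
literal_dd_match = broken.search("add-ABC-12")
print(literal_dd_match is not None)
```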

The pattern that matches the date is missing a \ before the first d. Change it to:

r'\d{2}-[A-Z]{3}-\d{2}'

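Because you are using re.search(), the pattern does not have to match from the start of the string, so the leading ^.* can be dropped. You also do not need to escape the -.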
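A small sketch of both points, using the date line from the sample data: search() scans the whole string, while match() only tries position 0, and an escaped hyphen behaves the same as a plain one outside a character class. The variable names are illustrative only.

```python
import re

date_pattern = re.compile(r'\d{2}-[A-Z]{3}-\d{2}')
date_line = "ALASKA             16-DEC-13"

found_by_search = date_pattern.search(date_line)  # looks anywhere in the string
found_by_match = date_pattern.match(date_line)    # only tries the very start

print(found_by_search.group(0))  # 16-DEC-13
print(found_by_match)            # None

# Escaping the hyphen is allowed but makes no difference here.
escaped = re.search(r'\d{2}\-[A-Z]{3}\-\d{2}', date_line)
print(escaped.group(0) == found_by_search.group(0))
```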

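Note the use of a raw string (indicated by the r prefix) to specify the pattern. In general you should use raw strings for regex patterns, because some string escape sequences are also regex patterns, for example \b. In a normal string it means the backspace character. In a raw string it is treated as \ followed by b, which is the regex pattern for "start or end of a word".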
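The \b case can be seen directly in a short sketch. The word "cat" and the sample text are arbitrary choices for this illustration.

```python
import re

plain = '\bcat\b'   # Python turns each \b into a backspace character (0x08)
raw = r'\bcat\b'    # backslash and b reach the regex engine as a word boundary

text = "the cat sat"

plain_match = re.search(plain, text)
raw_match = re.search(raw, text)

print(len(plain), len(raw))  # 5 7
print(plain_match)           # None: the text contains no backspace characters
print(raw_match.group(0))    # cat
```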

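Another thing worth mentioning is that you can check for several patterns at once by joining them with |. Think of it as "or". That lets you write the code more concisely: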

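import re

skip_res = [r'^$', r'\d{2}-[A-Z]{3}-\d{2}', r'^!']
skip_pattern = '|'.join(skip_res)

with open('../DATA/stations.txt', 'r') as station_file:
    for line in station_file:
        # Skip blank lines, the date header line, and '!' comment lines.
        if re.search(skip_pattern, line):
            continue
        print(line, end='')  # each line already ends with a newline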

When you have only a few regex patterns, there is no real benefit to compiling them yourself, because the re module caches compiled patterns.
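To try the combined pattern without the data file, here is a minimal sketch that runs it over a few lines copied from the sample data (trailing columns trimmed). The names sample_lines and kept are made up for this illustration.

```python
import re

skip_res = [r'^$', r'\d{2}-[A-Z]{3}-\d{2}', r'^!']
skip_pattern = '|'.join(skip_res)  # blank line OR date OR leading '!'

sample_lines = [
    "!   CD = 2 letter state (province) abbreviation",
    "",
    "ALASKA             16-DEC-13",
    "AK ADAK NAS         PADK  ADK   70454  51 53N  176 39W    4   X     T          7",
    "AK AKHIOK           PAKH  AKK          56 56N  154 11W   14   X                8",
]

# Keep only the lines that none of the skip patterns match.
kept = [line for line in sample_lines if not re.search(skip_pattern, line)]

for line in kept:
    print(line)
```

Only the two station rows are kept. Note that the column header line (CD  STATION  ICAO ...) is not matched by any of these three patterns, so it would still be printed when reading the real file.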
