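I'm writing code that extracts data from a list of observation stations (a sample is given below). I have a list of regular expressions used to drop any line that does not contain the data I'm looking for. All of them correctly flag the metadata lines except the one that searches for the date. When I test it on regexr.com the expression works, but when I run the code the line is not removed. What am I missing to remove the line containing the date?
Sample data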
! CD = 2 letter state (province) abbreviation
! STATION = 16 character station long name
! ICAO = 4-character international id
! IATA = 3-character (FAA) id
! SYNOP = 5-digit international synoptic number
! LAT = Latitude (degrees minutes)
! LON = Longitude (degree minutes)
! ELEV = Station elevation (meters)
! M = METAR reporting station. Also Z=obsolete? site
! N = NEXRAD (WSR-88D) Radar site
! V = Aviation-specific flag (V=AIRMET/SIGMET end point, A=ARTCC T=TAF U=T+V)
! U = Upper air (rawinsonde=X) or Wind Profiler (W) site
! A = Auto (A=ASOS, W=AWOS, M=Meso, H=Human, G=Augmented) (H/G not yet impl.)
! C = Office type F=WFO/R=RFC/C=NCEP Center
! Digit that follows is a priority for plotting (0=highest)
! Country code (2-char) is last column
!
!2345678901234567890123456789012345678901234567890123456789012345678901234567890 1234567890
!
ALASKA 16-DEC-13
CD STATION ICAO IATA SYNOP LAT LONG ELEV M N V U A C
AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7 US
AK AKHIOK PAKH AKK 56 56N 154 11W 14 X 8 US
AK AMBLER PAFM AFM 67 06N 157 51W 88 X 7 US
AK ANAKTUVUK PASS PAKP AKP 68 08N 151 44W 642 X 7 US
AK ANCHORAGE INTL PANC ANC 70273 61 10N 150 01W 38 X T X A 5 US
AK ANCHORAGE/WFO PAFC AFC 61 10N 150 02W 48 F 8 US
AK ANCHORAG/NIKISKI PAHG AHG 60 44N 151 21W 74 X 8 US
AK ANCHORAGE/LAKE H PALH LHD 61 11N 149 58W 22 X A 7 US
AK ANCHORAGE/ARTCC PZAN ZAN 61 10N 149 59W 22 A 8 US
AK ANCHORAGE/MERRIL PAMR MRI 61 13N 149 51W 41 X A 7 US
AK ANGOON SEAPLANE PAGN 57 30N 134 35W 2 X 8 US
AK ANIAK PANI ANI 70232 61 35N 159 32W 26 X 7 US
AK ANNETTE ISLAND PANT ANN 70398 55 02N 131 34W 36 X X A 5 US
AK ANVIK PANV ANV 62 39N 160 11W 99 X 7 US
AK ARCTIC VILLAGE PARC ARC 68 07N 145 35W 636 X 7 US
AK ATQASUK BURNELL PATQ ATK 70 28N 157 26W 29 X 7 US
AK ATKA PAAK AKA 52 13N 174 12W 17 X 7 US
AK BARROW PABR BRW 70026 71 17N 156 48W 7 X T X A 5 US
AK BARROW ARM-NSA 70027 71 19N 156 37W 7 X 8 US
AK BARTER ISLAND PABA BTI 70086 70 08N 143 35W 2 X W 7 US
AK BETHEL PABE BET 70219 60 47N 161 51W 41 X T X A 5 US
AK BETHEL/88D PABC ABC 60 48N 161 53W 49 X 8 US
AK BETTLES PABT BTT 70174 66 55N 151 31W 195 X T A 6 US
AK BIG RIVER LAKES PALV LVR 60 49N 152 18W 12 X 7 US
AK BIRCHWOOD PABV BCV 61 25N 149 31W 29 X 7 US
AK BREVIG_MISSION PFKT 65 20N 166 28W 9 X 7 US
AK BUCKLAND PABL BVK 65 59N 161 09W 7 X 7 US
AK CANTWELL PATW TTW 63 23N 148 57W 668 X 7 US
AK CAPE LISBURNE PALU LUR 70104 68 53N 166 08W 3 X T W 6 US
AK CAPE NEWENHAM PAEH EHM 70305 58 39N 162 04W 161 X T 6 US
AK CAPE ROMANZOF PACZ CZF 70212 61 47N 166 02W 146 X T 6 US
AK CENTRAL PARL 65 34N 144 47W 284 X 7 US
AK CENTRAL PACE 65 34N 144 47W 286 X 7 US
AK CENTRAL AK PROF CEN 70197 65 30N 144 41W 259 W 8 US
AK CHANDALAR LAKE PALR WCR 67 30N 148 29W 585 X 7 US
AK CHEVAK PAVA 61 32N 165 36W 23 X 7 US
AK CHIGNIK BAY PAJC AJC 56 19N 158 22W 15 X 7 US
AK CIRCLE/PAFC RFC PACR CRC 65 50N 144 04W 182 X R 7 US
AK COLD BAY PACD CDB 70316 55 12N 162 43W 30 X T X A 5 US
AK CORDOVA PACV CDV 70296 60 30N 145 30W 12 X T A 6 US
AK DEADHORSE PASC SCC 70 12N 148 28W 15 X T A 6 US
AK DEERING PADE DEE 66 04N 162 46W 5 X A 7 US
AK DELTA JUNCTION PABI BIG 70267 64 00N 145 44W 386 X T A 6 US
My code
import re

station_file = open('../DATA/stations.txt', 'r')
data = station_file.read()
skip_res = ['^$', '^.*d{2}\-[A-Z]{3}\-\d{2}','^!'] #List of regular expressions which only appear in lines of metadata (not actual data)
data = data.split('\n')
for loop in data:
    breakcheck = False # In the event a regular expression matches, this will turn to true and skip that line
    for check in skip_res:
        current = re.compile(check)
        if current.search(loop) == None:
            continue
        else:
            breakcheck = True
            break
    if breakcheck:
        continue
    else:
        print(loop) # Should only print out lines containing actual data.
Your date regex is missing a backslash before the first "d". It should be:
'^.*\d{2}\-[A-Z]{3}\-\d{2}'
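To see why the original pattern never matched, here is a minimal check against the date header from the sample data. Without the backslash, d{2} means two literal letters "d", not two digits. The variable names are only for illustration.

```python
import re

date_line = "ALASKA 16-DEC-13"

# Pattern from the question: 'd{2}' looks for the literal text "dd".
broken = re.compile(r'^.*d{2}\-[A-Z]{3}\-\d{2}')
# With the backslash restored, '\d{2}' matches two digits.
fixed = re.compile(r'^.*\d{2}\-[A-Z]{3}\-\d{2}')

broken_match = broken.search(date_line)
fixed_match = fixed.search(date_line)

print(broken_match)          # no match object for the broken pattern
print(fixed_match.group(0))  # the matched text for the fixed pattern
```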
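The pattern that matches the date is missing a \ before the first d. Change it to:
r'\d{2}-[A-Z]{3}-\d{2}'
Because you are using re.search(), the pattern does not need to match from the start of the string, so the leading ^.* can be dropped. You also do not need to escape -.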
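A small sketch of those two simplifications, assuming the date always has the form 16-DEC-13: re.search() scans the whole string, so the date is found without a leading ^.*, and a - outside a character class is just a literal hyphen. The sample lines are taken from the data in the question.

```python
import re

simple = re.compile(r'\d{2}-[A-Z]{3}-\d{2}')

lines = [
    "ALASKA 16-DEC-13",
    "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7 US",
    "AK BARROW ARM-NSA 70027 71 19N 156 37W 7 X 8 US",
]

# search() looks for the pattern anywhere in the line, not only at the start.
is_date_line = [simple.search(line) is not None for line in lines]
print(is_date_line)
```

Only the first line is flagged. The hyphen in ARM-NSA is not preceded by two digits, so that station line is kept.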
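Note the use of a raw string (marked by the r prefix) to specify the pattern. In general you should use raw strings for regex patterns, because some string escape sequences are also regex patterns, for example \b. In an ordinary string it means the backspace character. In a raw string it is a \ followed by b, which is the regex pattern for "start or end of a word".
Another thing worth mentioning is that you can check for several patterns at once by joining them with |. Think of it as "or". That lets your code be written more concisely. With only a handful of regex patterns there is no benefit in compiling them yourself, because the re module caches them.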
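The raw-string and alternation points might be combined as follows. This is a sketch, not the answer's original code: the helper name data_lines is made up for illustration, and it takes the text as an argument instead of reading ../DATA/stations.txt.

```python
import re

# In a normal string '\b' is one character (backspace); in a raw string it is
# a backslash followed by 'b', which the regex engine reads as a word boundary.
backspace_len = len('\b')
raw_len = len(r'\b')

# Empty line, OR a date such as 16-DEC-13, OR a comment line starting with '!'.
SKIP = r'^$|\d{2}-[A-Z]{3}-\d{2}|^!'

def data_lines(text):
    """Return the lines of text that match none of the metadata patterns."""
    return [line for line in text.split('\n') if not re.search(SKIP, line)]

sample = "\n".join([
    "! CD = 2 letter state (province) abbreviation",
    "!",
    "",
    "ALASKA 16-DEC-13",
    "CD STATION ICAO IATA SYNOP LAT LONG ELEV M N V U A C",
    "AK ADAK NAS PADK ADK 70454 51 53N 176 39W 4 X T 7 US",
])

for line in data_lines(sample):
    print(line)
```

As in the question's code, the column header line (CD STATION ...) is not treated as metadata and is still printed.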