A better regex implementation than looping over the whole file?

# BJD K2SC-Flux EAPFlux Err Flag Spline
2457217.463564 5848.004 5846.670 6.764 0 0.998291
2457217.483996 6195.018 6193.685 6.781 1 0.998291
2457217.504428 6396.612 6395.278 6.790 0 0.998292
2457217.524861 6220.890 6219.556 6.782 0 0.998292
2457217.545293 5891.856 5890.523 6.766 1 0.998292
2457217.565725 5581.000 5579.667 6.749 1 0.998292
2457217.586158 5230.566 5229.232 6.733 1 0.998292
2457217.606590 4901.128 4899.795 6.718 0 0.998293
2457217.627023 4604.127 4602.793 6.700 0 0.998293

foundlines=[]
c=0
import re
with open('examplefile') as f:
    for index, line in enumerate(f):
        try:
            found = re.findall(r' 1 ', line)[0]
            foundlines.append(index)
            print(line)
            c+=1
        except:
            pass
print(c)

3 Answers

User
Answer 1 · edited 2024-06-16 14:28:50

If you have CSV data, you can use the csv module:
import csv

with open('your file', 'r', newline='', encoding='utf8') as fp:
    rows = csv.reader(fp, delimiter=' ')

    # generator comprehension
    errors = (row for row in rows if row[4] == '1')

    for error in errors:
        print(error)
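As a hedged variation on the csv approach, the sketch below also skips the `#` header line and blank lines, and guards against rows that are too short to have a flag column. The function name `flagged_rows`, the `flag_index` parameter, and the in-memory `SAMPLE` (a few rows taken from the question plus one blank line) are illustrative assumptions, not part of the original answer.

```python
import csv
import io

# A few rows from the question, plus one blank line to exercise the guard.
SAMPLE = """\
# BJD K2SC-Flux EAPFlux Err Flag Spline
2457217.463564 5848.004 5846.670 6.764 0 0.998291
2457217.483996 6195.018 6193.685 6.781 1 0.998291
2457217.504428 6396.612 6395.278 6.790 0 0.998292

2457217.545293 5891.856 5890.523 6.766 1 0.998292
"""


def flagged_rows(fp, flag_index=4):
    """Yield (row_index, row) for data rows whose flag column equals '1'."""
    for index, row in enumerate(csv.reader(fp, delimiter=' ')):
        # csv.reader yields an empty list for a blank line; the header
        # line starts with '#'. Neither is a data row.
        if not row or row[0].startswith('#'):
            continue
        # Short or malformed rows are skipped instead of raising IndexError.
        if len(row) > flag_index and row[flag_index] == '1':
            yield index, row


rows = list(flagged_rows(io.StringIO(SAMPLE)))
for index, row in rows:
    print(index, row)
print(len(rows))
```

With a real file, `io.StringIO(SAMPLE)` would be replaced by the file object opened as in the answer above.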

User
Answer 2 · edited 2024-06-16 14:28:50

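The shell implementation can be made even shorter: grep has a -c option that returns the count, so there is no need for an anonymous pipe and wc:
grep -c " 1 " examplefile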
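The shell code only gets the number of lines in which the pattern " 1 " is found, whereas the Python code additionally keeps a list of the indexes of the matching lines.
To get just the line count, you can use sum with a generator expression (or list comprehension), and no regex is needed either; a plain substring test with the in operator (str.__contains__) is enough:
with open('examplefile') as f:
    count = sum(1 for line in f if ' 1 ' in line)
    print(count)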
If you want to keep the indexes as well, you can stick with your approach and simply replace the re test with a str test:
count = 0
indexes = []
with open('examplefile') as f:
    for idx, line in enumerate(f):
        if ' 1 ' in line:
            count += 1
            indexes.append(idx)
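Also, a bare except is almost always a bad idea (at the very least use except Exception, so that exceptions such as SystemExit and KeyboardInterrupt are not swallowed); catch only the exceptions you know might be raised.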
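To illustrate the point about narrow exception handling: in the question's loop, the only exception the `try` body is expected to raise is the `IndexError` from indexing an empty `re.findall` result, so that is the only one worth catching. The sketch below shows this, plus a variant using `re.search` that needs no `try` at all. The function names and the sample `lines` are made up for illustration.

```python
import re

PATTERN = re.compile(r' 1 ')


def matching_indexes(lines):
    """Indexes of lines containing ' 1 ', catching only IndexError."""
    indexes = []
    for index, line in enumerate(lines):
        try:
            # findall returns [] when nothing matches, so [0] raises IndexError.
            PATTERN.findall(line)[0]
        except IndexError:
            continue
        indexes.append(index)
    return indexes


def matching_indexes_no_try(lines):
    """Same result without exceptions: search returns None on no match."""
    return [index for index, line in enumerate(lines) if PATTERN.search(line)]


# Illustrative input: three rows from the question.
lines = [
    "2457217.463564 5848.004 5846.670 6.764 0 0.998291",
    "2457217.483996 6195.018 6193.685 6.781 1 0.998291",
    "2457217.545293 5891.856 5890.523 6.766 1 0.998292",
]
print(matching_indexes(lines))
print(matching_indexes_no_try(lines))
```

A typo inside the `try` body (a `NameError`, for example) would now surface immediately instead of being silently ignored.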
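Also, when parsing structured data you should use a dedicated tool, e.g. csv.reader here with a space as the delimiter (line.split(' ') should also do in this case); checking index 4 would be the safest approach (see Tomalak's answer). With the ' 1 ' in line test, you would get misleading results if any other column contains 1.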
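The false-positive risk can be seen with a small comparison. In the sketch below, the first two rows come from the question; the third row is invented for illustration (its Err column is exactly 1 while its Flag is 0).

```python
lines = [
    "2457217.483996 6195.018 6193.685 6.781 1 0.998291",  # Flag == 1
    "2457217.463564 5848.004 5846.670 6.764 0 0.998291",  # Flag == 0
    "2457217.999999 6000.000 5999.000 1 0 0.998291",      # invented: Err == 1, Flag == 0
]

# Substring test: matches " 1 " anywhere in the line.
substring_hits = sum(1 for line in lines if ' 1 ' in line)

# Field test: looks only at the fifth column (index 4).
field_hits = sum(1 for line in lines if line.split(' ')[4] == '1')

print(substring_hits)  # 2 (the invented row is counted by mistake)
print(field_hits)      # 1
```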
With the above in mind, here is the shell way of matching the fifth field with awk:
awk '$5 == "1" {count+=1}; END{print count}' examplefile
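For comparison, a rough Python counterpart of the awk one-liner might look like the sketch below. It uses `str.split()` with no argument, which, like awk's default field splitting, treats any run of whitespace as one separator. The function name `count_field`, its parameters, and the sample `lines` are assumptions made for illustration.

```python
def count_field(lines, field=5, value="1"):
    """Count lines whose 1-based `field` equals `value`."""
    count = 0
    for line in lines:
        fields = line.split()  # splits on runs of whitespace
        # Lines with fewer fields than requested are ignored.
        if len(fields) >= field and fields[field - 1] == value:
            count += 1
    return count


# Illustrative input: the header, two rows from the question (one with
# doubled spaces added on purpose), and a blank line.
lines = [
    "# BJD K2SC-Flux EAPFlux Err Flag Spline",
    "2457217.483996  6195.018  6193.685  6.781  1  0.998291",
    "2457217.504428 6396.612 6395.278 6.790 0 0.998292",
    "",
]
print(count_field(lines))
```

One small difference: this function returns 0 when nothing matches, whereas the awk command as written prints an empty line in that case because `count` is never assigned.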

User
Answer 3 · edited 2024-06-16 14:28:50

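Shortest code

Under a few specific preconditions, this is a very short version:

You only need to count the occurrences, as the grep call does
It is guaranteed that there is only one " 1 " per line
" 1 " can only occur in the desired column
Your file fits easily into memory

Note that if these preconditions are not met, this may cause memory problems or return false positives.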

print(open("examplefile").read().count(" 1 "))

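Simple and versatile, slightly longer

Of course, if you actually want to do something with these lines later on, I would suggest pandas: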

import pandas

df = pandas.read_table('test.txt', delimiter=" ",
                       comment="#",
                       names=['BJD', 'K2SC-Flux', 'EAPFlux', 'Err', 'Flag', 'Spline'])

To get all rows whose flag is 1:

flaggedrows = df[df.Flag == 1]

Returns:

            BJD  K2SC-Flux   EAPFlux    Err  Flag    Spline
1  2.457217e+06   6195.018  6193.685  6.781     1  0.998291
4  2.457218e+06   5891.856  5890.523  6.766     1  0.998292
5  2.457218e+06   5581.000  5579.667  6.749     1  0.998292
6  2.457218e+06   5230.566  5229.232  6.733     1  0.998292

To count them:

print(len(flaggedrows))

Returns 4
