Regex is slow in Python and JavaScript but fails fast in Go and PHP

ERROR: null value in column "active" violates not-null constraint DETAIL: Failing row contains (2018-08-16 14:23:52.214591+00, 2018-08-16 14:23:52.214591+00, null, 6f6d1bc9-c47e-46f8-b220-dae49bd58090, bf24d26e-4871-4335-9f18-83c5a52f1b3a, Some Product-a1c03dde-2de9-401c-92d5-5c1500908984, {"de_DE": "Fugit tempore voluptas quos est vitae.", "en_GB": "Qu..., {"de_DE": "Fuga reprehenderit nobis reprehenderit natus magni es..., {"de_DE": "Fuga provident dolorum. Corrupti sunt in tempore quae..., my-product-53077578, SKU-53075778, 600, 4300dc25-04e2-4193-94c0-8ee97b636739, 52553d24-6d1c-4ce6-89f9-4ad765599040, null, 38089c3c-423f-430c-b211-ab7a57dbcc13, 7d7dc30e-b06b-48b7-b674-26d4f705583b, null, {}, 0, null, 9980, 100, 1, 5).

^DETAIL:.[^\(]+?\((.[^\)]+?).[^\(]+?.(.[^\)]+?). already exists

Just changing the . right after the first capture group to \) makes it stop timing out:

^DETAIL:.[^\(]+?\((.[^\)]+?)\)[^\(]+?.(.[^\)]+?). already exists

3 Answers

User

#1 · Edited 2024-04-19 04:59:11

Package regexp
import "regexp"
Package regexp implements regular expression search.
The syntax of the regular expressions accepted is the same general syntax used by Perl, Python, and other languages. More precisely, it is the syntax accepted by RE2 and described at https://golang.org/s/re2syntax, except for \C. For an overview of the syntax, run
go doc regexp/syntax
The regexp implementation provided by this package is guaranteed to run in time linear in the size of the input. (This is a property not guaranteed by most open source implementations of regular expressions.) For more information about this property, see
http://swtch.com/~rsc/regexp/regexp1.html
or any book about automata theory.

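By design, Go regular expressions are guaranteed to run in time linear in the size of the input, a property that some other regular expression implementations do not guarantee. See Regular Expression Matching Can Be Simple And Fast.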

User

#2 · Edited 2024-04-19 04:59:11

TL;DR: use:

^DETAIL:\s*+Key[^\(]++\((.+)\)[^\(]+\(([^\)]+)\) already exists

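See the matching example and {a2}

Explanation:

First, the original regexp does not seem to match the whole key group: you stop at lower(internal_name::text, which omits some columns of the composite key and leaves the parentheses unbalanced. If you modify it like this, it should capture the composite key. Let me know if it is not supposed to: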

^DETAIL:.[^\(]+.(.+)\)[^\(]+.(.[^\)]+). already exists

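With just this change the regex is "runnable", but still quite slow.

One of the main reasons is the [^\(]+. It first matches DETAIL: Failing row contains (with a trailing space), then tries to match the rest of the regex. That fails, so it backtracks by one character, to DETAIL: Failing row contains, and tries the rest of the regexp again. That fails too, so it goes back to DETAIL: Failing row contain, and so on.

One way to avoid this is to use a possessive quantifier. It means that once the quantifier has consumed something, it never gives it back. So using [^\(]++ instead of [^\(]+ (i.e. ^DETAIL:.[^\(]++.(.+)\)[^\(]+.(.[^\)]+). already exists) reduces the number of steps from 28590 to 1290.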
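To make the "never gives it back" behaviour concrete, here is a minimal sketch in Python. It is an illustration only: the toy patterns a+a and a++a and the helper try_match are made up for this example, and Python's re module only understands possessive quantifiers from version 3.11 on, so the helper returns None on older interpreters.

```python
import re


def try_match(pattern, text):
    """Return True/False for a match at the start of text, or None if this
    Python's `re` cannot compile the pattern (possessive syntax needs 3.11+)."""
    try:
        return re.match(pattern, text) is not None
    except re.error:
        return None


# Greedy: `a+` first takes "aaa", then gives one "a" back so the final `a` can match.
greedy = try_match(r"a+a", "aaa")

# Possessive: `a++` takes "aaa" and keeps it, so the final `a` has nothing left to match.
possessive = try_match(r"a++a", "aaa")

print("greedy:", greedy)
print("possessive:", possessive)
```

The same thing happens with [^\(]++ on the failing row: the engine stops retrying shorter and shorter prefixes because the possessive quantifier refuses to release characters.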

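But you can still improve it. If you know the data you want uses the keyword Key, use it! Since it is absent from the failing example, the regex fails very quickly (as soon as it has read DETAIL: and the next word).

So with ^DETAIL:\s*+Key[^\(]++.(.+)\)[^\(]+.(.[^\)]+). already exists the step count is now only 12.

If you feel using Key is too specific, you can use something less specific and look for "not 'Fail'". Like this: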

^DETAIL:\s*+(?!Fail)[^\(]++.(.+)\)[^\(]+.(.[^\)]+). already exists

That takes 17 steps.

Finally, you can tune the regex for the content that does match.

Change this:

^DETAIL:\s*+Key[^\(]++.(.+)\)[^\(]+
.           # <============= here, use \( instead
(.[^\)]+). already exists

To this:

^DETAIL:\s*+Key[^\(]++.(.+)\)[^\(]+\((.[^\)]+). already exists

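This reduces the steps from 538 to 215, because there is less backtracking.

Then, after removing a few useless dots and replacing some others with \( or \) (personal preference), you get the final regex: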

^DETAIL:\s*+Key[^\(]++\((.+)\)[^\(]+\(([^\)]+)\) already exists
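As a rough usage sketch, this is how the final regex could be applied in Python. The two DETAIL lines, the function name parse_duplicate_key and the greedy FALLBACK pattern are assumptions added for illustration; the fallback exists only because Python's re rejects possessive quantifiers before 3.11.

```python
import re

# The final regex from the answer (possessive quantifiers, Python 3.11+).
FINAL = r"^DETAIL:\s*+Key[^\(]++\((.+)\)[^\(]+\(([^\)]+)\) already exists"
# Assumed fallback: same pattern with plain greedy quantifiers for older Pythons.
FALLBACK = r"^DETAIL:\s*Key[^\(]+\((.+)\)[^\(]+\(([^\)]+)\) already exists"

try:
    KEY_RE = re.compile(FINAL)
except re.error:
    KEY_RE = re.compile(FALLBACK)


def parse_duplicate_key(detail):
    """Return (columns, values) from a 'Key (...)=(...) already exists' line, else None."""
    m = KEY_RE.search(detail)
    return (m.group(1), m.group(2)) if m else None


# Hypothetical DETAIL lines, modelled on the messages discussed in this thread.
duplicate = "DETAIL:  Key (id, lower(internal_name::text))=(42, my-product) already exists."
not_null = "DETAIL:  Failing row contains (2018-08-16 14:23:52.214591+00, null, 600)."

print(parse_duplicate_key(duplicate))
print(parse_duplicate_key(not_null))
```

On the not-null line the match is abandoned as soon as Key fails to match Fail, which is the fail-fast behaviour the answer is after.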

User

#3 · Edited 2024-04-19 04:59:11

That is a monster of a regex :)

Why not split it into two regexes?

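Check whether already exists matches (very fast)
Extract the data to display with your existing regex ^DET.[^\(]+.(.[^\)]+).[^\(]+.(.[^\)]+)

This should speed up your code considerably. (You can even shorten DETAIL as I did.)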
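The two steps could look roughly like this in Python. This is only a sketch: the function name extract_if_duplicate and the sample lines are hypothetical, and a plain substring test stands in for the first "regex", which is an assumption about what "check whether already exists matches" can be reduced to.

```python
import re

# Extraction regex as quoted in the answer, unchanged.
EXTRACT_RE = re.compile(r"^DET.[^\(]+.(.[^\)]+).[^\(]+.(.[^\)]+)")


def extract_if_duplicate(detail):
    """Run the expensive extraction only on lines that mention 'already exists'."""
    # Step 1: cheap substring check, no regex engine involved.
    if "already exists" not in detail:
        return None
    # Step 2: extract the key columns and values.
    m = EXTRACT_RE.search(detail)
    return m.groups() if m else None


# Hypothetical DETAIL lines for illustration.
print(extract_if_duplicate("DETAIL:  Key (email)=(a@b.c) already exists."))
print(extract_if_duplicate("DETAIL:  Failing row contains (2018-08-16, null, 600)."))
```

The not-null message never reaches the extraction regex, so its backtracking cost is not paid on lines that cannot match.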
