I'm trying to handle unmatched double quotes within a string in CSV format.
To be precise,
"It "does "not "make "sense", Well, "Does "it"
should be corrected to
"It" "does" "not" "make" "sense", Well, "Does" "it"
So basically what I'm trying to do is
replace all the ' " '
- Not preceded by a beginning of line or a comma (and)
- Not followed by a comma or an end of line
with ' " " '
For this I'm using the regex below
(?<!^|,)"(?!,|$)
The problem is that while the Ruby regex engine (http://www.rubular.com/) is able to parse the regex, the Python regex engines (https://pythex.org/, http://www.pyregex.com/) throw the following error
Invalid regular expression: look-behind requires fixed-width pattern
and with Python 2.7.3 it throws
sre_constants.error: look-behind requires fixed-width pattern
Can anyone tell me what bothers Python here?
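A minimal sketch of what seems to trip Python up, assuming the standard `re` module: inside the lookbehind, `^` is zero characters wide while `,` is one character wide, so the alternation has no single fixed width. Splitting it into two separate lookbehinds (each fixed-width) lets the pattern compile. The helper name `compiles` and the sample string are made up for illustration, and this only shows that the pattern compiles and where it matches, not that it produces the desired CSV output.

```python
import re

# The pattern from the question: alternatives of width 0 (^) and width 1 (,)
# inside one lookbehind.
original = r'(?<!^|,)"(?!,|$)'

# Same intent, written as two lookbehinds that are each fixed-width.
split = r'(?<!^)(?<!,)"(?!,|$)'


def compiles(pattern):
    """Return True if the re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


original_ok = compiles(original)
split_ok = compiles(split)

# Illustrative sample: only the quote between 'a' and 'b' is neither at the
# start, after a comma, before a comma, nor at the end.
sample = '"a"b",c"'
positions = [m.start() for m in re.finditer(split, sample)]

print(original_ok, split_ok, positions)
```

Chaining two negative lookbehinds means both conditions must hold at the same position, which matches the "not preceded by a beginning of line or a comma" rule stated above.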
Following Tim's response, I got the output below for a multi-line string
>>> str = """ "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it"
... "It "does "not "make "sense", Well, "Does "it" """
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str)
' "It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" "\n"It" "does" "not" "make" "sense", Well, "Does" "it" " '
At the end of each line, next to "it", two double quotes were added.
So I made a very small change to the regex to handle a new line.
re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
But this gives the output
>>> re.sub(r'\b\s*"(?!,|$)', '" "', str,flags=re.MULTILINE)
' "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it"\n... "It" "does" "not" "make" "sense", Well, "Does" "it" " '
Now only the last "it" has two double quotes.
But I wonder why the '$' end-of-line character does not recognize that the line has ended.
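A small sketch of how `$` behaves in Python's `re` module, which may explain the outputs above. Without `re.MULTILINE`, `$` matches only at the end of the whole string (or just before a single trailing newline). With `re.MULTILINE` it also matches before each newline, but a quote followed by a space is still not at a line end, and the test string above ends with `it" ` because of the space before the closing `"""`. The sample string below is made up for illustration.

```python
import re

# Three lines: the second one has a trailing space after the quote.
text = 'it"\nit" \nit"'

# Default mode: only the quote at the very end of the string is followed by $.
default_hits = [m.start() for m in re.finditer(r'"$', text)]

# MULTILINE: quotes directly before a newline also count, but the quote
# followed by a space (index 6) still does not.
multiline_hits = [m.start() for m in re.finditer(r'"$', text, flags=re.MULTILINE)]

# Allowing optional spaces or tabs before the line end catches all three.
tolerant_hits = [
    m.start() for m in re.finditer(r'"(?=[ \t]*$)', text, flags=re.MULTILINE)
]

print(default_hits)    # [11]
print(multiline_hits)  # [2, 11]
print(tolerant_hits)   # [2, 6, 11]
```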
The final answer is
re.sub(r'\b\s*"(?!,|[ \t]*$)', '" "', str,flags=re.MULTILINE)
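As a rough check of the final pattern, here is a short sketch run against a two-line sample whose first line has a trailing space. The variable names `text`, `pattern` and `result` are illustrative (using `text` rather than `str` also avoids shadowing the built-in type), and the sample is an assumption modelled on the string in the question.

```python
import re

# Two lines modelled on the question's input; the first ends with a space.
text = (
    '"It "does "not "make "sense", Well, "Does "it" \n'
    '"It "does "not "make "sense", Well, "Does "it"'
)

# \b\s*"       : a quote that follows a word (plus any whitespace before it)
# (?!,|[ \t]*$): ...unless it is followed by a comma, or by optional
#                spaces/tabs and then the end of the line
pattern = r'\b\s*"(?!,|[ \t]*$)'

result = re.sub(pattern, '" "', text, flags=re.MULTILINE)

print(result)
```

Each opening quote that directly follows a word gets a closing quote inserted before it, while the quotes before a comma and at the end of each line are left alone.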
Python lookbehind assertions must be fixed-width, but you can try the following:
Explanation: