Regular expression with quotes in Python

2 answers

User

#1 · edited 2024-04-24 12:35:21

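You can do it like this:

1. Read the whole file.
2. Split the input on newlines that are not preceded by a comma.
3. Iterate over the resulting elements and split each one again on commas (plus an optional following newline) that have a double quote both before and after them.

Code: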

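import re

file = 'f'  # input file name (f in the check below)
with open(file) as f:
    fil = f.read()

# Split on newlines that are not preceded by a comma
m = re.split(r'(?<!,)\n', fil.strip())
for i in m:
    # Split on commas (plus an optional newline) between double quotes
    print(re.split(r'(?<="),\n?(?=")', i))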

Output:

['"column1a"', '"column2a"', '"column3a,"', '"column\\"this is, a test\\"4a"']
['"column1b"', '"column2b,"', '"column3b"', '"column\\"this is, a test\\"4b"']
['"column1c,"', '"column2c"', '"column3c"', '"column\\"this is, a test\\"4c"']

Here is a check:

$ cat f
"column1a","column2a","column3a,",
"column\"this is, a test\"4a"
"column1b","column2b,","column3b",
"column\"this is, a test\"4b"
"column1c,","column2c","column3c",
"column\"this is, a test\"4c"
$ python3 f.py
['"column1a"', '"column2a"', '"column3a,"', '"column\\"this is, a test\\"4a"']
['"column1b"', '"column2b,"', '"column3b"', '"column\\"this is, a test\\"4b"']
['"column1c,"', '"column2c"', '"column3c"', '"column\\"this is, a test\\"4c"']

f is the input file name, and f.py is the file containing the Python script.
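
The same two-step split can be tried without a file on disk. The following is a minimal sketch, assuming the data is already in a string; `SAMPLE` and `split_records` are hypothetical names introduced here for illustration and are not part of the answer.

```python
import re

# Sample data copied from the check above; in practice this would come from f.read().
SAMPLE = (
    '"column1a","column2a","column3a,",\n'
    '"column\\"this is, a test\\"4a"\n'
    '"column1b","column2b,","column3b",\n'
    '"column\\"this is, a test\\"4b"\n'
)


def split_records(data):
    """Hypothetical helper: return one list of quoted fields per logical record."""
    # A newline that is NOT preceded by a comma ends a record.
    records = re.split(r'(?<!,)\n', data.strip())
    # A comma (plus an optional newline) sitting between two double quotes
    # separates fields; commas inside a field are left alone.
    return [re.split(r'(?<="),\n?(?=")', record) for record in records]


rows = split_records(SAMPLE)
for row in rows:
    print(row)
```

Note that this approach relies on the lookarounds only: a field whose content contains an escaped quote immediately followed by a comma and another quote would still be split in the wrong place.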

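User

#2 · edited 2024-04-24 12:35:21

Your problem is very familiar: it is something I have to deal with about three times a month :) I don't use Python to solve it myself, but I can "translate" what I usually do: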

text = r'''"column1a","column2a","column
  3a,",
"column\"this is, a test\"4a"
"column1a2","column2a2","column3a2","column4a2"
"column1b","colu
     mn2b,","column3b",             
"column\"this is, a test\"4b"
"column1c,","column2c","column3c",
"column\"this is, a test\"4c"'''

import re

# Number of columns one line is supposed to have
columns = 4
# Temporary variable to hold partial lines
buffer = ""
# Our regex to check for each column
check = re.compile(r'"(?:[^"\\]*|\\.)*"')

# Read the text line by line
for line in text.split("\n"):
    # If there's no stored partial line, this is a new line
    if buffer == "":
        # Check if we get 4 columns and print, if not, put the line
        # into buffer so we store a partial line for later
        matches = check.findall(line)
        if len(matches) == columns:
            print(matches)
        else:
            # use line.strip() if you need to trim whitespace
            buffer = line
    else:
        # Update the variable (containing a partial line) with the
        # next line and recheck if we get 4 columns
        # use line.strip() if you need to trim whitespace
        buffer = buffer + line
        matches = check.findall(buffer)
        # If we indeed get 4, our line is complete and we print it
        # We must not forget to empty buffer now that we got a whole line
        if len(matches) == columns:
            print(matches)
            buffer = ""
        # Optional; always good to have a safety backdoor though
        # If there is a problem with the csv itself like a weird unescaped
        # quote, you send it somewhere else
        elif len(matches) > columns:
            print("Error: cannot parse line:\n" + buffer)
            buffer = ""
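
The buffering idea above can also be packaged as a generator, so the joined rows can be consumed one by one. This is only a sketch under assumptions: `iter_rows`, `FIELD`, and the shortened `sample` data are hypothetical names and values introduced here, and raising `ValueError` for an over-long row is one possible choice, not something the answer prescribes.

```python
import re

# Same pattern as in the answer: a double-quoted field that may contain
# backslash-escaped characters such as \".
FIELD = re.compile(r'"(?:[^"\\]*|\\.)*"')


def iter_rows(lines, columns=4):
    """Hypothetical helper: yield lists of `columns` quoted fields,
    joining physical lines until a full row has been collected."""
    buffer = ""
    for line in lines:
        # Append the new line to whatever partial row is already stored.
        buffer += line
        fields = FIELD.findall(buffer)
        if len(fields) == columns:
            # The row is complete: hand it out and start over.
            yield fields
            buffer = ""
        elif len(fields) > columns:
            # More fields than expected: the input itself is likely malformed.
            raise ValueError("cannot parse line: " + buffer)


# Shortened sample in the same shape as the text above (assumed data).
sample = r'''"a1","a2","a
  3,",
"a\"x, y\"4"
"b1","b2","b3","b4"'''

rows = list(iter_rows(sample.split("\n")))
for row in rows:
    print(row)
```

Because the buffer is appended to unconditionally, the separate "new line" and "partial line" branches collapse into one; a trailing partial row that never reaches four fields is silently dropped in this sketch.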

