Match multiple lines until a line contains a given string

Posted 2024-04-25 09:02:43


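I have a CSV file with the content below. I need a regular expression that matches from a starting string such as 'B08-1506' up to the next line that matches the same pattern. I want to append the three lines that follow each such line to it, so that every record becomes a single line.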

B08-1506,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B08-1606,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0680,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0681,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,

The output should look like this:

B08-1506,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
B08-1606,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
B09-0680,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
B09-0681,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh

1 answer
User
Answer #1 · posted 2024-04-25 09:02:43

As Nisarg said, it is best to fix the format of the source CSV. If you cannot do that, the snippet below may help.

Demo: without regex

s = """B08-1506,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B08-1606,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0680,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,
B09-0681,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,"""

res = []
for line in s.split("\n"):
    if line.startswith("B0"):    # A line starting with "B0" begins a new record
        res.append(line)
    elif res:                    # Otherwise append it to the current record (ignored if no record has started yet)
        res[-1] = res[-1] + line

# Split each merged record on commas and drop the empty fields
res = [[field for field in record.split(",") if field] for record in res]
for fields in res:
    print(", ".join(fields))

Output:

B08-1506, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
B08-1606, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
B09-0680, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
B09-0681, 324873,  st,  $0.0, ljkflka, 1 of 37 jksdfhjfhjkjkdsfh
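Note that this output differs from what the question asks for: the "1 of 37 " prefix is kept, and the last two fields are fused because the third line of each record has no trailing comma. Below is a minimal sketch of a regex-based variant that reproduces the expected output on the sample data. The function name `merge_records` and both patterns are assumptions made for illustration: the record ID shape (`B`, two digits, a hyphen, four digits) and the "N of M " counter are inferred from the sample only and may need adjusting for the real file.

```python
import re

# Assumed shape of a record ID, inferred from the sample (e.g. "B08-1506,").
RECORD_START = re.compile(r"B\d{2}-\d{4},")
# Assumed "N of M " counter prefix, dropped to match the expected output.
COUNTER = re.compile(r"^\d+ of \d+ ")


def merge_records(text):
    """Return one comma-joined line per record found in text."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if RECORD_START.match(line):
            # A new record starts here; drop its trailing comma.
            records.append([line.rstrip(",")])
        elif records:
            # Continuation line: remove the counter prefix and the
            # leading/trailing commas and spaces, then attach it.
            records[-1].append(COUNTER.sub("", line).strip(", "))
    return [",".join(fields) for fields in records]


sample = """B08-1506,324873, st, $0.0,
ljkflka,,,,,
1 of 37 jksdfhjfhjk
jkdsfh,,,,,,,"""

for merged in merge_records(sample):
    print(merged)
# prints: B08-1506,324873, st, $0.0,ljkflka,jksdfhjfhjk,jkdsfh
```

Each continuation line is joined with an explicit comma, so the fields stay separate even when the source line lacks a trailing comma. Lines that appear before the first record ID are ignored rather than raising an error.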
