Separating blocks of text with regular expressions

nicaragua president ends visit to finland .

nn(ends-3, nicaragua-1)
nn(ends-3, president-2)
nsubj(visit-4, ends-3)
xsubj(finland-6, ends-3)
root(ROOT-0, visit-4)
aux(finland-6, to-5)
xcomp(visit-4, finland-6)

guatemala president ends visit to tropos .

nn(ends-3, guatemala-1)
nn(ends-3, president-2)
nsubj(visit-4, ends-3)
xsubj(finland-6, ends-3)
root(ROOT-0, visit-4)
aux(tropos-6, to-5)
xcomp(visit-4, tropos-6)

[...]

1 answer

User

#1 · Posted 2024-06-01 09:21:41

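You could do it like this, although it may be overkill for the structure you are parsing. If you also need to parse the dependencies, extending it should be relatively easy. I haven't run this, or even checked the syntax, so don't kill me if it doesn't work right away.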

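READ_SENT = 0
PRE_DEPS = 1
DEPS = 2
POST_DEPS = 3

def parse_output(text):
    # Line-based state machine. Each record is expected to be: a sentence line,
    # a blank line, one dependency per line, then two blank lines
    # (the first closes the dependency list, the second ends the record).
    state = READ_SENT
    results = []
    sent = None
    deps = []
    for line in text.splitlines():
        if state == READ_SENT:
            sent = line
            state = PRE_DEPS
        elif state == PRE_DEPS:
            # The line right after the sentence must be blank.
            if line:
                raise ValueError('invalid format')
            else:
                state = DEPS
        elif state == DEPS:
            # Collect dependency lines until the first blank line.
            if line:
                deps.append(line)
            else:
                state = POST_DEPS
        elif state == POST_DEPS:
            # A second blank line completes the record.
            if line:
                raise ValueError('invalid format')
            else:
                results.append((sent, deps))
                sent = None
                deps = []
                state = READ_SENT
    # Keep a final record that is not followed by trailing blank lines.
    if state in (DEPS, POST_DEPS) and sent is not None:
        results.append((sent, deps))
    return results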
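
The answer mentions that parsing the dependencies themselves should be an easy extension. A minimal sketch of that step, assuming every dependency line has the form relation(head-index, dependent-index) as in the question's sample; the names DEP_RE and parse_dep are hypothetical and not part of the original answer:

```python
import re

# Hypothetical pattern for lines such as "nn(ends-3, nicaragua-1)".
# Groups: relation, head word, head index, dependent word, dependent index.
DEP_RE = re.compile(r'^(\w+)\((.+)-(\d+), (.+)-(\d+)\)$')

def parse_dep(line):
    """Split one dependency line into (relation, (head, i), (dependent, j))."""
    m = DEP_RE.match(line.strip())
    if m is None:
        raise ValueError('invalid dependency: %r' % line)
    rel, head, head_idx, dep, dep_idx = m.groups()
    return rel, (head, int(head_idx)), (dep, int(dep_idx))

if __name__ == '__main__':
    for raw in ['nn(ends-3, nicaragua-1)', 'root(ROOT-0, visit-4)']:
        print(parse_dep(raw))
```

Each string collected in deps by parse_output could be passed through parse_dep. Relation names containing characters outside \w would need a looser pattern.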
