Splitting with a regular expression in Python

Posted 2024-05-16 11:30:03


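I have some chapters of the first Harry Potter book in a txt file. I want to split the txt file into a list containing the individual chapters, without the chapter numbers and chapter names. How can I do this with a regular expression?

The txt looks like this: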

Chapter one

The boy who lived

Mr. and Mrs. Dursley, ...

Chapter two

The vanishing glass

Nearly ten years had passed...

So I would like my list to look like this:

['Mr. and Mrs. Dursley, ...', 'Nearly ten years had passed...']

I'm not familiar with regex, but here is what I have tried so far:

chapter_list = re.split('.*\n\nchapter.*\n\n?', text)

Also, the chapter names do not all start with the same word.


1 Answer

User

#1 · Posted 2024-05-16 11:30:03

This should do it:

re.split(r'Chapter \w+\n', string)

You may get an empty first element, but it can easily be removed if needed:

['',
 '\nThe boy who lived\n\nMr. and Mrs. Dursley, ...\n\n',
 '\nThe vanishing glass\n\nNearly ten years had passed...']
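
Note that the result above still contains the chapter names. If they should go as well, one possible variation (a sketch, not a tested solution for the full book) is to let the pattern also consume the title line and the blank lines around it, then strip whitespace and drop empty pieces. It assumes every heading looks like the sample: a "Chapter <word>" line, blank line(s), a single title line, blank line(s). The names `HEADING`, `split_chapters` and `sample` are made up for this illustration.

```python
import re

# Assumed heading layout (taken from the sample text only):
#   "Chapter <word>" line, blank line(s), one title line, blank line(s).
# re.MULTILINE lets ^ match at the start of every line, so only lines that
# begin with "Chapter" are treated as headings.
HEADING = re.compile(r"^Chapter \w+\n+.*\n+", flags=re.MULTILINE)


def split_chapters(text):
    """Return the chapter bodies without chapter number and title lines."""
    parts = HEADING.split(text)
    # Strip surrounding whitespace and drop the empty piece before chapter one.
    return [part.strip() for part in parts if part.strip()]


sample = (
    "Chapter one\n\n"
    "The boy who lived\n\n"
    "Mr. and Mrs. Dursley, ...\n\n"
    "Chapter two\n\n"
    "The vanishing glass\n\n"
    "Nearly ten years had passed..."
)

print(split_chapters(sample))
# ['Mr. and Mrs. Dursley, ...', 'Nearly ten years had passed...']
```

Because `.` does not match a newline by default, `.*` consumes exactly one title line regardless of which word it starts with. A heading with a multi-word number such as "Chapter twenty one" would not match `\w+` and would need a looser pattern.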