How to read specific paragraphs from multiple folders and files

2 answers

User

Answer 1 · edited 2024-04-26 05:27:02

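I solved my problem with string slicing.

Basically, I read each file, look for the start string and the end string, and slice out the text between them. I then append each slice to a list and write the list to a file.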

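rr = []  # collected "Required reading" sections
end_markers = ('## Supplementary reading', '### Supplementary reading', '## Required reading paragraph')
for f in file_list:  # file_list: paths of the files to scan
    with open(f, 'rt') as fl:
        text = fl.read()
    start = text.find('## Required reading')
    if start == -1:
        continue  # this file has no such section
    # find() returns -1 for a missing marker, so keep only the markers that occur
    ends = [i for i in (text.find(m, start + 1) for m in end_markers) if i != -1]
    rr.append(text[start:min(ends)] if ends else text[start:])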

However, the "## Required reading" heading was still in my list and in the output file, so I ran a second read/write pass.

def removeHashTag():
    # Read every line, then rewrite the file without the heading line.
    with open("required_reading.md", "r") as f:
        lines = f.readlines()
    with open("required_reading.md", "w") as f:
        for line in lines:
            if line != "## Required reading\n":
                f.write(line)

removeHashTag()
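
The second pass can probably be avoided by starting the slice just after the heading instead of at it. Below is a minimal sketch of that idea, not the poster's exact code: the helper names `extract_section` and `collect_sections`, the `.md` extension, and the UTF-8 encoding are assumptions made for illustration. It also walks nested folders with `pathlib`, since the question asks about multiple folders.

```python
from pathlib import Path

START = '## Required reading'
END_MARKERS = ('## Supplementary reading', '### Supplementary reading')


def extract_section(text, start_marker=START, end_markers=END_MARKERS):
    """Return the text between start_marker and the earliest end marker.

    The heading itself is excluded. Returns None when start_marker is absent;
    when no end marker follows, everything up to the end of the text is kept.
    """
    start = text.find(start_marker)
    if start == -1:
        return None
    body_start = start + len(start_marker)
    # str.find returns -1 for a missing marker, so filter those out
    ends = [i for i in (text.find(m, body_start) for m in end_markers) if i != -1]
    end = min(ends) if ends else len(text)
    return text[body_start:end].strip()


def collect_sections(root):
    """Extract the section from every .md file under root, nested folders included."""
    sections = []
    for path in sorted(Path(root).rglob('*.md')):
        # Assumption: the files are UTF-8 encoded
        body = extract_section(path.read_text(encoding='utf-8'))
        if body is not None:
            sections.append(body)
    return sections
```

Something like `collect_sections('courses')` (a hypothetical folder name) would then return one string per file that contains the heading. Note that a plain substring search also matches the marker inside a longer heading, so the markers may need adjusting to the real files.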

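User

Answer 2 · edited 2024-04-26 05:27:02

You need to understand the structure of the text you are importing. How are the paragraphs separated? Is it a blank line, i.e. "\n\n"? If so, can you split the text on "\n\n" and pick the paragraph you need by its index?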

text = 'paragraph one text\n\nparagraph two text\n\nparagraph three text'.split('\n\n')[1]
print(text)
# prints: paragraph two text
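
When the position of the paragraph differs from file to file, a fixed index will not work, but the same split can be used to look up the paragraph that follows a known heading. This is a rough sketch under the assumption that headings and paragraphs are separated by blank lines; `split_paragraphs` and `paragraph_after` are hypothetical helper names.

```python
def split_paragraphs(text):
    """Split text on blank lines and drop empty pieces."""
    return [p.strip() for p in text.split('\n\n') if p.strip()]


def paragraph_after(text, heading):
    """Return the paragraph that directly follows `heading`, or None if there is none."""
    paragraphs = split_paragraphs(text)
    for i, paragraph in enumerate(paragraphs[:-1]):
        if paragraph == heading:
            return paragraphs[i + 1]
    return None


sample = '## Required reading\n\nBook A\n\n## Supplementary reading\n\nBook B'
print(paragraph_after(sample, '## Required reading'))
```

This only returns a single paragraph; a section that spans several paragraphs would need a loop that keeps collecting until the next heading.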

Another option, as others have mentioned, is regular expressions (RegEx), which are available in Python through the built-in re module:

import re

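Regular expressions are used to find patterns in text.

Go to https://pythex.org/, paste in a sample from one of your documents, and try to work out a pattern that matches the paragraph you want to find.

Learn more about RegEx here: https://regexone.com/references/python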
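
As a starting point, here is one possible pattern for the section discussed in this thread. It is a sketch that assumes Markdown-style headings (`#` characters followed by a space at the start of a line) and a newline after the heading; `SECTION_RE` and `required_reading` are illustrative names, and the pattern should be checked against the real files.

```python
import re

SECTION_RE = re.compile(
    r'^## Required reading[^\n]*\n'  # the heading line itself
    r'(.*?)'                         # section body, captured lazily
    r'(?=^#{1,6} |\Z)',              # stop at the next heading or at the end of the text
    re.MULTILINE | re.DOTALL,
)


def required_reading(text):
    """Return the body of the first "## Required reading" section, or None."""
    match = SECTION_RE.search(text)
    return match.group(1).strip() if match else None


sample = '# Week 1\n\n## Required reading\n\nBook A, chapter 1\n\n## Supplementary reading\n\nBook B\n'
print(required_reading(sample))
```

`re.MULTILINE` lets `^` match at the start of every line, and `re.DOTALL` lets `.` run across line breaks, so the lazy group captures the whole section up to the next heading.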
