Regex re.sub for parsing a file's front matter

    import re

    def get_front_matter(file, start='---', end='---'):
        """Strip file and retrieve front matter then format the value"""
        content = {}
        with open(file, 'r', encoding='UTF-8') as file_content:
            for content_line in file_content:
                if content_line.strip() == start:
                    break
            for content_line in file_content:
                if content_line.strip() == end:
                    break
                line_data = content_line.split(':', 1)
                # If we cannot split decently, carry on
                if len(line_data) != 2:
                    continue
                # format the string to store in dict for better usage
                content[line_data[0]] = re.sub(r"[\n\t]*", "", line_data[1]).strip(' "')
        return content

1 answer

User

#1 · Posted 2024-05-15 03:06:29

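I kept your code almost unchanged. The key point is not to add a value to the result until we are sure we have collected the complete value (which may be split across several lines). This is done by checking the next line: if it is a valid pair (key: some value), or if it is the closing delimiter, the previous key and its content are added to the result. I hope the comments make things clearer.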

    import re

    def get_front_matter(file, start='---', end='---'):
        """Strip file and retrieve front matter then format the value"""
        result = {}
        with open(file, 'r', encoding='UTF-8') as file_content:
            for content_line in file_content:
                if content_line.strip() == start:
                    break

            content = ''
            key = ''
            for content_line in file_content:
                if content_line.strip() == end:
                    if key and content:
                        # add last key: content before breaking out
                        result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                    break

                line_data = content_line.split(':', 1)
                if len(line_data) == 2 and not content:
                    # first key: content pair; there is no previous content yet,
                    # so keep this pair and check the next line before storing it
                    key = line_data[0]
                    content = line_data[1]
                    continue
                elif len(line_data) == 2:  # we found another valid key: value pair
                    # store the previous (key, content) and keep the new (key, content)
                    result[key] = re.sub(r"[\n\t]*", "", content).strip(' "')
                    key = line_data[0]
                    content = line_data[1]
                else:
                    # not a valid key: value line, so append it to the previous value,
                    # which is split across multiple lines
                    content += content_line

        return result

Note: I renamed `content` to `result`. This code will break on input like the following:

    title: "Meeting"
    date: 2019-03-14T07:51:28+01:00
    draft: false
    status:
      [
        "somevalue:process",  # if the value contains ':'
        "todo",
        "hold"
      ]

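Here you have not specified how to tell a key apart from a value that contains ":" when that value is not preceded by a key on the same line. I hope this helps; let me know if you have any questions.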
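One possible way around that ambiguity, as a rough sketch rather than a full YAML parser: treat a line as a new key only when it starts at column 0 with an identifier-like name followed by ":". This rule is an assumption about your files, not something stated in the question. The names `KEY_LINE` and `parse_front_matter_lines` are hypothetical, and the function expects only the lines between the two delimiters.

```python
import re

# Assumption: a key starts at column 0 and is made of word characters or hyphens.
# Indented lines (such as list items) are always treated as continuation lines.
KEY_LINE = re.compile(r"^([A-Za-z_][\w-]*):(.*)$")


def parse_front_matter_lines(lines):
    """Parse the lines found between the front matter delimiters into a dict."""
    result = {}
    key = None
    for line in lines:
        match = KEY_LINE.match(line.rstrip("\n"))
        if match:
            # A new top-level key: start collecting its value.
            key = match.group(1)
            result[key] = match.group(2).strip()
        elif key is not None:
            # Continuation of the previous value, even if it contains ':'.
            result[key] += line.strip()
    # Drop surrounding spaces and double quotes, as in the original code.
    return {k: v.strip(' "') for k, v in result.items()}


sample = [
    'title: "Meeting"\n',
    "date: 2019-03-14T07:51:28+01:00\n",
    "draft: false\n",
    "status:\n",
    "  [\n",
    '    "somevalue:process",\n',
    '    "todo",\n',
    '    "hold"\n',
    "  ]\n",
]
parsed = parse_front_matter_lines(sample)
print(parsed["status"])
```

With this rule the indented `"somevalue:process"` line is appended to `status` instead of being read as a new key. It would still misread an unindented continuation line that happens to look like `name: ...`, so if your files can contain that, a real YAML parser is probably the safer choice.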
