How do I extract text from the deepest level of a nested dict?

Posted 2024-04-19 14:50:55


This question is a follow-up to "What is the most efficient way to extract info from complex JSON files?"

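I have a dict whose structure can be nested arbitrarily deep. I want to capture every string whose key is "text", plus every string under the key "htext" when that value contains no further nesting.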

d = {
        "section": {
                   "heading":{"lvl":"A1", "text":"today"},
                   "htext":[
                                {"color":"green",  "text":"yesterday", "htext":["a","b","c"]},
                                {"color":"purple", "text":"tomorrow"}
                               ]
                   }
         }

For the example above, I want my result to be ["today", "yesterday", "a", "b", "c", "tomorrow"].

The solution given in the previous question was:

def extract_text(obj, acc):
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                extract_text(v, acc)
            elif k == "text":
                acc.append(v)
    elif isinstance(obj, list):
        for item in obj:
            extract_text(item, acc)

I tried to modify this function by adding k == 'htext' to the elif statement, but it did not work. I am new to Python. Any help is greatly appreciated!


2 Answers

Try this:

d = {
        "section": {
                   "heading":{"lvl":"A1", "text":"today"},
                   "htext":[
                                {"color":"green",  "text":"yesterday", "htext":["a","b","c"]},
                                {"color":"purple", "text":"tomorrow"}
                               ]
                   }
         }

acc = []

def extract_text(obj, acc):
     if isinstance(obj, dict):
         for k, v in obj.items():
             if isinstance(v, dict):
                 extract_text(v, acc)
             elif k == "text":
                 acc.append(v)
             elif k == "htext" and isinstance(v, list) and all([isinstance(item, str) for item in v]):
                 for item in v:
                     acc.append(item)
             elif isinstance(v, list):
                 extract_text(v, acc)
     elif isinstance(obj, list):
         for item in obj:
             extract_text(item, acc)


extract_text(d, acc)
print(acc)
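
The order of the branches is what makes this work. As a minimal sketch of why the attempt described in the question probably failed: `extract_text_naive` below is a hypothetical reconstruction (the question does not show the exact code), assuming `k == "htext"` was simply added to the existing elif. The list stored under "htext" is caught by the container check first, and the recursion then skips its plain strings.

```python
def extract_text_naive(obj, acc):
    # Hypothetical reconstruction of the attempt described in the question.
    if isinstance(obj, dict):
        for k, v in obj.items():
            if isinstance(v, (dict, list)):
                # Every list lands here first, including "htext": ["a", "b", "c"].
                extract_text_naive(v, acc)
            elif k == "text" or k == "htext":
                # Only reached for non-container values, so never for an htext list.
                acc.append(v)
    elif isinstance(obj, list):
        for item in obj:
            # Plain strings are neither dict nor list, so they are silently skipped.
            extract_text_naive(item, acc)


sample = {"text": "yesterday", "htext": ["a", "b", "c"]}
naive = []
extract_text_naive(sample, naive)
print(naive)  # ['yesterday']
```

Checking for a flat "htext" list before the generic list branch, as in the code above, avoids this.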

You can check whether the key is "htext" and the value is a non-nested list:

def extract_text(obj, acc):
    if isinstance(obj, dict):
        for k, v in obj.items():
            # A flat "htext" list: collect its items directly.
            # Only the first item is inspected; the `v and` guard avoids
            # an IndexError when the list is empty.
            if k == "htext" and isinstance(v, list) and v and not isinstance(v[0], (dict, list)):
                for x in v:
                    acc.append(x)
            elif isinstance(v, (dict, list)):
                extract_text(v, acc)
            elif k == "text":
                acc.append(v)

    elif isinstance(obj, list):
        for item in obj:
            extract_text(item, acc)

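#=> ['today', 'yesterday', 'a', 'b', 'c', 'tomorrow']
# (Python 3.7+, where dicts keep insertion order; older versions may visit keys in a different order)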
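
As a variant of the same idea, here is a minimal sketch that uses a generator instead of an accumulator argument. The name `iter_text` is made up for this example, and it assumes that a "non-nested" htext value means a list whose items are all strings.

```python
def iter_text(obj):
    """Yield "text" strings and the items of flat "htext" lists, depth-first."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "text" and isinstance(value, str):
                yield value
            elif (key == "htext" and isinstance(value, list)
                    and all(isinstance(item, str) for item in value)):
                # Flat list of strings: emit every item.
                yield from value
            else:
                # Nested dicts and lists are searched; other values yield nothing.
                yield from iter_text(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_text(item)


data = {
    "section": {
        "heading": {"lvl": "A1", "text": "today"},
        "htext": [
            {"color": "green", "text": "yesterday", "htext": ["a", "b", "c"]},
            {"color": "purple", "text": "tomorrow"},
        ],
    }
}

result = list(iter_text(data))
print(result)  # ['today', 'yesterday', 'a', 'b', 'c', 'tomorrow'] on Python 3.7+
```

Because the function yields values lazily, the caller decides whether to build a list, stop early, or stream the results.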
