Inserting null values into a Python dictionary

Posted 2024-03-29 10:05:55

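I have a Python dictionary that I eventually want to insert into a MySQL database. I am parsing data from "entries", which looks like this (the # symbols stand for digits):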

entries = [ "['data'] runtime: ###, scan: ###", 
            "['data'] ctime: ###, scan: ###", 
            "['data'] runtime: ###", ... ]

Each item in quotes is a separate entry. Right now I use regex to extract the runtime, ctime, and scan associated with each entry, like this:

import re
terms = (["runtime", r"runtime\s?:\s?(\d+)"],
         ["ctime", r"ctime\s?:\s?(\d+)"],
         ["scan", r"scan\s?:\s?(\d+)"])

def getTerm(term, entries):
    # Searches the string form of the whole list, so per-entry alignment is lost.
    pattern = re.compile(term)
    return pattern.findall(str(entries))

d = {}
for name, regex in terms:
    d[name] = getTerm(regex, entries)

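However, as you can see, not every entry has runtime, ctime, and scan. If a value does not appear in an entry, I want it entered into the dictionary as [] or NULL (or None), because later, when I look at a particular element number under each key, I want all the data at that position to belong to one specific entry. I would like my dictionary to look like this: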

d = {'ctime': [None, '###', None], 'runtime': ['###', None, '###'], 'scan': ['###', '###', None]}

How can I do this?


2 answers

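re.findall() returns an empty list ([]) when it finds no match, so no fallback is needed if an empty list is acceptable. If you want None when a term is not found, as Brennan said, use findall(string) or None.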
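
A quick illustration of that behavior, as a minimal sketch. The two sample strings and their numbers are made up for the example (the question only shows ### placeholders):

```python
import re

pattern = re.compile(r"runtime\s?:\s?(\d+)")

# Hypothetical sample entries with concrete numbers in place of ###.
with_match = "['data'] runtime: 12, scan: 34"
without_match = "['data'] ctime: 56, scan: 78"

found = pattern.findall(with_match)        # list of captured groups
missing = pattern.findall(without_match)   # empty list, which is falsy
missing_or_none = pattern.findall(without_match) or None  # falls back to None

print(found, missing, missing_or_none)
```

The "or None" idiom works because an empty list is falsy, so the right-hand operand is returned only when nothing matched.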

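Consider using a list comprehension to loop over all the entries, and a dict comprehension to apply each regex pattern to the same entry and store the results in a dict: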

import re
terms = (["runtime", re.compile(r"runtime\s?:\s?(\d+)")],
         ["ctime", re.compile(r"ctime\s?:\s?(\d+)")],
         ["scan", re.compile(r"scan\s?:\s?(\d+)")])
# "or None" turns the empty list from a failed match into None.
results = [{name: pattern.findall(entry) or None for name, pattern in terms} for entry in entries]

Now you have something like this:

[{"runtime": ['###'], "scan": ['###'], "ctime": None}, {"runtime": None, "scan": ['###'], "ctime": ['###']}, {"runtime": ['###'], "scan": None, "ctime": None}, ...]

The code above is equivalent to the following loop (but is generally somewhat faster):

results = []
for entry in entries:
    entry_dict = {}
    for term, regex_pattern in terms:
        entry_dict[term] = regex_pattern.findall(entry) or None
    results.append(entry_dict)
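
The question asks for one list per key rather than one dict per entry. A minimal sketch of that shape follows, assuming each term appears at most once per entry, which is why it uses search in place of findall. The helper name first_or_none and the sample numbers are made up for illustration:

```python
import re

# Hypothetical sample entries with concrete numbers in place of ###.
entries = ["['data'] runtime: 12, scan: 34",
           "['data'] ctime: 56, scan: 78",
           "['data'] runtime: 90"]

terms = (("runtime", re.compile(r"runtime\s?:\s?(\d+)")),
         ("ctime", re.compile(r"ctime\s?:\s?(\d+)")),
         ("scan", re.compile(r"scan\s?:\s?(\d+)")))

def first_or_none(pattern, entry):
    """Return the first captured value in entry, or None if there is no match."""
    match = pattern.search(entry)
    return match.group(1) if match else None

# One list per term; position i in every list refers to entries[i].
d = {name: [first_or_none(pattern, entry) for entry in entries]
     for name, pattern in terms}

print(d)
```

Because every list gets exactly one value per entry, the lists all have the same length and stay aligned by index.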

If entries is a list of strings that may or may not contain the keywords, and the order matters, then we need to iterate over the entries:

First option:

import re

entries = [ "['data'] runtime: ###, scan: ###",
            "['data'] ctime: ###, scan: ###",
            "['data'] runtime: ###" ]

allterms = (["runtime", r"runtime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["ctime", r"ctime\s?:\s?([a-zA-Z0-9_#]*)"],
            ["scan", r"scan\s?:\s?([a-zA-Z0-9_#]*)"])
terms = [name for name, _ in allterms]
patterns = [pattern for _, pattern in allterms]

def get_terms(entry):
    # Append one value per term (None when absent) so the lists stay aligned.
    for name, pattern in zip(terms, patterns):
        match = re.search(pattern, entry)
        d[name].append(match.group(1) if match else None)

# Key by term name; the [name, pattern] lists in allterms are unhashable.
d = {t: [] for t in terms}
for entry in entries:
    get_terms(entry)
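
Once the lists are aligned, everything belonging to one entry can be read back by position. The sketch below assumes a dictionary already in the shape the question asks for; the concrete values and the column order are made up for the example:

```python
# Hypothetical result in the shape the question asks for.
d = {"runtime": ["12", None, "90"],
     "ctime": [None, "56", None],
     "scan": ["34", "78", None]}

columns = ("runtime", "ctime", "scan")

# zip the per-key lists together: one tuple per original entry.
rows = list(zip(*(d[c] for c in columns)))

print(rows[1])
```

Each tuple holds the values for a single entry in column order, which is the form a row-by-row database insert would need.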

Second, asynchronous option:

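# pip install futures  # if using Python 2
from concurrent.futures import ThreadPoolExecutor

def extract(entry):
    # Return values instead of appending to d, so threads share no state.
    matches = [re.search(p, entry) for p in patterns]
    return [m.group(1) if m else None for m in matches]

d = {t: [] for t in terms}
with ThreadPoolExecutor() as executor:
    # map() yields results in input order, which keeps the lists aligned.
    for values in executor.map(extract, entries):
        for t, v in zip(terms, values):
            d[t].append(v)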

Edit: (solution worked out in chat with @Wynne)
