How do I split my string into nested dicts when the delimiter also appears inside values?

Posted 2024-05-14 16:33:22


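I need to use .split to split a string and build nested dictionaries, and I have been splitting on ','. However, as the data below shows, ',' also occurs inside the "Reviewed" field, which causes Python to mistake values for keys. The "Reviewed" field is a list inside the dict.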

A sample of my data looks like this:

{"Username": "bkpn1412", "DOB": "31.07.1983", "State": "Oregon", "Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "gqjs4414", "DOB": "27.07.1998", "State": "Massachusetts", "Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}
{"Username": "eehe1434", "DOB": "08.08.1950", "State": "Idaho", "Reviewed": []}
{"Username": "hkxj1334", "DOB": "03.08.1969", "State": "Florida", "Reviewed": ["f129b1803f447c2b1ce43508fb822810", "3b0c9bc0be65a3461893488314236116"]}
{"Username": "jjbd1412", "DOB": "26.07.2001", "State": "Georgia", "Reviewed": []}

My current code:

#converting list to string using list comprehension
pdict = ' '.join([str(item) for item in products_list]) 
print(type(pdict))

rdict = ' '.join([str(item) for item in reviewers_list]) 
print(type(rdict))

#converting string to list of string
plist  = pdict.split(',')
rlist = rdict.split(',')
print(type(plist))
print(type(rlist))

#list of string to dict
products_dicts = {}
for item in plist:
    t = products_dicts
    for part in item.split(':'):
        t = t.setdefault(part, {})
print(type(products_dicts))

reviewers_dicts = {}
for item in rlist:
    t = reviewers_dicts
    for part in item.split(':'):
        t = t.setdefault(part, {})
print(type(reviewers_dicts))

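I have tried different delimiters, but none of them worked. How can I solve this, preferably without going through a large dataset and manually removing all the unwanted commas?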

The expected output should look similar to this:

{"Username": "bkpn1412",
"DOB": "31.07.1983",
"State": "Oregon",
"Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}

{"Username": "hkxj1334",
"DOB": "03.08.1969",
"State": "Florida" ,
"Reviewed": ["f129b1803f447c2b1ce43508fb822810", "3b0c9bc0be65a3461893488314236116"]}

1 Answer
User
Answer 1 · Posted 2024-05-14 16:33:22

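One way to solve this is to use json.loads from the json module in the standard library. Each line of your data is already a valid JSON object, so there is no need to split it by hand.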
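As a minimal sketch of why this helps, the snippet below takes one record from the sample data as a raw string and compares splitting on ',' with json.loads. The variable names (line, fragments, record) are only illustrative.

```python
import json

# One record from the sample data, as a raw string.
line = (
    '{"Username": "hkxj1334", "DOB": "03.08.1969", "State": "Florida", '
    '"Reviewed": ["f129b1803f447c2b1ce43508fb822810", '
    '"3b0c9bc0be65a3461893488314236116"]}'
)

# Splitting on ',' also cuts inside the "Reviewed" list,
# so the second hash ends up as a fragment of its own.
fragments = line.split(',')
print(len(fragments))  # 5 fragments for only 4 fields

# json.loads understands the nesting and returns a real dict
# whose "Reviewed" value is a list.
record = json.loads(line)
print(record["Reviewed"])
```

The split produces five pieces for a record with four fields, which is where the stray "keys" in the question come from. json.loads keeps the list intact.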

Suppose you have a file containing the input data:

inputdata.txt

{"Username": "bkpn1412", "DOB": "31.07.1983", "State": "Oregon", "Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}
{"Username": "gqjs4414", "DOB": "27.07.1998", "State": "Massachusetts", "Reviewed": ["fa04fe6c0dd5189f54fe600838da43d3"]}
{"Username": "eehe1434", "DOB": "08.08.1950", "State": "Idaho", "Reviewed": []}
{"Username": "hkxj1334", "DOB": "03.08.1969", "State": "Florida", "Reviewed": ["f129b1803f447c2b1ce43508fb822810", "3b0c9bc0be65a3461893488314236116"]}
{"Username": "jjbd1412", "DOB": "26.07.2001", "State": "Georgia", "Reviewed": []}

A parser for this data would be:

import json
filename = "inputdata.txt"
with open(filename) as f:
    for line in f.readlines():
        parsed_data = json.loads(line)
        print(parsed_data)
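If you need the records afterwards rather than just printing them, one possible variation is to collect them in a list. The sketch below assumes one JSON object per line and skips blank lines; load_records is a hypothetical helper name, and the sample is written to a temporary file only so the snippet runs on its own.

```python
import json
import os
import tempfile

# Two records from the sample data, with a blank line in between.
sample = (
    '{"Username": "bkpn1412", "DOB": "31.07.1983", "State": "Oregon", '
    '"Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}\n'
    '\n'
    '{"Username": "eehe1434", "DOB": "08.08.1950", "State": "Idaho", "Reviewed": []}\n'
)

def load_records(path):
    """Parse a file with one JSON object per line into a list of dicts."""
    records = []
    with open(path, encoding="utf-8") as f:
        for line in f.readlines():
            line = line.strip()
            if not line:  # skip blank lines; json.loads("") would raise
                continue
            records.append(json.loads(line))
    return records

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    path = tmp.name

reviewers = load_records(path)
os.remove(path)

print(len(reviewers))            # 2
print(reviewers[0]["Reviewed"])  # ['cea76118f6a9110a893de2b7654319c0']
```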

Processing one line at a time (without loading the whole file into memory)

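readlines() reads the entire file into a list first. If you do not want to hold the whole file in memory, you can change the logic to use the file object's readline method, which returns one line per call and an empty string at the end of the file: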

import json
filename = "inputdata.txt"
with open(filename) as f:
    line = f.readline()
    while line:
        parsed_data = json.loads(line)
        print(parsed_data)
        line = f.readline()    
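A more compact way to get the same line-by-line behaviour is to iterate over the file object directly, since a Python file object yields its lines lazily. The sketch below wraps that in a generator; iter_records is a hypothetical name, and the temporary file exists only to make the example runnable.

```python
import json
import os
import tempfile

def iter_records(path):
    """Yield one parsed dict per non-empty line, reading lazily."""
    with open(path, encoding="utf-8") as f:
        for line in f:  # the file object hands out one line at a time
            if line.strip():
                yield json.loads(line)

sample = (
    '{"Username": "bkpn1412", "DOB": "31.07.1983", "State": "Oregon", '
    '"Reviewed": ["cea76118f6a9110a893de2b7654319c0"]}\n'
    '{"Username": "jjbd1412", "DOB": "26.07.2001", "State": "Georgia", "Reviewed": []}\n'
)

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    path = tmp.name

states = [record["State"] for record in iter_records(path)]
os.remove(path)

print(states)  # ['Oregon', 'Georgia']
```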

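Both examples use the with statement as a context manager, which closes the file automatically when the block exits, even if an exception is raised. If you do not want to use the with keyword, you must call the close() method explicitly once you have finished with the file, to avoid leaking resources.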
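For comparison, here is a rough sketch of the manual alternative, using try/finally so that close() still runs if json.loads raises. The temporary file and the variable names are assumptions made for the example.

```python
import json
import os
import tempfile

# Create a small input file for the demonstration.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as tmp:
    tmp.write(
        '{"Username": "jjbd1412", "DOB": "26.07.2001", '
        '"State": "Georgia", "Reviewed": []}\n'
    )

# Without "with": open the file, then guarantee close() in finally.
f = open(path, encoding="utf-8")
try:
    first = json.loads(f.readline())
finally:
    f.close()

os.remove(path)

print(first["Username"])  # jjbd1412
print(f.closed)           # True
```

The with form does the same work in fewer lines, which is why it is usually preferred.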

To learn more about file handling, see the official Python documentation for the built-in open function.
