How to merge multiple repeated key names in a dictionary-like format using Python

Posted 2024-06-16 10:09:55


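I have data in a dictionary-like format in which many keys are repeated several times, each with a list of strings as its value. I want to merge all keys that share the same name, together with their values. The data only looks like a dictionary; it is not an actual Python dictionary. I call it a dictionary because of the way it is laid out.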

# The data I have looks like this

"city":["New York", "Paris", "London"],
"country":["India", "France", "Italy"],
"city":["New Delhi", "Tokio", "Wuhan"],
"organisation":["ITC", "Google", "Facebook"],
"country":["Japan", "South Korea", "Germany"],
"organisation":["TATA", "Amazon", "Ford"]

I have about 1000 repeated keys, whose values include both duplicates and unique entries, and I want to merge or append the values by key.

# Expected output

"city":["New York", "Paris", "London", "New Delhi", "Tokio", "Wuhan"],
"country":["India", "France", "Italy", "Japan", "South Korea", "Germany"],
"organisation":["ITC", "Google", "Facebook", "TATA", "Amazon", "Ford"],

Can anyone suggest an approach?


1 answer
User
#1 · Posted 2024-06-16 10:09:55
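  • It is established that this is not a dict, but text in an LR(1) grammar similar to JSON syntax
  • Given that, use an LR parser to parse and tokenize it
  • https://lark-parser.readthedocs.io/en/latest/json_tutorial.html shows how to parse JSON
  • One small adjustment is needed so that repeated keys work (treat the dict as a list, see the code)
  • pandas is then used to take the parser output and reshape it as required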
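The adjustment for repeated keys matters because a Python dict keeps only one value per key. A minimal sketch of the effect, using a few hypothetical (key, list) pairs rather than the real parser output:

```python
# Hypothetical (key, values) pairs, in the order a parser would emit them.
pairs = [
    ("city", ["New York", "Paris"]),
    ("country", ["India", "France"]),
    ("city", ["New Delhi", "Tokio"]),
]

# Building a dict directly keeps only the last value seen for each key,
# so the first "city" entry is silently lost.
as_dict = dict(pairs)

# Keeping the pairs as a list preserves every occurrence of "city".
city_entries = [values for key, values in pairs if key == "city"]
```

Here `as_dict["city"]` holds only the second list, while `city_entries` still holds both, which is why the transformer in the code that follows maps `dict` to `list`.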
# Parse the JSON-like text with lark, then reshape the result with pandas
from lark import Lark, Transformer
import pandas as pd
json_parser = Lark(r"""
    ?value: dict
          | list
          | string
          | SIGNED_NUMBER      -> number
          | "true"             -> true
          | "false"            -> false
          | "null"             -> null

    list : "[" [value ("," value)*] "]"

    dict : "{" [pair ("," pair)*] "}"
    pair : string ":" value

    string : ESCAPED_STRING

    %import common.ESCAPED_STRING
    %import common.SIGNED_NUMBER
    %import common.WS
    %ignore WS

    """, start='value')
class TreeToJson(Transformer):
    """Convert the lark parse tree into plain Python objects."""

    def string(self, s):
        (s,) = s
        return s[1:-1]  # strip the surrounding double quotes

    def number(self, n):
        (n,) = n
        return float(n)

    list = list
    pair = tuple
    dict = list  # keep the pairs as a list so repeated keys are not overwritten

    null = lambda self, _: None
    true = lambda self, _: True
    false = lambda self, _: False

js = """{
    "city":["New York", "Paris", "London"],
    "country":["India", "France", "Italy"],
    "city":["New Delhi", "Tokio", "Wuhan"],
    "organisation":["ITC", "Google", "Facebook"],
    "country":["Japan", "South Korea", "Germany"],
    "organisation":["TATA", "Amazon", "Ford"]
}"""

tree = json_parser.parse(js)

# Explode to one row per value, then collect the unique values for each key
(
    pd.DataFrame(TreeToJson().transform(tree), columns=["key", "list"])
    .explode("list")
    .groupby("key")
    .agg({"list": lambda s: s.unique().tolist()})
    .to_dict()["list"]
)

Output

{'city': ['New York', 'Paris', 'London', 'New Delhi', 'Tokio', 'Wuhan'],
 'country': ['India', 'France', 'Italy', 'Japan', 'South Korea', 'Germany'],
 'organisation': ['ITC', 'Google', 'Facebook', 'TATA', 'Amazon', 'Ford']}
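
If pulling in lark and pandas is more than the task needs, the same result can probably be reached with the standard library alone. This is a different technique from the answer above, not a restatement of it: `json.loads` accepts an `object_pairs_hook` callable that receives every (key, value) pair of an object, duplicates included. The sketch below assumes the raw text becomes valid JSON once it is wrapped in braces and a trailing comma is removed, and that every value is a list; `merge_pairs` and `raw` are names made up for this example.

```python
import json

def merge_pairs(pairs):
    """Merge (key, list) pairs, keeping first-seen order and dropping duplicate values."""
    merged = {}
    for key, values in pairs:
        bucket = merged.setdefault(key, [])
        for value in values:
            if value not in bucket:  # linear scan; fine for short lists
                bucket.append(value)
    return merged

# The data from the question, without surrounding braces (assumed raw form).
raw = '''
"city":["New York", "Paris", "London"],
"country":["India", "France", "Italy"],
"city":["New Delhi", "Tokio", "Wuhan"],
"organisation":["ITC", "Google", "Facebook"],
"country":["Japan", "South Korea", "Germany"],
"organisation":["TATA", "Amazon", "Ford"]
'''

# Wrap in braces and drop any trailing comma so the text parses as one JSON object.
text = "{" + raw.strip().rstrip(",") + "}"

# The hook is called with the full list of pairs, so repeated keys are not lost.
result = json.loads(text, object_pairs_hook=merge_pairs)
print(result)
```

Note that the hook runs for every JSON object in the input, nested ones included, so this only fits data shaped like the example, where the top-level object is the only object and each value is a list.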
