How do I parse data in JSON and search for multiple patterns or matches?

Posted 2024-06-01 02:23:32


I have a JSON file like this (just a sample; the actual file is much larger):

[
 {
   "Rank": 1,
   "Title": "Impact of Vitamin D on Acute Ischemic Stroke Prognosis",
   "Status": "Completed",
   "Study_Results": "No Results Available",
   "Conditions": "Ischemic Stroke",
   "Interventions": "Diagnostic Test: Vitamin D Level",
   "Locations": "Mansoura University, Mansourah, Dakahlia, Egypt|IQVIA, Basel, Switzerland",
   "URL": "https://ClinicalTrials.gov/show/NCT03819452"
 },
 {
   "Rank": 2,
   "Title": "A Randomized Controlled Study of the Effectiveness of Scalp Electroacupuncture in Improving Upper Limb Motor Function in Convalescent Phase of Ischemic Stroke.",
   "Status": "Not yet recruiting",
   "Study_Results": "No Results Available",
   "Conditions": "Ischemic Stroke",
   "Interventions": "Device: scalp electroacupuncture|Device: sham scalp electroacupuncture",
   "Locations": "",
   "URL": "https://ClinicalTrials.gov/show/NCT02850198"
 },
 {
   "Rank": 3,
   "Title": "Mesenchymal Stem Cells for The Treatment of Acute Ischemic Stroke",
   "Status": "Not yet recruiting",
   "Study_Results": "No Results Available",
   "Conditions": "Acute Ischemic Stroke",
   "Interventions": "Biological: UMC119-06",
   "Locations": "Taipei Medical University - Shuang Ho Hospital, Ministry of Health and Welfare., New Taipei City, Taiwan",
   "URL": "https://ClinicalTrials.gov/show/NCT04097652"
 },
 {
   "Rank": 4,
   "Title": "Curative Efficacy of Secondary Prevention for Patients With Ischemic Stroke Through Syndrome Differentiation of TCM",
   "Status": "Completed",
   "Study_Results": "No Results Available",
   "Conditions": "Ischemic Stroke",
   "Interventions": "Drug: Naoxintong Capsule|Drug: Placebo, Alteplase",
   "Locations": "Shanghai seventh People's Hospital, Shanghai, Shanghai, China|Shanghai Ninth People's Hospital affliated to Shanghai Jiao Tong University Shool of Medcine, Shanghai, Shanghai, China|North Branch of Ruijin Hospital affliated to Shanghai Jiao Tong University, Shanghai, Shanghai, China|Shanghai Putuo Central Hospital, Shanghai, Shanghai, China|Shuguang Hospital affliated to Shanghai University of Traditional Chinese Medicine, Shanghai, Shanghai, China|Longhua Hospital affliated to Shanghai University of Traditional Chinese Medicine, Shanghai, Shanghai, China|Zhongshan Hospital affliated to Fudan University, Shanghai, Shanghai, China|Huashan Hospital affliated to Fudan University, Shanghai, Shanghai, China|Shanghai fifth People's Hospital affliated to Fudan University, Shanghai, Shanghai, China|Tongren Hospital affliated to Shanghai Jiao Tong University, Shanghai, Shanghai, China|Tongji Hospital, Shanghai, Shanghai, China|Shanghai Chinese Medicine Hospital, Shanghai, Shanghai, China|Shanghai tenth People's Hospital, Shanghai, Shanghai, China|Shanghai Hospital of Integrative Medicine, Shanghai, Shanghai, China|Xinhua Hospital affliated to Shanghai Jiao Tong University Shool of Medcine, Shanghai, Shanghai, China|Dongfang Hospital affliated to Tongji University, Shanghai, Shanghai, China|Pudong Gong Li Hospital of Shanghai, Shanghai, Shanghai, China|Shanghai sixth People's Hospital affliated to Shanghai Jiao Tong University, Shanghai, Shanghai, China|Changning Tongren Hospital of Shanghai, Shanghai, Shanghai, China|Changhai Hospital, Shanghai, Shanghai, China|Pudong Hospital of Traditional Chinese Medicine, Shanghai, Shanghai, China|East Branch of Shanghai sixth People's Hospital, Shanghai, Shanghai, China|Qingpu Branch of Zhongshan Hospital affliated to Fudan University, Shanghai, Shanghai, China|Shanghai third People's Hospital affliated to Shanghai Jiao Tong University, Shanghai, Shanghai, China",
   "URL": "https://ClinicalTrials.gov/show/NCT02334969"
 }
]

What I need to do is search the "Interventions" attribute for the following strings:

alt_aliases = ("Alteplase", "alteplase", "tpa", "t-PA", "tPA", "rtpa", "rtPA", "rt-PA", "r-tPA", "Rt-PA", "activase", "Activase")

If "Interventions" contains one or more of these strings, I want to return the "Rank" value associated with that entry.

I tried this, but it did not work:

for key in data:
    if (data['Interventions'] == any(alt_aliases)):
        print(data['Rank'])

I realize that == won't work because the attribute may contain many strings, but I'm not sure how to use regular expressions in Python, especially with dict-like JSON.


3 Answers

You probably want something like this:

# d stands for dictionary and lofd stands for list of dictionaries.
for d in lofd:
  if [_ for _ in alt_aliases if _ in d['Interventions']]:
    print(d)

You can do it like this, using the built-in any() function to quickly check for a match against one of the aliases:

import json

json_filename = 'medical_data.json'
alt_aliases = ("Alteplase", "alteplase", "tpa", "t-PA", "tPA", "rtpa", "rtPA", "rt-PA",
               "r-tPA", "Rt-PA", "activase", "Activase")

with open(json_filename) as file:
    data = json.load(file)

for record in data:  # 'record' avoids shadowing the built-in name 'object'
    if any(alias in record['Interventions'] for alias in alt_aliases):
        print("Interventions: {Interventions} - Rank: {Rank}".format(**record))

Sample output:

Interventions: Drug: Naoxintong Capsule|Drug: Placebo, Alteplase - Rank: 4
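
Since the question asks for the "Rank" values rather than printed lines, the same any() check can be wrapped in a function that returns them. This is only a minimal sketch: the function name matching_ranks and the shortened sample records are made up for illustration, and a missing "Interventions" key is treated as an empty string.

```python
def matching_ranks(records, aliases):
    """Return the Rank of every record whose Interventions mentions an alias."""
    return [
        record["Rank"]
        for record in records
        # .get() with a default is an assumption: records lacking the key are skipped
        if any(alias in record.get("Interventions", "") for alias in aliases)
    ]


# Shortened, hypothetical versions of the records from the question
sample = [
    {"Rank": 1, "Interventions": "Diagnostic Test: Vitamin D Level"},
    {"Rank": 3, "Interventions": "Biological: UMC119-06"},
    {"Rank": 4, "Interventions": "Drug: Naoxintong Capsule|Drug: Placebo, Alteplase"},
]
alt_aliases = ("Alteplase", "alteplase", "tpa", "t-PA", "tPA", "rtpa", "rtPA",
               "rt-PA", "r-tPA", "Rt-PA", "activase", "Activase")

print(matching_ranks(sample, alt_aliases))  # [4]
```

Returning a list keeps the matching separate from the output, so the ranks can be printed, counted, or written back to a file as needed.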

Alternative:

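This can also be done with the re regular-expression module, which offers more powerful pattern matching if you need it (such as ignoring letter case), though it is probably overkill for this relatively simple task.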

import json
import re

json_filename = 'medical_data.json'
alt_aliases = ("Alteplase", "alteplase", "tpa", "t-PA", "tPA", "rtpa", "rtPA", "rt-PA",
               "r-tPA", "Rt-PA", "activase", "Activase")

pattern = '|'.join(map(re.escape, alt_aliases))  # Construct pattern from aliases.
alias_regex = re.compile(pattern)

with open(json_filename) as file:
    data = json.load(file)

for record in data:  # 'record' avoids shadowing the built-in name 'object'
    if alias_regex.search(record['Interventions']):
        print("Interventions: {Interventions} - Rank: {Rank}".format(**record))
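
If ignoring letter case is wanted, one possible variation is to compile the pattern with re.IGNORECASE, which makes it unnecessary to list every capitalization of each alias. The sketch below also wraps the alternatives in \b word boundaries, an addition not in the answer above, so that a short alias such as "tpa" does not match inside an unrelated word like "outpatient". The helper name mentions_alteplase and the reduced alias tuple are assumptions for illustration.

```python
import re

# One spelling per alias is enough once case is ignored
alt_aliases = ("alteplase", "tpa", "t-PA", "rtpa", "rt-PA", "r-tPA", "activase")

# Try longer aliases first, and require word boundaries on both sides
ordered = sorted(alt_aliases, key=len, reverse=True)
pattern = r"\b(?:" + "|".join(map(re.escape, ordered)) + r")\b"
alias_regex = re.compile(pattern, re.IGNORECASE)


def mentions_alteplase(text):
    """Return True if text contains any alias as a whole word, ignoring case."""
    return alias_regex.search(text) is not None


print(mentions_alteplase("Drug: Placebo, Alteplase"))              # True
print(mentions_alteplase("Drug: RT-PA infusion"))                  # True
print(mentions_alteplase("Behavioral: outpatient rehabilitation")) # False
```

Whether word boundaries are appropriate depends on how the real data spells its interventions, so it is worth checking the matches against a sample of the file.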

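any() returns True if at least one element of the iterable is truthy. This is a little counterintuitive at first, because to make it work the way you want, you need to pass it an iterable of booleans.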

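So instead of alt_aliases, pass in something like [True for alias in alt_aliases if alias in entry['Interventions']], where entry is a single record from the list. This list comprehension contains one True for each alias found in the record. If no alias is present, the list is empty and any() returns False.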
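
To make the difference concrete, here is a small sketch of what any() returns in each case. The record is a shortened copy of entry 4 from the question, and the three-item alias tuple is trimmed for readability.

```python
alt_aliases = ("Alteplase", "tPA", "Activase")
record = {"Rank": 4,
          "Interventions": "Drug: Naoxintong Capsule|Drug: Placebo, Alteplase"}

# Non-empty strings are truthy, so this is True regardless of the record
print(any(alt_aliases))  # True

# The original comparison therefore tests a string against True
print(record["Interventions"] == any(alt_aliases))  # False

# One boolean per alias: is this alias a substring of the field?
flags = [alias in record["Interventions"] for alias in alt_aliases]
print(flags)       # [True, False, False]
print(any(flags))  # True
```

A generator expression, any(alias in record["Interventions"] for alias in alt_aliases), gives the same result without building the intermediate list and stops at the first match.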
