How to map values from two lists when a substring matches

Posted 2024-04-28 20:06:07


I have values in two different lists:

list1 = [
    "1003_0123_20200821091044_ion_fri_jl.dat",
    "8005_0086_20200821090605_ion_fri_jl.dat",
    "1003_0123_20200821091999_ion_fri_jl.dat",
]

list2 = [
    "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
]

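I want to pair the values that share the same substring, as obtained by str.split('_')[2]. For example, the first element of list1 contains the substring 20200821091044, which matches the third element of list2. I would then like to get the matched values like this: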

[
    (
        "1003_0123_20200821091044_ion_fri_jl.dat",
        "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
    ),
    (
        "8005_0086_20200821090605_ion_fri_jl.dat",
        "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    ),
    (
        "1003_0123_20200821091999_ion_fri_jl.dat",
        "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    ),
]

Or in the form of a dictionary.


2 Answers

Loop over the first list, extract the substring, then loop over the second list and look for matches:

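results = []

for x in list1:
    # The key is the third "_"-separated field of each list1 entry.
    substring = x.split("_")[2]

    for y in list2:
        # Substring test: matches the key anywhere inside y.
        if substring in y:
            results.append((x, y))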
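The nested loop above compares every pair of items and uses `in`, which matches the key anywhere in the string. As a rough alternative sketch, one could index list2 by its own third field and look each list1 item up directly. This assumes the key is the third "_"-separated field in both lists and that each key appears at most once in list2; `key_of` and `by_key` are hypothetical names introduced only for illustration.

```python
list1 = [
    "1003_0123_20200821091044_ion_fri_jl.dat",
    "8005_0086_20200821090605_ion_fri_jl.dat",
    "1003_0123_20200821091999_ion_fri_jl.dat",
]

list2 = [
    "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
]


def key_of(name):
    # Hypothetical helper: the third "_"-separated field is the key.
    return name.split("_")[2]


# Index list2 by key (assumption: keys are unique within list2;
# a repeated key would keep only the last item).
by_key = {key_of(y): y for y in list2}

# Pair each list1 item with its match, in list1 order.
# Items without a match in list2 are skipped.
pairs = [(x, by_key[key_of(x)]) for x in list1 if key_of(x) in by_key]
```

With the sample data, `pairs` holds one tuple per list1 item, in list1 order. If a key can occur several times in list2, the nested loop or a dict of lists is the safer choice.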

An earlier edit of your question said "or in dictionary format", so that is what I'll use here:

import collections

grouped = collections.defaultdict(list)
for item in list1+list2:  # or itertools.chain(list1, list2)
    grouped[item.split('_')[2]].append(item)

grouped is:

defaultdict(list,
            {'20200821091044': ['1003_0123_20200821091044_ion_fri_jl.dat',
              'CCA CTAD USD GAAPA_202103311352_20200821091044_FRI'],
             '20200821090605': ['8005_0086_20200821090605_ion_fri_jl.dat',
              'ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI'],
             '20200821091999': ['1003_0123_20200821091999_ion_fri_jl.dat',
              'IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI']})

list(grouped.values()) turns it into a list of pairs:

[['1003_0123_20200821091044_ion_fri_jl.dat',
  'CCA CTAD USD GAAPA_202103311352_20200821091044_FRI'],
 ['8005_0086_20200821090605_ion_fri_jl.dat',
  'ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI'],
 ['1003_0123_20200821091999_ion_fri_jl.dat',
  'IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI']]
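The question asks for tuples or a dictionary, while the output above is a list of two-item lists. A possible way to convert it is sketched below. It assumes every key collects exactly one item from each list, with the list1 item first because list1 is iterated first; groups of any other size are dropped. The names `pairs` and `mapping` are illustrative only.

```python
import collections

list1 = [
    "1003_0123_20200821091044_ion_fri_jl.dat",
    "8005_0086_20200821090605_ion_fri_jl.dat",
    "1003_0123_20200821091999_ion_fri_jl.dat",
]

list2 = [
    "IMM CCA CADD USD GAAP_202103311352_20200821091999_FRI",
    "ICM CCA CADD USD GAAP_202103311352_20200821090605_FRI",
    "CCA CTAD USD GAAPA_202103311352_20200821091044_FRI",
]

# Same grouping as above: the key is the third "_"-separated field.
grouped = collections.defaultdict(list)
for item in list1 + list2:
    grouped[item.split("_")[2]].append(item)

# Keep only groups of exactly two items and turn each into a tuple.
pairs = [tuple(group) for group in grouped.values() if len(group) == 2]

# Dictionary form: list1 item -> matching list2 item.
mapping = dict(pairs)
```

Because dicts preserve insertion order in Python 3.7+, `pairs` follows the order in which keys first appear in list1.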
