Keep the longest substring if a condition matches

Posted 2024-05-16 12:07:42


Suppose I have a list of strings like this:

lst = ['practical matter', 'a practical matter', 'As a practical matter',
       'the West', 'the West Coast', 'to Hawaii', 'the West Coast to Hawaii']

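I want to keep only the longest string when the strings differ just by a leading "a" or "the". For example, I want to keep 'a practical matter' because it has an 'a' at the beginning. Similarly, I want to keep 'the West Coast' because it has a 'the' at the beginning. The expected output is: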

out = ['a practical matter', 'As a practical matter',
       'the West Coast', 'to Hawaii', 'the West Coast to Hawaii']

I tried this:

delete_from_best_constituents = []
for u in best_parse_constituents:
    for v in best_parse_constituents:
        if u.lower().startswith('the') or v.lower().startswith('the'):
            u_part = u.lower().split('the')[-1].strip()
            v_part =  v.lower().split('the')[-1].strip()
            cond1 = all([w.lower() not in STOP for w in u_part.split()])
            cond2 = all([w.lower() not in STOP for w in v_part.split()])
            if u_part == v.lower() or v_part == u.lower() and cond1 and cond2:
                if not u.lower().startswith('the'):
                    delete_from_best_constituents.append(u)

But I am looking for a concise, Pythonic way to do it.


1 Answer
Anonymous user
Reply #1 · Posted 2024-05-16 12:07:42

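I am not sure how to interpret the filter criteria you describe, but since you asked for a more Pythonic approach, you could write a function that decides whether a string should be accepted and use it as the filter in a list comprehension: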

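def isValid(e, arr):
    """Return True if `e` is not contained in any other string of `arr`."""
    for a in arr:
        # Skip equal strings. Duplicates in `arr` compare equal too, so they
        # are never tested against each other; convert `arr` to a set() first
        # if duplicates should be dropped.
        if e != a:
            # Modify this test to match your own criteria.
            condition = e in a
            if condition:
                return False
    return True

[u for u in lst if isValid(u, lst)]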
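
The comment about duplicates can be made concrete. The following is only a sketch, not part of the original answer: the function name `keep_maximal` and the sample list are made up for illustration. It removes repeated strings with `dict.fromkeys`, which keeps first-seen order (a plain `set()` would not), and then applies the same substring test.

```python
def keep_maximal(strings):
    """Return the strings that are not contained in any other distinct string.

    Hypothetical helper: the name and behaviour are assumptions for this sketch.
    """
    # dict.fromkeys drops duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(strings))
    return [u for u in unique if not any(u != v and u in v for v in unique)]


# Sample input with repeated entries (made up for this example).
sample = ['the West', 'the West Coast', 'the West', 'to Hawaii', 'to Hawaii']
print(keep_maximal(sample))  # ['the West Coast', 'to Hawaii']
```

Without the de-duplication step, both copies of 'to Hawaii' would appear in the result, because equal strings are skipped rather than compared.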

The isValid filter can be shortened further by folding the check into a second, nested comprehension:

[u for u in lst
 if not any(u != v and u in v for v in lst)]

For the lst in the question, both the isValid filter and the one-line comprehension return ['As a practical matter', 'the West Coast to Hawaii'].
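
The substring test `e in a` is only a placeholder, and the list it returns is not the output asked for in the question. As a rough sketch of how the criterion could be swapped, the code below assumes one possible reading of the question: a string is dropped only when the list also contains the same string with a single leading "a" or "the" prepended. The names `ARTICLES`, `extends_with_article` and `drop_shadowed` are hypothetical, and the rule itself is an assumption rather than the asker's confirmed intent.

```python
ARTICLES = ('a', 'the')  # assumed set of leading words; adjust as needed


def extends_with_article(longer, shorter):
    """True if `longer` equals `shorter` with one leading article prepended.

    The comparison is case-insensitive. This rule is an assumption.
    """
    longer, shorter = longer.lower(), shorter.lower()
    return any(longer == f'{art} {shorter}' for art in ARTICLES)


def drop_shadowed(strings, shadows=extends_with_article):
    """Keep each string unless some string in the list shadows it."""
    return [u for u in strings if not any(shadows(v, u) for v in strings)]


lst = ['practical matter', 'a practical matter', 'As a practical matter',
       'the West', 'the West Coast', 'to Hawaii', 'the West Coast to Hawaii']
print(drop_shadowed(lst))
```

Under this reading only 'practical matter' is removed. 'the West' stays in the result, so the sketch reproduces the expected output only in part; whatever rule removes 'the West' but keeps 'the West Coast' is not spelled out in the question and would have to be added to the predicate. Passing a different `shadows` function, for example `lambda v, u: u != v and u in v`, gives back the substring behaviour of the snippets above.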
