How do I remove inception words from a list?

Posted 2024-06-09 06:32:03


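Given a list that contains some "inception" words, how can I remove the inception words and keep only the larger ones?

Let's define an inception word as a word that appears as part of a larger entry in the same list.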

Task

To make it very clear: if a list contains ['a', 'b', 'a b c'], remove 'a' and 'b', because a larger element in the list contains both of them.

Example 1, [in]:

[u'dose rate', u'object', u'dose', u'rate', u'computation']

[out]:

[u'dose rate', u'object', u'computation']

Example 2, [in]:

[u'shift', u'magnetic', u'system', u'magnetic sensor', u'phase shift', u'phase', u'output', u'sensor', u'sensing', u'sensor system']

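Because 'magnetic', 'sensor', 'system', 'magnetic sensor' and 'sensor system' all exist, either of the following is acceptable.

Desired output, [out]:

[u'system', u'magnetic sensor', u'phase shift', u'output', u'sensing']

or [out]:

[u'magnetic', u'phase shift', u'output', u'sensing', u'sensor system']

I tried the following, but none of it gives the desired result: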

>>> ls = [u'dose rate', u'object', u'dose', u'rate', u'computation']
>>> set([i for i in ls for j in ls if i!=j or i not in j])
set([u'dose rate', u'object', u'rate', u'computation', u'dose'])
>>> set([j for i in ls for j in ls if i!=j or i not in j])
set([u'rate', u'object', u'dose rate', u'computation', u'dose'])
>>> set([j for j in ls for i in ls if i!=j or i not in j])
set([u'dose rate', u'object', u'rate', u'computation', u'dose'])
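
One likely reason all three attempts return the full list: the condition `i != j or i not in j` is true for almost every pair, and an element is emitted as soon as any single pair passes. A small sketch makes this visible (the helper name `passing_pairs` is made up for illustration, and plain Python 3 strings are used instead of `u''` literals):

```python
ls = ['dose rate', 'object', 'dose', 'rate', 'computation']

def passing_pairs(words):
    """Return the (i, j) pairs for which the attempted filter condition is true."""
    return [(i, j) for i in words for j in words if i != j or i not in j]

pairs = passing_pairs(ls)

# Every pair of distinct elements passes because `i != j` is already true,
# so 'dose' is still emitted via pairs such as ('dose', 'object').
print(('dose', 'object') in pairs)          # True

# Only the pairs with i == j fail, since a string always contains itself.
print(len(pairs), len(ls) ** 2 - len(ls))   # 20 20
```

To drop a word, the containment test has to be checked against all other elements together (for example with `any` or `all` over `j`), not against one pair at a time.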

3 answers

To satisfy the first example, you can do this:

>>> words = [u'dose rate', u'object', u'dose', u'rate', u'computation']
>>> [w1 for w1 in words if not any(w1 in w2 for w2 in words if w2 != w1)]
[u'dose rate', u'object', u'computation']

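But your second example shows that your requirement is a bit more complicated: the same small word must not be used more than once to make up a larger string.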
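
To see the difference, here is the same comprehension applied to the second example, wrapped in a function (the name `drop_contained` is only for illustration). It keeps both 'magnetic sensor' and 'sensor system', so 'sensor' is used twice, which neither of the desired outputs in the question allows.

```python
def drop_contained(words):
    """Keep only the words that are not a substring of some other word."""
    return [w1 for w1 in words if not any(w1 in w2 for w2 in words if w2 != w1)]

words = ['shift', 'magnetic', 'system', 'magnetic sensor', 'phase shift',
         'phase', 'output', 'sensor', 'sensing', 'sensor system']

result = drop_contained(words)
print(result)
# ['magnetic sensor', 'phase shift', 'output', 'sensing', 'sensor system']
```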

Unfortunately, a one-liner won't do here. Try something like this:

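def remove_comprising(words):
    seen = set()  # smaller words already used up by an earlier, larger entry
    result_words = []
    for word in words:
        for small_word in words:
            if small_word in word and small_word != word:
                if small_word in seen:
                    # already used elsewhere: cut it out of this entry
                    word = word.replace(small_word, '')
                else:
                    seen.add(small_word)
        result_words.append(word)
    # drop the used-up small words; strip the whitespace left by replace()
    return [word.strip() for word in result_words if word not in seen]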

Then we get the correct result for both examples. Example 1:

>>> words = [u'dose rate', u'object', u'dose', u'rate', u'computation']
>>> remove_comprising(words)
[u'dose rate', u'object', u'computation']

Example 2:

>>> words = [u'shift', u'magnetic', u'system', u'magnetic sensor', u'phase shift', u'phase', u'output', u'sensor', u'sensing', u'sensor system']
>>> remove_comprising(words)
[u'magnetic sensor', u'phase shift', u'output', u'sensing', u'system']

Given a list of words:

>>> words = [u'dose rate', u'object', u'dose', u'rate', u'computation']

and a definition of an inception word:

>>> inception = lambda x: any(x in w for w in words if len(x) < len(w))

we can build a list of the non-inception words like this:

>>> [w for w in words if not inception(w)]
[u'dose rate', u'object', u'computation']
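
Note that `x in w` is a character-level substring test, so a fragment such as 'rat' would also count as an inception word of 'dose rate'. If whole-word matching is what is wanted (an assumption; the question does not say), a hedged variant could compare whitespace-separated tokens instead. The function names below are made up for illustration.

```python
def is_sublist(small, big):
    """True if the list `small` occurs as a contiguous run inside `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))

def remove_inception_tokens(words):
    """Drop entries whose tokens appear as a contiguous run in a longer entry."""
    tokens = [w.split() for w in words]
    return [
        w for w, t in zip(words, tokens)
        if not any(len(t) < len(other) and is_sublist(t, other) for other in tokens)
    ]

print(remove_inception_tokens(['dose rate', 'object', 'dose', 'rate', 'rat']))
# ['dose rate', 'object', 'rat']
```

Here 'rat' survives because it is not a whole word of any longer entry, whereas the substring-based lambda would remove it.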

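Here is a function that is a bit complicated to read. Its implementation isn't Pythonic, but it should solve the problem.

The basic idea: evaluate each word in the list and flag whether it should be included, then use those flags to emit the words.

The trouble is that you want to handle words that can be part of two different larger entries, which makes the flagging more nuanced: not simply keep or reject, but keep, keep and continue, and reject.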

import copy
def inception(wordlist):

    # don't want to mutate the original list
    new_wordlist = copy.deepcopy(wordlist)

    # find length of wordlist to know when original length is traversed
    word_count = len(new_wordlist)
    output_set = set()
    output_list = []  # flags: -1 = evaluation postponed, 0 = exclude, >= 1 = include (> 1 continues the previous word)
    eval_list = []

    # iterate through list
    for idx, word in enumerate(new_wordlist):
        inner_words = word.split()

        # if it's only 1 word, evaluate at the end
        # Can be made smarter to reject earlier
        if len(inner_words) == 1 and idx < word_count:
            output_list.append(-1)
            eval_list.append(word)
            new_wordlist.append(word)
            continue        

        # Flag existence of inner words if they haven't been found
        existence = 0
        for in_wrd in inner_words:
            if in_wrd in output_set:
                output_list.append(0)       
            else:
                # keep continued 
                existence += 1
                output_set.add(in_wrd)
                output_list.append(existence)
            eval_list.append(in_wrd)

    # now evaluate by position of flags
    final_set = set()
    for idx, word in enumerate(eval_list):
        if output_list[idx] > 0:

            # combine if words are in order
            if output_list[idx] > 1:
                final_set.remove(eval_list[idx-1])
                word = ' '.join([eval_list[idx-1], eval_list[idx]])
            final_set.add(word) 
    return list(final_set)

I have only tested this on the two sets you provided. If you have sets where it fails, please add them in the comments and I'll fix it.
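
For comparison, the flag-then-filter idea from this answer can also be sketched more compactly in two passes. This is a simplified illustration, not the function above: the name `remove_inception_two_pass` is made up, entries are assumed to be whitespace-separated, earlier phrases win when two phrases compete for the same word, and the leftover words of a rejected phrase survive only if they also appear in the list as single-word entries.

```python
def remove_inception_two_pass(words):
    """Flag words claimed by multi-word phrases, then filter the list."""
    used = set()   # single words already claimed by a kept phrase
    kept = set()   # multi-word phrases that won all of their words

    # Pass 1: phrases claim their words in list order; a phrase that needs
    # an already-claimed word is rejected.
    for entry in words:
        parts = entry.split()
        if len(parts) > 1 and used.isdisjoint(parts):
            used.update(parts)
            kept.add(entry)

    # Pass 2: keep the winning phrases and any single word nobody claimed.
    result = []
    for entry in words:
        if len(entry.split()) > 1:
            if entry in kept:
                result.append(entry)
        elif entry not in used:
            result.append(entry)
    return result

words = ['shift', 'magnetic', 'system', 'magnetic sensor', 'phase shift',
         'phase', 'output', 'sensor', 'sensing', 'sensor system']
print(remove_inception_two_pass(words))
# ['system', 'magnetic sensor', 'phase shift', 'output', 'sensing']
```

On the second example this reproduces the first desired output from the question: 'sensor system' loses 'sensor' to 'magnetic sensor', so only the single entry 'system' remains.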
