How to remove inception words from a list?

>>> ls = [u'dose rate', u'object', u'dose', u'rate', u'computation']
>>> set([i for i in ls for j in ls if i!=j or i not in j])
set([u'dose rate', u'object', u'rate', u'computation', u'dose'])
>>> set([j for i in ls for j in ls if i!=j or i not in j])
set([u'rate', u'object', u'dose rate', u'computation', u'dose'])
>>> set([j for j in ls for i in ls if i!=j or i not in j])
set([u'dose rate', u'object', u'rate', u'computation', u'dose'])

3 Answers

User

#1 · Edited 2024-06-09 06:32:03

To satisfy your first example, you can do this:

>>> words = [u'dose rate', u'object', u'dose', u'rate', u'computation']
>>> [w1 for w1 in words if not any(w1 in w2 for w2 in words if w2 != w1)]
[u'dose rate', u'object', u'computation']

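However, your second example shows that your requirements are a bit more complex: the same small word cannot be used more than once to make up a longer string.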
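To see where the one-liner falls short, here is a minimal Python 3 sketch that runs the same comprehension on the word list from the second example. The helper name `drop_contained` is made up for illustration. It keeps `sensor system` whole, whereas the result expected for that example ends with `system`, presumably because `sensor` has already been used by `magnetic sensor`.

```python
def drop_contained(words):
    # Keep a word only if it is not a substring of any other word in the list.
    return [w1 for w1 in words if not any(w1 in w2 for w2 in words if w2 != w1)]

words = ['shift', 'magnetic', 'system', 'magnetic sensor', 'phase shift',
         'phase', 'output', 'sensor', 'sensing', 'sensor system']
result = drop_contained(words)
print(result)
# ['magnetic sensor', 'phase shift', 'output', 'sensing', 'sensor system']
```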

Unfortunately, a one-liner is not possible. Try something like this:

def remove_comprising(words):
    seen = set()
    result_words = []
    for word in words:
        for small_word in words:
            if small_word in word and small_word != word:
                if small_word in seen:
                    word = word.replace(small_word, '')
                else:
                    seen.add(small_word)
        result_words.append(word)
    return [word.strip() for word in result_words if word not in seen]

Then we get the correct result for both examples. Example 1:

>>> words = [u'dose rate', u'object', u'dose', u'rate', u'computation']
>>> remove_comprising(words)
[u'dose rate', u'object', u'computation']

Example 2:

>>> words = [u'shift', u'magnetic', u'system', u'magnetic sensor', u'phase shift', u'phase', u'output', u'sensor', u'sensing', u'sensor system']
>>> remove_comprising(words)
[u'magnetic sensor', u'phase shift', u'output', u'sensing', u'system']

User

#2 · Edited 2024-06-09 06:32:03

Given a list of words:

>>> words = [u'dose rate', u'object', u'dose', u'rate', u'computation']

and a definition of an inception word:

>>> inception = lambda x: any(x in w for w in words if len(x) < len(w))

we can build a list of "non-inception words" like this:

>>> [w for w in words if not inception(w)]
[u'dose rate', u'object', u'computation']
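Note that the lambda reads the global `words` and uses plain substring matching, so `u'rate'` would also count as contained in a word such as `u'separate'`. Below is a hedged Python 3 sketch of the same idea as a function, with an optional whole-word mode. The name `non_inception` and the `whole_words` flag are assumptions for illustration, not part of the answer above.

```python
def non_inception(words, whole_words=False):
    """Return the words that are not contained in a longer word of the list."""
    def contained(x, w):
        if whole_words:
            # Pad with spaces so x must match complete space-separated tokens.
            return f' {x} ' in f' {w} '
        return x in w

    return [x for x in words
            if not any(contained(x, w) for w in words if len(x) < len(w))]

print(non_inception(['dose rate', 'object', 'dose', 'rate', 'computation']))
# ['dose rate', 'object', 'computation']
print(non_inception(['rate', 'separate', 'object']))
# ['separate', 'object']
print(non_inception(['rate', 'separate', 'object'], whole_words=True))
# ['rate', 'separate', 'object']
```

Whether whole-word matching is wanted depends on the data; the question's examples behave the same either way.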

User

#3 · Edited 2024-06-09 06:32:03

A function that is somewhat complicated to read: its implementation is not Pythonic, but it should solve the problem.

The basic idea is to evaluate each word in the list and flag whether it should be included, then use those flags to output the words.

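The catch is that you want to find words that can be part of two other, larger words, which makes the flagging more nuanced (not simply keep or reject, but keep, keep-continued, and reject).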

import copy
def inception(wordlist):

    # don't want to mutate the original list
    new_wordlist = copy.deepcopy(wordlist)

    # find length of wordlist to know when original length is traversed
    word_count = len(new_wordlist)
    output_set = set()
    output_list = [] # flags: -1 = evaluation postponed, 0 = exclude, 1 = include, >1 = include and join with the previous word
    eval_list = []

    # iterate through list
    for idx, word in enumerate(new_wordlist):
        inner_words = word.split()

        # if it's only 1 word, evaluate it at the end
        # Can be made smarter to reject earlier
        if len(inner_words) == 1 and idx < word_count:
            output_list.append(-1)
            eval_list.append(word)
            new_wordlist.append(word)
            continue        

        # Flag existence of inner words if they haven't been found
        existence = 0
        for in_wrd in inner_words:
            if in_wrd in output_set:
                output_list.append(0)       
            else:
                # keep continued 
                existence += 1
                output_set.add(in_wrd)
                output_list.append(existence)
            eval_list.append(in_wrd)

    # now evaluate by position of flags
    final_set = set()
    for idx, word in enumerate(eval_list):
        if output_list[idx] > 0:

            # combine if words are in order
            if output_list[idx] > 1:
                final_set.remove(eval_list[idx-1])
                word = ' '.join([eval_list[idx-1], eval_list[idx]])
            final_set.add(word) 
    return list(final_set)

I have only tested it with the two sets you provided. If you have sets that fail, please add them in the comments and I will fix it.
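Because the function returns `list(final_set)`, the order of its result is not guaranteed, so any check should compare sets rather than lists. A small Python 3 harness along the following lines could be used to try further word lists. `CASES` and `failing_cases` are hypothetical names, and the expected values are the results shown for the two examples in the first answer.

```python
CASES = [
    (['dose rate', 'object', 'dose', 'rate', 'computation'],
     {'dose rate', 'object', 'computation'}),
    (['shift', 'magnetic', 'system', 'magnetic sensor', 'phase shift',
      'phase', 'output', 'sensor', 'sensing', 'sensor system'],
     {'magnetic sensor', 'phase shift', 'output', 'sensing', 'system'}),
]

def failing_cases(func, cases=CASES):
    """Return (input, expected, actual) for every case where func disagrees."""
    failures = []
    for words, expected in cases:
        actual = set(func(list(words)))  # pass a copy; compare ignoring order
        if actual != expected:
            failures.append((words, expected, actual))
    return failures
```

With the function above defined, `failing_cases(inception)` should return an empty list if both examples pass; new word lists can be appended to `CASES` together with the result you expect.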
