How do I search between ranges in a list?

Posted 2024-04-29 08:56:55


I want to find the POS tags that occur between two bounds, where the bounds are the index positions of the NNP tags.

data = [[('User', 'NNP'),
  ('is', 'VBG'),
  ('not', 'RB'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('iShopCatalog', 'NN'),
  ('Coala', 'NNP'),
  ('excluding', 'VBG'),
  ('articles', 'NNS'),
  ('from', 'IN'),
  ('VWR', 'NNP')],
 [('Arfter', 'NNP'),
  ('transferring', 'VBG'),
  ('the', 'DT'),
  ('articles', 'NNS'),
  ('from', 'IN'),
  ('COALA', 'NNP'),
  ('to', 'TO'),
  ('SRM', 'VB'),
  ('the', 'DT'),
  ('Category', 'NNP'),
  ('S9901', 'NNP'),
  ('Dummy', 'NNP'),
  ('is', 'VBZ'),
  ('maintained', 'VBN')],
 [('Due', 'JJ'),
  ('to', 'TO'),
  ('this', 'DT'),
  ('the', 'DT'),
  ('user', 'NN'),
  ('is', 'VBZ'),
  ('not', 'RB'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('the', 'DT'),
  ('product', 'NN')],
 [('All', 'DT'),
  ('other', 'JJ'),
  ('users', 'NNS'),
  ('can', 'MD'),
  ('order', 'NN'),
  ('these', 'DT'),
  ('articles', 'NNS')],
 [('She', 'PRP'),
  ('can', 'MD'),
  ('order', 'NN'),
  ('other', 'JJ'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('a', 'DT'),
  ('POETcatalog', 'NNP'),
  ('without', 'IN'),
  ('any', 'DT'),
  ('problems', 'NNS')],
 [('Furtheremore', 'IN'),
  ('she', 'PRP'),
  ('is', 'VBZ'),
  ('able', 'JJ'),
  ('to', 'TO'),
  ('order', 'NN'),
  ('products', 'NNS'),
  ('from', 'IN'),
  ('the', 'DT'),
  ('Vendor', 'NNP'),
  ('VWR', 'NNP'),
  ('through', 'IN'),
  ('COALA', 'NNP')],
 [('But', 'CC'),
  ('articles', 'NNP'),
  ('from', 'VBG'),
  ('all', 'RB'),
  ('other', 'JJ'),
  ('suppliers', 'NNS'),
  ('are', 'NNP'),
  ('not', 'VBG'),
  ('orderable', 'RB')],
 [('I', 'PRP'),
  ('already', 'RB'),
  ('spoke', 'VBD'),
  ('to', 'TO'),
  ('anic', 'VB'),
  ('who', 'WP'),
  ('maintain', 'VBP'),
  ('the', 'DT'),
  ('catalog', 'NN'),
  ('COALA', 'NNP'),
  ('and', 'CC'),
  ('they', 'PRP'),
  ('said', 'VBD'),
  ('that', 'IN'),
  ('the', 'DT'),
  ('reason', 'NN'),
  ('should', 'MD'),
  ('be', 'VB'),
  ('the', 'DT'),
  ('assignment', 'NN'),
  ('of', 'IN'),
  ('the', 'DT'),
  ('plant', 'NN')],
 [('User', 'NNP'),
  ('is', 'VBZ'),
  ('a', 'DT'),
  ('assinged', 'JJ'),
  ('to', 'TO'),
  ('Universitaet', 'NNP'),
  ('Regensburg', 'NNP'),
  ('in', 'IN'),
  ('Scout', 'NNP'),
  ('but', 'CC'),
  ('in', 'IN'),
  ('P17', 'NNP'),
  ('table', 'NN'),
  ('YESRMCDMUSER01', 'NNP'),
  ('she', 'PRP'),
  ('is', 'VBZ'),
  ('assigned', 'VBN'),
  ('to', 'TO'),
  ('company', 'NN'),
  ('001500', 'CD'),
  ('Merck', 'NNP'),
  ('KGaA', 'NNP')],
 [('Please', 'NNP'),
  ('find', 'VB'),
  ('attached', 'JJ'),
  ('some', 'DT'),
  ('screenshots', 'NNS')]]

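Here is my code.

list1 = []  # per sentence: the index positions of the NNP tags
list4 = []  # per sentence: the words tagged NNP
for i in data:
    list2 = []
    list3 = []
    for l,j in enumerate(i):  # l is the position, j is the (word, tag) pair
        if j[1] == 'NNP':
            list2.append(l)
            list3.append(j[0])
    list1.append(list2)
    list4.append(list3)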

Output:

list1:

[[0, 9, 13],
 [0, 5, 9, 10, 11],
 [],
 [],
 [7],
 [9, 10, 12],
 [1, 6],
 [9],
 [0, 5, 6, 8, 11, 13, 20, 21],
 [0]]

list4:

[['User', 'Coala', 'VWR'],
 ['Arfter', 'COALA', 'Category', 'S9901', 'Dummy'],
 [],
 [],
 ['POETcatalog'],
 ['Vendor', 'VWR', 'COALA'],
 ['articles', 'are'],
 ['COALA'],
 ['User',
  'Universitaet',
  'Regensburg',
  'Scout',
  'P17',
  'YESRMCDMUSER01',
  'Merck',
  'KGaA'],
 ['Please']]

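From list1 and list4 I can get the NNP strings and their indices. What I want is to use the NNP index values to find out, for each inner list, whether any VB, RB or JJ tags occur between the NNP tags.

For example, in the first inner list, how can I write code that searches the ranges (0-9) and (9-13) for tokens tagged VB, RB or JJ?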


2 Answers

Assuming I have understood your question correctly, the following should work:

search_list = ['VB', 'RB', 'JJ']
for index, nnp_indices in enumerate(list1):  # renamed from `set`, which shadows the builtin
    temp = nnp_indices[::-1] # makes a copy of the list in reverse
    while len(temp) > 1:
        first = temp.pop() # removes the last item (first item of nnp_indices) to control while loop
        second = temp[-1] # references next item (new last item)
        for i in range(first, second + 1): # search all indices between first and second
            if data[index][i][1] in search_list: # index the data by same index as current list1 item
                # placeholder for do_stuff(): replace with your own handling
                print(index, (first, second), data[index][i])

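Basically:

  1. Use enumerate in the outer for loop to keep an index that runs parallel to the original data.
  2. Make a copy of each list in list1. I made a reversed copy because I personally dislike calling pop() with an index, so when I want to pop the first item of a list repeatedly I reverse the list instead. You could make a regular copy and use list.pop(0) to remove and return the first item.
  3. Pop the last item of the copy (the first item of the original) and reference the next one.
  4. Use these two items to build a range, index into the data with it, and check each token for the tags mentioned.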
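
The steps above can be sketched as a small self-contained function. This is only an illustration: the name `tags_between_nnp` and the tuple layout it returns are assumptions, not part of the answer, and `sample` is the first sentence from the question. It checks only the positions strictly between two NNP tokens; the endpoints are NNP by construction, so skipping them does not change which tokens match.

```python
SEARCH_TAGS = {'VB', 'RB', 'JJ'}

def tags_between_nnp(sentence, search_tags=SEARCH_TAGS):
    """Hypothetical helper: return (first, second, position, word, tag) for
    every matching token lying between two consecutive NNP positions."""
    nnp_positions = [pos for pos, (_, tag) in enumerate(sentence) if tag == 'NNP']
    remaining = nnp_positions[::-1]  # reversed copy, so pop() yields indices in original order
    hits = []
    while len(remaining) > 1:
        first = remaining.pop()   # current lower bound
        second = remaining[-1]    # next NNP position, the upper bound
        for pos in range(first + 1, second):  # strictly between the two NNPs
            word, tag = sentence[pos]
            if tag in search_tags:  # exact match: 'VBG' does not count as 'VB'
                hits.append((first, second, pos, word, tag))
    return hits

# First sentence of the question's data
sample = [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ'),
          ('to', 'TO'), ('order', 'NN'), ('products', 'NNS'), ('from', 'IN'),
          ('iShopCatalog', 'NN'), ('Coala', 'NNP'), ('excluding', 'VBG'),
          ('articles', 'NNS'), ('from', 'IN'), ('VWR', 'NNP')]

print(tags_between_nnp(sample))
# [(0, 9, 2, 'not', 'RB'), (0, 9, 3, 'able', 'JJ')]
```

Note that the membership test is an exact match, so tags such as VBG or VBZ are not counted as VB. If those should count too, comparing with `tag.startswith(...)` would be one option, but that goes beyond what the question asks for.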

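A list comprehension: zip list1[0] with a copy of itself offset by one to get the range indices.
The condition keeps only the ranges for which some element of the slice data[0][j:k] matches.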

[[j, k] for j, k in zip(list1[0][:], list1[0][1:])
        if any(t[1] in ['VB', 'RB', 'JJ'] for t in data[0][j:k])]

Out[107]: [[0, 9]]
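
The comprehension above only looks at the first sentence. A minimal sketch of applying the same idea to every sentence might look like the following; the function name `ranges_with_tags` is an assumption made for illustration, and `sample` contains three sentences copied from the question's data so that the block runs on its own.

```python
SEARCH_TAGS = {'VB', 'RB', 'JJ'}

def ranges_with_tags(sentences, search_tags=SEARCH_TAGS):
    """Hypothetical helper: for each sentence, list the [j, k] pairs of
    consecutive NNP positions whose slice sentence[j:k] contains a search tag."""
    result = []
    for sentence in sentences:
        nnp = [pos for pos, (_, tag) in enumerate(sentence) if tag == 'NNP']
        # zip(nnp, nnp[1:]) pairs each NNP position with the next one
        result.append([[j, k] for j, k in zip(nnp, nnp[1:])
                       if any(tag in search_tags for _, tag in sentence[j:k])])
    return result

sample = [
    [('User', 'NNP'), ('is', 'VBG'), ('not', 'RB'), ('able', 'JJ'),
     ('to', 'TO'), ('order', 'NN'), ('products', 'NNS'), ('from', 'IN'),
     ('iShopCatalog', 'NN'), ('Coala', 'NNP'), ('excluding', 'VBG'),
     ('articles', 'NNS'), ('from', 'IN'), ('VWR', 'NNP')],
    [('All', 'DT'), ('other', 'JJ'), ('users', 'NNS'), ('can', 'MD'),
     ('order', 'NN'), ('these', 'DT'), ('articles', 'NNS')],
    [('But', 'CC'), ('articles', 'NNP'), ('from', 'VBG'), ('all', 'RB'),
     ('other', 'JJ'), ('suppliers', 'NNS'), ('are', 'NNP'),
     ('not', 'VBG'), ('orderable', 'RB')],
]

print(ranges_with_tags(sample))
# [[[0, 9]], [], [[1, 6]]]
```

A sentence with fewer than two NNP tags yields an empty list, because `zip(nnp, nnp[1:])` produces no pairs in that case.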
