Checking the intersection of items in lists

0   | 1   | 2    | 3    | 4     | 5   | ... N
-----------------------------------------------
cat | dog | pine | tree | light | fan
cat | dog | pine | tree | light | fan
cat | dog | pine | tree | light | fan
cat | dog | pine | tree | light | fan
cat | dog | pine | tree | light | fan

data0 = unicode("Rainforests are forests characterized by high rainfall, with annual rainfall between 250 and 450 centimetres (98 and 177 in).[1] There are two types of rainforest: tropical rainforest and temperate rainforest. The monsoon trough, alternatively known as the intertropical convergence zone, plays a significant role in creating the climatic conditions necessary for the Earth's tropical rainforests. Around 40% to 75% of all biotic species are indigenous to the rainforests.[2] It has been estimated that there may be many millions of species of plants, insects and microorganisms still undiscovered in tropical rainforests. Tropical rainforests have been called the \"jewels of the Earth\" and the \"world's largest pharmacy\", because over one quarter of natural medicines have been discovered there.[3] Rainforests are also responsible for 28% of the world's oxygen turnover, sometimes misnamed oxygen production,[4] processing it through photosynthesis from carbon dioxide and consuming it through respiration. The undergrowth in some areas of a rainforest can be restricted by poor penetration of sunlight to ground level. If the leaf canopy is destroyed or thinned, the ground beneath is soon colonized by a dense, tangled growth of vines, shrubs and small trees, called a jungle. The term jungle is also sometimes applied to tropical rainforests generally.", "utf-8")

data1 = unicode("Tropical rainforests are characterized by a warm and wet climate with no substantial dry season: typically found within 10 degrees north and south of the equator. Mean monthly temperatures exceed 18 °C (64 °F) during all months of the year.[5] Average annual rainfall is no less than 168 cm (66 in) and can exceed 1,000 cm (390 in) although it typically lies between 175 cm (69 in) and 200 cm (79 in).[6] Many of the world's tropical forests are associated with the location of the monsoon trough, also known as the intertropical convergence zone.[7] The broader category of tropical moist forests are located in the equatorial zone between the Tropic of Cancer and Tropic of Capricorn. Tropical rainforests exist in Southeast Asia (from Myanmar (Burma) to the Philippines, Malaysia, Indonesia, Papua New Guinea, Sri Lanka, Sub-Saharan Africa from Cameroon to the Congo (Congo Rainforest), South America (e.g. the Amazon Rainforest), Central America (e.g. Bosawás, southern Yucatán Peninsula-El Peten-Belize-Calakmul), Australia, and on many of the Pacific Islands (such as Hawaiʻi). Tropical forests have been called the \"Earth's lungs\", although it is now known that rainforests contribute little net oxygen addition to the atmosphere through photosynthesis", "utf-8")

data2 = unicode("Tropical forests cover a large part of the globe, but temperate rainforests only occur in few regions around the world. Temperate rainforests are rainforests in temperate regions. They occur in North America (in the Pacific Northwest in Alaska, British Columbia, Washington, Oregon and California), in Europe (parts of the British Isles such as the coastal areas of Ireland and Scotland, southern Norway, parts of the western Balkans along the Adriatic coast, as well as in Galicia and coastal areas of the eastern Black Sea, including Georgia and coastal Turkey), in East Asia (in southern China, Highlands of Taiwan, much of Japan and Korea, and on Sakhalin Island and the adjacent Russian Far East coast), in South America (southern Chile) and also in Australia and New Zealand.[10]", "utf-8")

=========== start data_0 ==============
(0, ((u'the',), 13)) (1, ((u'of',), 10)) (2, ((u'rainforests',), 7)) (3, ((u'and',), 7))
(4, ((u'tropical',), 5)) (5, ((u'to',), 4)) (6, ((u'rainforest',), 4)) (7, ((u'in',), 4))
(8, ((u'are',), 4)) (9, ((u'a',), 4)) (10, ((u'it',), 3)) (11, ((u'by',), 3))
(12, ((u'been',), 3)) (13, ((u's',), 3)) (14, ((u'is',), 3)) (15, ((u'there',), 3))
(16, ((u'have',), 2)) (17, ((u'earth',), 2)) (18, ((u'sometimes',), 2)) (19, ((u'also',), 2))
(20, ((u'oxygen',), 2)) (21, ((u'jungle',), 2)) (22, ((u'rainfall',), 2)) (23, ((u'for',), 2))
(24, ((u'through',), 2)) (25, ((u'called',), 2)) (26, ((u'be',), 2)) (27, ((u'world',), 2))
(28, ((u'species',), 2)) (29, ((u'ground',), 2)) (30, ((u'shrubs',), 1)) (31, ((u'may',), 1))
(32, ((u'biotic',), 1)) (33, ((u'from',), 1)) (34, ((u'respiration',), 1)) (35, ((u'known',), 1))
(36, ((u'largest',), 1)) (37, ((u'discovered',), 1)) (38, ((u'two',), 1)) (39, ((u'plants',), 1))
(40, ((u'conditions',), 1)) (41, ((u'insects',), 1)) (42, ((u'necessary',), 1)) (43, ((u'1',), 1))
(44, ((u'convergence',), 1)) (45, ((u'jewels',), 1)) (46, ((u'poor',), 1)) (47, ((u'estimated',), 1))
(48, ((u'if',), 1)) (49, ((u'creating',), 1)) (50, ((u'that',), 1)) (51, ((u'75',), 1))
(52, ((u'growth',), 1)) (53, ((u'penetration',), 1)) (54, ((u'thinned',), 1)) (55, ((u'has',), 1))
(56, ((u'characterized',), 1)) (57, ((u'plays',), 1)) (58, ((u'temperate',), 1)) (59, ((u'production',), 1))
(60, ((u'because',), 1)) (61, ((u'high',), 1)) (62, ((u'98',), 1)) (63, ((u'trough',), 1))
(64, ((u'centimetres',), 1)) (65, ((u'over',), 1)) (66, ((u'some',), 1)) (67, ((u'undiscovered',), 1))
(68, ((u'natural',), 1)) (69, ((u'still',), 1)) (70, ((u'misnamed',), 1)) (71, ((u'all',), 1))
(72, ((u'many',), 1)) (73, ((u'sunlight',), 1)) (74, ((u'millions',), 1)) (75, ((u'dioxide',), 1))
(76, ((u'around',), 1)) (77, ((u'28',), 1)) (78, ((u'monsoon',), 1)) (79, ((u'canopy',), 1))
(80, ((u'photosynthesis',), 1)) (81, ((u'level',), 1)) (82, ((u'177',), 1)) (83, ((u'trees',), 1))
(84, ((u'carbon',), 1)) (85, ((u'one',), 1)) (86, ((u'4',), 1)) (87, ((u'between',), 1))
(88, ((u'areas',), 1)) (89, ((u'responsible',), 1)) (90, ((u'as',), 1)) (91, ((u'vines',), 1))
(92, ((u'450',), 1)) (93, ((u'turnover',), 1)) (94, ((u'leaf',), 1)) (95, ((u'role',), 1))
(96, ((u'indigenous',), 1)) (97, ((u'can',), 1)) (98, ((u'with',), 1)) (99, ((u'types',), 1))
(100, ((u'alternatively',), 1)) (101, ((u'annual',), 1)) (102, ((u'generally',), 1)) (103, ((u'zone',), 1))
(104, ((u'beneath',), 1)) (105, ((u'significant',), 1)) (106, ((u'consuming',), 1)) (107, ((u'microorganisms',), 1))
(108, ((u'applied',), 1)) (109, ((u'soon',), 1)) (110, ((u'2',), 1)) (111, ((u'tangled',), 1))
(112, ((u'250',), 1)) (113, ((u'restricted',), 1)) (114, ((u'undergrowth',), 1)) (115, ((u'medicines',), 1))
(116, ((u'climatic',), 1)) (117, ((u'colonized',), 1)) (118, ((u'forests',), 1)) (119, ((u'dense',), 1))
(120, ((u'pharmacy',), 1)) (121, ((u'quarter',), 1)) (122, ((u'intertropical',), 1)) (123, ((u'term',), 1))
(124, ((u'or',), 1)) (125, ((u'destroyed',), 1)) (126, ((u'processing',), 1)) (127, ((u'3',), 1))
(128, ((u'small',), 1)) (129, ((u'40',), 1))
=========== start data_1 ==============
(0, ((u'the',), 15)) (1, ((u'of',), 8)) (2, ((u'in',), 6)) (3, ((u'and',), 6))
(4, ((u'tropical',), 5)) (5, ((u'cm',), 4)) (6, ((u'to',), 3)) (7, ((u'are',), 3))
(8, ((u'rainforests',), 3)) (9, ((u'forests',), 3)) (10, ((u'south',), 2)) (11, ((u'from',), 2))
(12, ((u'it',), 2)) (13, ((u'g',), 2)) (14, ((u'no',), 2)) (15, ((u'known',), 2))
(16, ((u'rainforest',), 2)) (17, ((u'exceed',), 2)) (18, ((u'although',), 2)) (19, ((u'typically',), 2))
(20, ((u'america',), 2)) (21, ((u'e',), 2)) (22, ((u'many',), 2)) (23, ((u's',), 2))
(24, ((u'between',), 2)) (25, ((u'as',), 2)) (26, ((u'is',), 2)) (27, ((u'with',), 2))
(28, ((u'zone',), 2)) (29, ((u'congo',), 2)) (30, ((u'tropic',), 2)) (31, ((u'equatorial',), 1))
(32, ((u'within',), 1)) (33, ((u'located',), 1)) (34, ((u'convergence',), 1)) (35, ((u'now',), 1))
(36, ((u'el',), 1)) (37, ((u'by',), 1)) (38, ((u'saharan',), 1)) (39, ((u'average',), 1))
(40, ((u'lungs',), 1)) (41, ((u'less',), 1)) (42, ((u'64',), 1)) (43, ((u'have',), 1))
(44, ((u'degreef',), 1)) (45, ((u'temperatures',), 1)) (46, ((u'1',), 1)) (47, ((u'africa',), 1))
(48, ((u'earth',), 1)) (49, ((u'200',), 1)) (50, ((u'australia',), 1)) (51, ((u'18',), 1))
(52, ((u'peninsula',), 1)) (53, ((u'indonesia',), 1)) (54, ((u'that',), 1)) (55, ((u'390',), 1))
(56, ((u'been',), 1)) (57, ((u'10',), 1)) (58, ((u'characterized',), 1)) (59, ((u'also',), 1))
(60, ((u'yucatan',), 1)) (61, ((u'6',), 1)) (62, ((u'such',), 1)) (63, ((u'months',), 1))
(64, ((u'000',), 1)) (65, ((u'islands',), 1)) (66, ((u'trough',), 1)) (67, ((u'dry',), 1))
(68, ((u'66',), 1)) (69, ((u'equator',), 1)) (70, ((u'season',), 1)) (71, ((u'mean',), 1))
(72, ((u'sub',), 1)) (73, ((u'oxygen',), 1)) (74, ((u'degrees',), 1)) (75, ((u'7',), 1))
(76, ((u'rainfall',), 1)) (77, ((u'lanka',), 1)) (78, ((u'all',), 1)) (79, ((u'monthly',), 1))
(80, ((u'cancer',), 1)) (81, ((u'monsoon',), 1)) (82, ((u'asia',), 1)) (83, ((u'on',), 1))
(84, ((u'photosynthesis',), 1)) (85, ((u'degreec',), 1)) (86, ((u'southern',), 1)) (87, ((u'location',), 1))
(88, ((u'addition',), 1)) (89, ((u'sri',), 1)) (90, ((u'capricorn',), 1)) (91, ((u'southeast',), 1))
(92, ((u'warm',), 1)) (93, ((u'found',), 1)) (94, ((u'through',), 1)) (95, ((u'cameroon',), 1))
(96, ((u'climate',), 1)) (97, ((u'called',), 1)) (98, ((u'bosawas',), 1)) (99, ((u'pacific',), 1))
(100, ((u'69',), 1)) (101, ((u'5',), 1)) (102, ((u'can',), 1)) (103, ((u'burma',), 1))
(104, ((u'79',), 1)) (105, ((u'papua',), 1)) (106, ((u'annual',), 1)) (107, ((u'lies',), 1))
(108, ((u'atmosphere',), 1)) (109, ((u'substantial',), 1)) (110, ((u'new',), 1)) (111, ((u'168',), 1))
(112, ((u'category',), 1)) (113, ((u'moist',), 1)) (114, ((u'year',), 1)) (115, ((u'little',), 1))
(116, ((u'contribute',), 1)) (117, ((u'during',), 1)) (118, ((u'175',), 1)) (119, ((u'belize',), 1))
(120, ((u'wet',), 1)) (121, ((u'than',), 1)) (122, ((u'guinea',), 1)) (123, ((u'north',), 1))
(124, ((u'philippines',), 1)) (125, ((u'hawai\u02bbi',), 1)) (126, ((u'myanmar',), 1)) (127, ((u'world',), 1))
(128, ((u'peten',), 1)) (129, ((u'exist',), 1)) (130, ((u'net',), 1)) (131, ((u'a',), 1))
(132, ((u'broader',), 1)) (133, ((u'intertropical',), 1)) (134, ((u'calakmul',), 1)) (135, ((u'central',), 1))
(136, ((u'associated',), 1)) (137, ((u'malaysia',), 1)) (138, ((u'amazon',), 1))
=========== start data_2 ==============
(0, ((u'in',), 11)) (1, ((u'the',), 9)) (2, ((u'and',), 9)) (3, ((u'of',), 7))
(4, ((u'temperate',), 3)) (5, ((u'southern',), 3)) (6, ((u'as',), 3)) (7, ((u'coastal',), 3))
(8, ((u'rainforests',), 3)) (9, ((u'east',), 2)) (10, ((u'parts',), 2)) (11, ((u'america',), 2))
(12, ((u'areas',), 2)) (13, ((u'british',), 2)) (14, ((u'coast',), 2)) (15, ((u'occur',), 2))
(16, ((u'regions',), 2)) (17, ((u'are',), 1)) (18, ((u'turkey',), 1)) (19, ((u'they',), 1))
(20, ((u'on',), 1)) (21, ((u'australia',), 1)) (22, ((u'far',), 1)) (23, ((u'oregon',), 1))
(24, ((u'galicia',), 1)) (25, ((u'chile',), 1)) (26, ((u'island',), 1)) (27, ((u'few',), 1))
(28, ((u'zealand',), 1)) (29, ((u'columbia',), 1)) (30, ((u'but',), 1)) (31, ((u'world',), 1))
(32, ((u'sea',), 1)) (33, ((u'taiwan',), 1)) (34, ((u'northwest',), 1)) (35, ((u'europe',), 1))
(36, ((u'10',), 1)) (37, ((u'much',), 1)) (38, ((u'also',), 1)) (39, ((u'north',), 1))
(40, ((u'adriatic',), 1)) (41, ((u'such',), 1)) (42, ((u'cover',), 1)) (43, ((u'forests',), 1))
(44, ((u'part',), 1)) (45, ((u'including',), 1)) (46, ((u'western',), 1)) (47, ((u'a',), 1))
(48, ((u'norway',), 1)) (49, ((u'large',), 1)) (50, ((u'georgia',), 1)) (51, ((u'well',), 1))
(52, ((u'south',), 1)) (53, ((u'globe',), 1)) (54, ((u'tropical',), 1)) (55, ((u'adjacent',), 1))
(56, ((u'washington',), 1)) (57, ((u'only',), 1)) (58, ((u'russian',), 1)) (59, ((u'pacific',), 1))
(60, ((u'japan',), 1)) (61, ((u'black',), 1)) (62, ((u'along',), 1)) (63, ((u'highlands',), 1))
(64, ((u'ireland',), 1)) (65, ((u'sakhalin',), 1)) (66, ((u'balkans',), 1)) (67, ((u'korea',), 1))
(68, ((u'asia',), 1)) (69, ((u'around',), 1)) (70, ((u'scotland',), 1)) (71, ((u'eastern',), 1))
(72, ((u'alaska',), 1)) (73, ((u'china',), 1)) (74, ((u'isles',), 1)) (75, ((u'new',), 1))
(76, ((u'california',), 1))

3 Answers

User
Answer 1 · edited 2024-05-23 14:59:08

The simplest way to find the intersection of multiple lists is to use set.intersection() together with argument unpacking (the * operator). For example:
my_list = [
    ['cat', 'dog', 'fan'],
    ['cat', 'dog', 'pine'],
    ['cat', 'light', 'tree', 'dog'],
    ['dog', 'pine', 'cat', 'tree'],
    ['fan', 'pine', 'dog', 'tree', 'cat'],
    ['light', 'dog', 'pine', 'cat', 'tree'],
]
The intersection of all the lists can then be computed as:
#                            v Unpacked lists from index 1 onward
set(my_list[0]).intersection(*my_list[1:])
#   ^ First element in list
which will return:
set(['dog', 'cat'])
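On Python 3 the same call works unchanged, although the result prints as a set literal rather than set([...]). The following is only a minimal sketch: the helper name common_items is hypothetical, and returning an empty set for an empty outer list is an assumption made for illustration, not something the answer specifies.

```python
def common_items(lists):
    """Return the set of items present in every sub-list (hypothetical helper)."""
    if not lists:
        # Assumption: no lists means nothing in common, instead of an IndexError.
        return set()
    first, *rest = lists
    # intersection() accepts any iterables, so the remaining lists
    # do not need to be converted to sets first.
    return set(first).intersection(*rest)


my_list = [
    ['cat', 'dog', 'fan'],
    ['cat', 'dog', 'pine'],
    ['cat', 'light', 'tree', 'dog'],
    ['dog', 'pine', 'cat', 'tree'],
    ['fan', 'pine', 'dog', 'tree', 'cat'],
    ['light', 'dog', 'pine', 'cat', 'tree'],
]

# sorted() is used only to get a stable display order; sets are unordered.
print(sorted(common_items(my_list)))
```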
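Edit: It looks like you do not actually need the intersection. Based on the following statement, you need the count of each item across all the lists: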
I'd like to be able to find out of say the word cat appears in 0,1, 2, ..N lists.
If you only care about the count of each item, you can use itertools.chain with collections.Counter:
from itertools import chain
from collections import Counter

my_count = Counter(chain(*my_list))
where my_count will hold:
{'dog': 6, 'cat': 6, 'tree': 4, 'pine': 4, 'light': 2, 'fan': 2}
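One caveat worth keeping in mind: counting over the chained lists gives total occurrences, so a word repeated inside a single list is counted more than once. If the goal is the number of lists that contain each word, one option is to convert each sub-list to a set first. The sketch below uses made-up sample data, and word_lists is a hypothetical name.

```python
from collections import Counter
from itertools import chain

# Hypothetical data: 'cat' is repeated inside the first list,
# and 'pine' is repeated inside the third.
word_lists = [
    ['cat', 'cat', 'dog'],
    ['cat', 'pine'],
    ['dog', 'pine', 'pine'],
]

# Total occurrences across all lists (duplicates inside a list are counted).
total_counts = Counter(chain.from_iterable(word_lists))

# Number of lists containing each word (each list contributes at most 1).
list_counts = Counter(chain.from_iterable(set(words) for words in word_lists))

print(total_counts['cat'], list_counts['cat'])
```

When every sub-list already holds unique items, as in my_list, the two counts are the same.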
If you also want a mapping from each item to the lists that contain it, you can build a dict. First, though, you need to gather the items from all the lists:
all_items = set(my_list[0]).union(*my_list[1:])
# which will hold:
# set(['light', 'tree', 'dog', 'pine', 'cat', 'fan'])
Then store the membership flags in a dict. I am using collections.defaultdict for convenience:
from collections import defaultdict

my_dict = defaultdict(list)
for item in all_items:
    for sub_list in my_list:
        my_dict[item].append(item in sub_list)
Now my_dict will hold:
{
    'light': [False, False, True, False, False, True],
    #         ^             ^ Present in list 3
    #         ^ Not present in list 1
    'tree': [False, False, True, True, True, True],
    'dog': [True, True, True, True, True, True],
    'pine': [False, True, False, True, True, True],
    'cat': [True, True, True, True, True, True],
    'fan': [True, False, False, False, True, False]
}
You can work out the number of occurrences from this.
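Because each value in the mapping is a list of booleans, summing it gives the number of lists that contain the item. This is a small sketch of that step, using a shortened, hypothetical version of my_list; the names occurrences and dog_positions are introduced here for illustration only.

```python
from collections import defaultdict

# Shortened, hypothetical sample data.
my_list = [['cat', 'dog', 'fan'], ['cat', 'dog', 'pine'], ['cat', 'light']]

all_items = set().union(*my_list)

# Map each item to one membership flag per sub-list.
my_dict = defaultdict(list)
for item in all_items:
    for sub_list in my_list:
        my_dict[item].append(item in sub_list)

# True counts as 1, so sum() yields how many lists contain the item.
occurrences = {item: sum(flags) for item, flags in my_dict.items()}

# Zero-based indices of the lists in which 'dog' appears.
dog_positions = [i for i, present in enumerate(my_dict['dog']) if present]

print(occurrences['cat'], dog_positions)
```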

User
Answer 2 · edited 2024-05-23 14:59:08

Although most of your question is about set intersection, what you actually want does not seem to be directly related to that concept:
I'd like to be able to find out of say the word cat appears in 0,1, 2, ..N lists.
You can find this out without thinking about intersections, sets, and so on:
one = ['cat', 'dog', 'pine']
two = ['cat', 'fan', 'pine']
three = ['cat', 'pine', 'tree']
four = ['dog', 'pine', 'tree']
five = ['fan', 'pine', 'tree']
six = ['light', 'pine', 'tree']

>>> sum(True for s in (one, two, three, four, five, six) if 'cat' in s)
3
>>> sum(True for s in (one, two, three, four, five, six) if 'tree' in s)
4
This works because True behaves like the integer 1 when used in arithmetic, which is what sum() does under the hood.
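The behaviour of True in arithmetic can be checked directly. The sketch below also wraps the counting expression in a function; lists_containing is a hypothetical name that does not appear in the answer, and only three of the lists are used to keep the example short.

```python
def lists_containing(word, lists):
    """Count how many of the given lists contain word (hypothetical helper)."""
    # Each membership test yields True or False; sum() adds them as 1 and 0.
    return sum(word in items for items in lists)


one = ['cat', 'dog', 'pine']
two = ['cat', 'fan', 'pine']
three = ['cat', 'pine', 'tree']

print(isinstance(True, int))  # bool is a subclass of int
print(True + True)            # arithmetic treats True as 1
print(lists_containing('cat', [one, two, three]))
print(lists_containing('tree', [one, two, three]))
```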
If what you really want is the intersection of all the "sets", that is easy too:
>>> set.intersection(*(set(s) for s in (one, two, three, four, five, six)))
{'pine'}
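Update: Now that you have clarified your question, it is clear that you actually need to count how many times a word appears across the different lists. Besides the approach described above for counting a single word, as I mentioned in my comment on Andrea Reina's answer (and as Moinuddin quadra subsequently added to his own answer), the idiomatic way to do this in Python is to use collections.Counter and itertools.chain:
>>> from collections import Counter
>>> from itertools import chain
>>> counts = Counter(chain(one, two, three, four, five, six))
>>> counts
Counter({'pine': 6, 'tree': 4, 'cat': 3, 'dog': 2, 'fan': 2, 'light': 1})
>>> counts['cat']
3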
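For raw text like the passages in the question, the same idea applies once each text has been split into words. This is a rough sketch only: the question does not show how the words were extracted, so the regex tokenizer (lowercase, runs of word characters) is an assumption, and the short sample strings, tokenize and doc_counts are hypothetical stand-ins.

```python
import re
from collections import Counter

# Short stand-in texts (hypothetical; not the passages from the question).
texts = [
    "Tropical rainforests are warm and wet.",
    "Temperate rainforests occur in few regions.",
    "Tropical forests cover a large part of the globe.",
]


def tokenize(text):
    """Lowercase the text and return its words (assumed tokenization rule)."""
    return re.findall(r"\w+", text.lower())


# Using a set per text means each text counts a given word at most once,
# so doc_counts[word] is the number of texts containing that word.
doc_counts = Counter()
for text in texts:
    doc_counts.update(set(tokenize(text)))

print(doc_counts['rainforests'])
print(doc_counts['tropical'])
```

Looking up a word that never occurs returns 0 rather than raising KeyError, which covers the "appears in 0 lists" case.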

User
Answer 3 · edited 2024-05-23 14:59:08

You can use reduce (functools.reduce if you are using Python 3.x):

>>> from functools import reduce  # for python 3.x
>>> animals_list = [
...     ['cat', 'dog', 'pine', 'tree', 'light', 'fan'],
...     ['cat', 'pine', 'tree', 'light', 'fan'],
...     ['cat', 'dog', 'pine', 'light', 'fan'],
...     ['cat', 'dog', 'pine', 'tree', 'fan'],
... ]
>>> reduce(lambda x, y: set(x).intersection(y), animals_list)
{'pine', 'fan', 'cat'}
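One thing to be aware of with the lambda version: reduce returns its only element unchanged when the input holds a single list, so the result would be a list rather than a set. A possible variation, sketched below, converts every list to a set up front; the name intersect_all is hypothetical.

```python
from functools import reduce


def intersect_all(lists):
    """Intersect all lists, always returning a set (hypothetical helper)."""
    # map(set, ...) converts each list first, so set.intersection can be
    # passed directly as the two-argument reducing function.
    return reduce(set.intersection, map(set, lists))


animals_list = [
    ['cat', 'dog', 'pine', 'tree', 'light', 'fan'],
    ['cat', 'pine', 'tree', 'light', 'fan'],
    ['cat', 'dog', 'pine', 'light', 'fan'],
    ['cat', 'dog', 'pine', 'tree', 'fan'],
]

print(sorted(intersect_all(animals_list)))
print(type(intersect_all([['cat', 'dog']])))  # a single list still yields a set
```

An empty outer list raises TypeError here, just as it does with the lambda version, because reduce has no initial value to fall back on.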
