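"add to set" returns a boolean in Java - what about Python?
In Java I like to use the boolean returned by the "add to set" operation to check whether the element was already in the set.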
if (set.add("x")) {
    System.out.println("x was not yet in the set");
}
Is there something similarly convenient in Python? I tried:
z = set()
if z.add(y):
    print("something")
But it prints nothing. Am I missing something?
7 Answers
Evidently set().add(elem) always returns None, whose type is NoneType, as the following session shows:
$ python
Python 3.10.7 (main, Sep 7 2022, 01:54:01) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> type(set().add(12345))
<class 'NoneType'>
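This also explains why the if in the question never fires: None is falsy, so the condition is false whether or not the element was new. A minimal sketch (the variable names are only for illustration):

```python
z = set()

# set.add() mutates the set in place and returns None in both cases.
first = z.add("x")    # "x" was not in the set yet
second = z.add("x")   # "x" is already there

print(first, second)              # None None
print(bool(first), bool(second))  # False False
print(z)                          # {'x'}
```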
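If you want to avoid the double lookup, you can use len() to detect whether the element was added, by comparing the length of the set before and after the add: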
a_set = set()
#...
something = 12345
pre_len = len(a_set)
a_set.add(something)  # always returns None
if pre_len != len(a_set):
    print(f"The element({something}) was added therefore it was not already in the set.")
else:
    print(f"The element({something}) was not added because it was already in the set.")
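If this check is needed in several places, the len() comparison above can be wrapped in a small helper that mimics the boolean returned by Java's Set.add. This is only a sketch: add_and_report is a hypothetical name, not part of Python's set API, and it assumes no other thread modifies the set between the two len() calls.

```python
def add_and_report(target: set, item) -> bool:
    """Add item to target and return True only if it was newly added.

    Hypothetical helper; relies on the set growing by one element
    exactly when the item was not present before.
    """
    before = len(target)
    target.add(item)
    return len(target) != before


seen = set()
print(add_and_report(seen, 12345))  # True  (first insertion)
print(add_and_report(seen, 12345))  # False (already present)
```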
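Here is some information I found about len():
How does len() work?
len() runs in O(1) time because a set is an object that has a member storing its size. Below is the description of len() from the Python documentation: Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set).
Source: https://www.geeksforgeeks.org/find-the-length-of-a-set-in-python/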
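I had expected the length to be cached in some way, and the above seems to confirm that.
In addition (thanks to the #python channel on liberachat IRC for the information and confirmation), I verified this myself by reading the source code, so I can now say with confidence that len() returns a cached result.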
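This page on the Python wiki confirms it as well: the Get Length operation (listed for list, but it applies to all collections) is O(1) even in the worst case, while x in s is O(1) on average and O(n) in the worst case, where 'n' is the number of elements currently in the container.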
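A quick way to sanity-check the O(1) claim for len() is to time it on a small and a large set; if the size is stored on the object, the two figures should be roughly the same. This is a rough sketch using the standard timeit module, and the set sizes and call count are arbitrary choices:

```python
import timeit

small = set(range(10))
large = set(range(1_000_000))

n = 50_000  # number of calls per measurement (arbitrary)

# The lambda call overhead is included in both measurements,
# so only the comparison between the two is meaningful.
t_small = timeit.timeit(lambda: len(small), number=n)
t_large = timeit.timeit(lambda: len(large), number=n)

print(f"len(small): {t_small / n * 1e9:.1f} ns per call")
print(f"len(large): {t_large / n * 1e9:.1f} ns per call")
```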
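Below I tried to measure the time difference. I am not sure I did it correctly and I may have missed something, but the len() variant appears to be slightly slower than the x in set variant: about 9.4% more time in the million-call runs, or 51.7% more in the per-call averages marked with * below. Anyway, here it is: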
what =========== time it took in seconds (underscores for readability/matching)
-----
init_set() highest time==========================5.374_412_298_202_514_648_438_
init_set() average time==========================5.374_412_298_202_514_648_438_
init_set() lowest time===========================5.374_412_298_202_514_648_438_
re_set() highest time============================1.283_961_534_500_122_070_312_
re_set() average time============================1.233_813_919_126_987_457_275_
re_set() lowest time=============================1.027_379_512_786_865_234_375_
use_len_and_always_add() highest time============0.000_089_168_548_583_984_375_
use_len_and_always_add() average time============0.000_002_022_699_288_719_765_*
use_len_and_always_add() lowest time=============0.000_001_668_930_053_710_938_
double_lookup_never_add() highest time===========0.000_107_288_360_595_703_125_
double_lookup_never_add() average time===========0.000_001_333_327_164_114_879_*
double_lookup_never_add() lowest time============0.000_000_953_674_316_406_250_
double_lookup_always_add() highest time==========0.000_087_261_199_951_171_875_
double_lookup_always_add() average time==========0.000_001_681_037_948_399_603_*
double_lookup_always_add() lowest time===========0.000_001_192_092_895_507_812_
double_lookup_never_add2() highest time==========0.000_120_401_382_446_289_062_
double_lookup_never_add2() average time==========0.000_001_423_196_238_256_303_*
double_lookup_never_add2() lowest time===========0.000_001_192_092_895_507_812_
using_len_many() highest time===================10.652_944_326_400_756_835_938_
using_len_many() average time===================10.642_948_746_681_213_378_906_
using_len_many() lowest time====================10.632_953_166_961_669_921_875_
double_lookup_always_add_many() highest time====10.278_126_478_195_190_429_688_
double_lookup_always_add_many() average time====10.234_028_577_804_565_429_688_
double_lookup_always_add_many() lowest time=====10.189_930_677_413_940_429_688_
double_lookup_always_add_many2() highest time===10.584_211_587_905_883_789_062_
double_lookup_always_add_many2() average time===10.565_821_886_062_622_070_312_
double_lookup_always_add_many2() lowest time====10.547_432_184_219_360_351_562_
double_lookup_many() highest time================9.843_203_306_198_120_117_188_
double_lookup_many() average time================9.723_606_467_247_009_277_344_
double_lookup_many() lowest time=================9.604_009_628_295_898_437_500_
The table above was sorted by hand (with the CPU pinned to 800 MHz via sudo cpupower frequency-set --related --governor powersave --min 800MHz --max 800MHz), but the output itself comes from the following code:
#!/usr/bin/python3
import time as t
WORST="() highest time"
AVG="() average time"
BEST="() lowest time"
# Maps "<function name>() highest/average/lowest time" to seconds.
stats: dict[str,float]=dict({})
def TimeTakenDecorator(func):
    #src: https://stackoverflow.com/a/70954147/19999437
    def wrapper(*args,**kwargs):
        global stats
        tmp_avg=stats.get(func.__name__+AVG)
        #too_fast:bool
        # Only announce the first call and calls to slow functions.
        if (tmp_avg is None) or (tmp_avg > 0.5):
            print(f'Calling "{func.__name__}()"') #,end="")
            #too_fast=False
        #else:
            #too_fast=True
        start = t.time()
        func(*args,**kwargs)
        end = t.time()
        diff=end - start
        if stats.get(func.__name__+WORST) is None:
            stats[func.__name__+WORST]=diff
        if tmp_avg is None:
            stats[func.__name__+AVG]=diff
            tmp_avg=diff
        if stats.get(func.__name__+BEST) is None:
            stats[func.__name__+BEST]=diff
        if diff > stats[func.__name__+WORST]:
            stats[func.__name__+WORST]=diff
        if diff < stats[func.__name__+BEST]:
            stats[func.__name__+BEST]=diff
        # Running blend of the previous value and the new sample;
        # it weights recent calls more heavily than an arithmetic mean.
        stats[func.__name__+AVG]=(tmp_avg + diff) /2
        if diff > 0.5:
            print(f'Elapsed time for function call "{func.__name__}": {diff:.20f}')
        #print(" "+str(diff))
        #if not too_fast:
        #    print()
    return wrapper
something=5_234_567
REPEATS=1_000_000
#init_set() highest time==========================5.374_412_298_202_514_648_438_
#init_set() average time==========================5.374_412_298_202_514_648_438_
#init_set() lowest time===========================5.374_412_298_202_514_648_438_
@TimeTakenDecorator
def init_set():
    #print("Initializing the set:")
    global g1_set
    g1_set = set()
    for i in range(10_000_000):
        g1_set.add(i)
#re_set() highest time============================1.283_961_534_500_122_070_312_
#re_set() average time============================1.233_813_919_126_987_457_275_
#re_set() lowest time=============================1.027_379_512_786_865_234_375_
@TimeTakenDecorator
def re_set():
    #print("Resetting the set:")
    global g2_set
    global g1_set
    g2_set=g1_set.copy()
#double_lookup_many() highest time================9.843_203_306_198_120_117_188_
#double_lookup_many() average time================9.723_606_467_247_009_277_344_
#double_lookup_many() lowest time=================9.604_009_628_295_898_437_500_
@TimeTakenDecorator
def double_lookup_many():
    #print("Using double lookup:")
    for i in range(REPEATS):
        double_lookup_never_add()
#double_lookup_always_add_many() highest time====10.278_126_478_195_190_429_688_
#double_lookup_always_add_many() average time====10.234_028_577_804_565_429_688_
#double_lookup_always_add_many() lowest time=====10.189_930_677_413_940_429_688_
@TimeTakenDecorator
def double_lookup_always_add_many():
    #print("Using double lookup:")
    for i in range(REPEATS):
        double_lookup_always_add()
#using_len_many() highest time===================10.652_944_326_400_756_835_938_
#using_len_many() average time===================10.642_948_746_681_213_378_906_
#using_len_many() lowest time====================10.632_953_166_961_669_921_875_
@TimeTakenDecorator
def using_len_many():
    #print("Using len():")
    for i in range(REPEATS):
        use_len_and_always_add()
#double_lookup_never_add() highest time===========0.000_107_288_360_595_703_125_
#double_lookup_never_add() average time===========0.000_001_333_327_164_114_879_
#double_lookup_never_add() lowest time============0.000_000_953_674_316_406_250_
@TimeTakenDecorator
def double_lookup_never_add():
    global g2_set
    if something not in g2_set:
        g2_set.add(something)
        #pass
        #print(f"The element({something}) was added therefore it was not already in the set.")
    #else:
        #g2_set.add(something)
        #pass
        #print(f"The element({something}) was not added because it was already in the set.")
#double_lookup_always_add() highest time==========0.000_087_261_199_951_171_875_
#double_lookup_always_add() average time==========0.000_001_681_037_948_399_603_
#double_lookup_always_add() lowest time===========0.000_001_192_092_895_507_812_
@TimeTakenDecorator
def double_lookup_always_add():
    global g2_set
    if something not in g2_set:
        g2_set.add(something)
    else:
        g2_set.add(something)
#use_len_and_always_add() highest time============0.000_089_168_548_583_984_375_
#use_len_and_always_add() average time============0.000_002_022_699_288_719_765_
#use_len_and_always_add() lowest time=============0.000_001_668_930_053_710_938_
@TimeTakenDecorator
def use_len_and_always_add():
    global g2_set
    pre_len = len(g2_set)
    g2_set.add(something)
    #pass
    if pre_len != len(g2_set):
        pass
        #print(f"The element({something}) was added therefore it was not already in the set.")
    #else:
    #    pass
        #print(f"The element({something}) was not added because it was already in the set.")
#double_lookup_never_add2() highest time==========0.000_120_401_382_446_289_062_
#double_lookup_never_add2() average time==========0.000_001_423_196_238_256_303_
#double_lookup_never_add2() lowest time===========0.000_001_192_092_895_507_812_
@TimeTakenDecorator
def double_lookup_never_add2():
    global g2_set
    if something not in g2_set:
        g2_set.add(something)
#double_lookup_always_add_many2() highest time===10.584_211_587_905_883_789_062_
#double_lookup_always_add_many2() average time===10.565_821_886_062_622_070_312_
#double_lookup_always_add_many2() lowest time====10.547_432_184_219_360_351_562_
@TimeTakenDecorator
def double_lookup_always_add_many2():
    global g2_set
    for i in range(REPEATS):
        # Clearing first means the lookup always misses and the add always happens.
        g2_set.clear()
        double_lookup_never_add2()
def main():
    init_set()
    re_set()
    using_len_many()
    re_set()
    double_lookup_many()
    re_set()
    double_lookup_always_add_many()
    re_set()
    double_lookup_always_add_many2()
    print("Once more in reverse order:")
    re_set()
    double_lookup_always_add_many2()
    re_set()
    double_lookup_always_add_many()
    re_set()
    double_lookup_many()
    re_set()
    using_len_many()
    import json
    #from json import encoder
    #encoder.FLOAT_REPR = lambda o: f'{o:.20f}' #.format(o) #format(o, '.20f')
    #src: https://stackoverflow.com/a/69056325/19999437
    class RoundingFloat(float):
        __repr__ = staticmethod(lambda x: format(x, '.30f'))
    json.encoder.c_make_encoder = None
    #if hasattr(json.encoder, 'FLOAT_REPR'):
    #    # Python 2
    #    json.encoder.FLOAT_REPR = RoundingFloat.__repr__
    #else:
    # Python 3
    json.encoder.float = RoundingFloat
    print(json.dumps(stats, sort_keys=False, indent=2)) #, parse_float=lambda x: f"{x:.20f}"))
    import re
    for k in stats:
        # Insert an underscore after every group of three digits for readability.
        time_form=re.sub(r'([0-9_]+\.)?([0-9]{3})', '\\1\\2_', f"{stats[k]:_.21f}")
        #print(f"{k:-<45} {stats[k]:.20f}")
        print(f"{k:=<45}={time_form:=>33}")
main()
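For comparison, the same question can be asked with the standard timeit module instead of a hand-rolled decorator. The following is a minimal sketch, not a reproduction of the measurements above: the function names, set size, and repeat counts are arbitrary assumptions, and the item is chosen to be already present so that neither variant changes the set.

```python
import timeit

data = set(range(100_000))
item = 23_456      # already in the set, so nothing is ever added
CALLS = 100_000    # calls per timing run (arbitrary)


def with_len():
    """Detect insertion by comparing len() before and after add()."""
    before = len(data)
    data.add(item)
    return before != len(data)


def with_double_lookup():
    """Detect insertion with a membership test followed by add()."""
    if item not in data:
        data.add(item)
        return True
    return False


for fn in (with_len, with_double_lookup):
    # repeat() returns one total per run; the minimum is the least noisy figure.
    best = min(timeit.repeat(fn, number=CALLS, repeat=3))
    print(f"{fn.__name__}: {best / CALLS * 1e9:.1f} ns per call")
```

Absolute numbers depend heavily on the machine and Python version, so only the relative difference between the two lines is worth reading.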
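As the earlier answers mention, the add method of a Python set does not return anything useful. Incidentally, this question was once discussed on the Python mailing list: http://mail.python.org/pipermail/python-ideas/2009-February/002877.html.
In Python, set.add() does not return anything (that is, it returns None). If you want to check whether an element is in the set, use the not in operator.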
z = set()
if y not in z:  # If the object is not in the set yet...
    print("something")
z.add(y)
If you really need to know whether the element was in the set before you added it, store the result of the check as a boolean first.
z = set()
was_new = y not in z  # True when y was NOT in the set before the add
z.add(y)
if was_new:  # If the object was not in the set yet...
    print("something")
However, I think you probably do not need this.
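In practice the membership check usually shows up inside a loop, where testing first and adding second reads naturally. One common case is removing duplicates while keeping the first occurrence of each item; the sketch below uses a hypothetical function name, unique_in_order, and assumes the items are hashable.

```python
def unique_in_order(items):
    """Return the items with duplicates dropped, keeping first occurrences."""
    seen = set()
    result = []
    for item in items:
        if item not in seen:   # the check replaces Java's boolean from add()
            seen.add(item)
            result.append(item)
    return result


print(unique_in_order(["a", "b", "a", "c", "b"]))  # ['a', 'b', 'c']
```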
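This is a Python convention: when a method updates an object in place, it usually returns None. The convention is not absolute, and some methods do not follow it. Still, it is common and well accepted, so I suggest you follow it in your own code and keep it in mind when reading other people's.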
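A few built-in examples illustrate both the convention and its exceptions. This is only an illustrative selection, not a complete list:

```python
numbers = [3, 1, 2]

# Follows the convention: mutates in place and returns None.
sort_result = numbers.sort()
print(sort_result)   # None
print(numbers)       # [1, 2, 3]

# The non-mutating counterpart returns a new object instead.
print(sorted([3, 1, 2]))  # [1, 2, 3]

# Exceptions: these mutate the object and still return a value.
popped = numbers.pop()            # removes and returns the last element
print(popped)                     # 3
counts = {}
default = counts.setdefault("k", 0)  # inserts the key and returns its value
print(default)                    # 0
```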