如何快速比较列表和集合？问题的回答

如何快速比较列表和集合？

回答此问题可获得 20 贡献值，回答如果被采纳可获得 50 分。

0 条评论
分类：Python问答

默认排序时间排序

1 个回答

匿名 1天前

　擅长：python、mysql、java

<h2>快点？你知道吗</h2> 由于Kasramvd提出了一种聪明的方法，Wesley的马力框架也被提出了，所以让我来设定一个定量的尺度，使之能够处理单个的解决方案。你知道吗 <h2>让我们既公平又量化：</h2> 更多的，一旦10E+5项目在游戏中。处理的效率和速度、内存处理、矢量化潜在的和（可能）隐藏的不良副作用、CPU缓存延迟掩蔽较差或较好的非CPU数据访问时间（以及更多的troll）-这些都是敌人，我们必须在生产中生活： <hr/> <h2>总结：</h2> Nathan基于<code>set</code>的方法对于所有被测试的量表来说都要快得多。你知道吗 Nathan的方法处理小的，<code>1E+4</code>和<code>1E+6</code>缩放集的速度要快得多，它利用了隐藏在<code>set</code>-s中独特元素的python哈希集合中的有利搜索能力（正如<code>set</code>类型正是为其引入的）。你知道吗 然而，<code>O( m*n )</code>/<code>O( n^2 )</code>不能被证明。你知道吗 这些假定的复杂度模型应该意味着，随着<code>m</code>、<code>n</code>规模的增长，基于<code>numpy</code>的方法与基于相同<code>list</code>/<code>set</code>数据集的基于<code>set</code>的方法相比，对更大的<code>m</code>、<code>n</code>数据集的不利性能惩罚将增长并加速 随着<code>set</code>尺度的增大，初始边缘在更大尺度上变得更小。你知道吗 1:3速度优势在<code>1E+4</code>尺度上有所下降，但部分 1:2速度优势在<code>1E+6</code>尺度上较小。你知道吗 实际的代码执行使得任务O（m*n）/O（n^2）-复杂性的先验假设无法在体内得到证实。 <hr/> <h2>如何测试？你知道吗</h2> <pre><code>>>> import numpy as np >>> from zmq import Stopwatch >>> aStopWATCH = Stopwatch() >>> l = [1, 1 , 1, 2, 3, 4, 5, 5] >>> a = (1, 3) >>> b = (3, 5) >>> npL = np.array( l ) >>> npL array([1, 1, 1, 2, 3, 4, 5, 5]) >>> npA = np.array( a ) >>> npA array([1, 3]) >>> import numba # ______________________________was not me who has said QUICKLY >>> @numba.jit # QUICKLY ... def getLinA( aListAsNumpyARRAY, aSetAsNumpyARRAY ): # << Kasramvd ... return aListAsNumpyARRAY[ np.in1d( aListAsNumpyARRAY, aSetAsNumpyARRAY ) ] >>> >>> aStopWATCH.start();getLinA( npL, npA );aStopWATCH.stop() array([1, 1, 1, 3]) 113513L # runs a JIT-compiler on a first call ... pay 113,51 [ms] >>> aStopWATCH.start();getLinA( npL, npA );aStopWATCH.stop() array([1, 1, 1, 3]) 653L # runs a pre-compiled code on 2nd+ ...... wow 0,65 [ms] 855L # runs a pre-compiled code on 2nd+ ...... wow 0,86 [ms] 857L # runs a pre-compiled code on 2nd+ ...... wow 0,86 [ms] 698L # runs a pre-compiled code on 2nd+ ...... wow 0,7 [ms] 690L # runs a pre-compiled code on 2nd+ ...... wow 0,7 [ms] </code></pre> <h2>提示：</h2> 是的，没有免费的晚餐。 如您所见，一旦我们不得不为JIT编译付出代价。然而，多亏了Travis OLIPHANT伟大的<code>numba</code>工具，没有人能阻止我们进行“pre mini call”（让JIT编译器完成它的职责） <pre><code>getLinA( npL[:2], npA[:2] ) </code></pre> 下一步用已经编译好的 <pre><code>getLinA( npL[:1E+9], npA[:1E+9] ) </code></pre> <hr/> <h2>与<code>Nathan Davis</code>交流想法引发的事后讨论</h2> <pre><code>>>> def NathanListPROCESSOR( aList = [1,1,3,2,5,7,8,3,2,1], setA = set( ( 1, 3 ) ), setB = set( ( 2, 3 ) ) ): ... accumulator = 0 ... for item in aList: ... if item in setA: ... accumulator += 1 ... elif item in setB: ... accumulator -= 1 ... print accumulator ... return ... >>> NathanListPROCESSOR() #_______________________EXPECTED: == 2 3 #_______________________O/P DEFINED: == 1 </code></pre> 计时：请参考实际代码执行工件出现时计时结果的差异（缓存脏度的差异，） <pre><code>>>> aStopWATCH.start();NathanListPROCESSOR();aStopWATCH.stop() 3 1207L 491L 279L 350L 1478L 1172L 1698L 1488L 1449L 9688L 1466L </code></pre> 对于缩放到<code>1E+4</code>和<code>1E+6</code>大小的对象： <pre><code>>>> aStopWATCH.start();NathanListPROCESSOR( aLIST, setA, setB );aStopWATCH.stop() -1 2582L 2673L 2529L 2524L 2888L 2693L >>> aStopWATCH.start();getLISTinSET( npLIST, npSetA, npSetB );aStopWATCH.stop() 0 129983L 12068L 10699L 10930L 10857L 10999L 10954L 10994L </code></pre> 在<code>numba</code>的帮助下： <pre><code>>>> @numba.jit ... def numba_getLISTinSET( npList, npSetA, npSetB ): ... return ( len( npList[ np.in1d( npList, npSetA ) ] ) - len( npList[ np.in1d( npList, npSetB ) ] ) ) >>> aStopWATCH.start();numba_getLISTinSET( npLIST, npSetA, npSetB );aStopWATCH.stop() 0 165320L 7047L 7328L 7378L 7898L 7519L 7556L 7277L 7296L 7292L 7303L 7302L 7426L 7369L 7307L </code></pre> 最后，几乎-<code>1E+6</code>缩放： <pre><code>>>> setA = generateSet( 1000000 ) >>> len( setA ) # the cost of uniqueness 235836 >>> setB = generateSet( 1000000 ) >>> npSetA = np.array( list( setA ), dtype = np.int ) >>> npSetB = np.array( list( setB ), dtype = np.int ) >>> aLIST = list( ( np.random.random( 1000000 * 1.1 + 10 ) * 1000000 * 10000 ).astype( np.int ) )[:1000000] >>> len( aLIST ) 1000000 >>> npLIST = np.array( aLIST, dtype = np.int ) #----------------------vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv------------- #---------------------| |------------- >>> aStopWATCH.start();numba_getLISTinSET( npLIST, npSetA, npSetB );aStopWATCH.stop() 6 406061L 403946L 409831L 409329L 408920L #----------------------vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv------------- #---------------------| |------------- >>> aStopWATCH.start();NathanListPROCESSOR( aLIST, setA, setB );aStopWATCH.stop() 785334 200755L 196791L 195540L 196606L 202483L 197094L 199481L 196651L 200969L 198856L 202039L 200152L 202364L </code></pre>

如何快速比较列表和集合？

1 个回答

相关Python问题