<h2>快点?你知道吗</h2>
<p>由于Kasramvd提出了一种聪明的方法,Wesley的马力框架也被提出了,所以让我来设定一个定量的尺度,使之能够处理单个的解决方案。你知道吗</p>
<h2>让我们既公平又量化:</h2>
<p>更多的,一旦10E+5项目在游戏中。处理的效率和速度、内存处理、矢量化潜在的和(可能)隐藏的不良副作用、CPU缓存延迟掩蔽较差或较好的非CPU数据访问时间(以及更多的troll)-<strong>这些都是敌人,我们必须在生产中生活:</strong></p>
<hr/>
<h2>总结:</h2>
<p>Nathan基于<code>set</code>的方法对于所有被测试的量表来说都要快得多。你知道吗</p>
<p>Nathan的方法处理小的,<code>1E+4</code>和<code>1E+6</code>缩放集的速度要快得多,它利用了隐藏在<strong><code>set</code></strong>-s中独特元素的python哈希集合中的有利搜索能力(正如<code>set</code>类型正是为其引入的)。你知道吗</p>
<p>然而,<strong><code>O( m*n )</code></strong>/<strong><code>O( n^2 )</code></strong>不能被证明。你知道吗</p>
<p>这些假定的复杂度模型应该意味着,随着<code>m</code>、<code>n</code>规模的增长,基于<code>numpy</code>的方法与基于相同<code>list</code>/<code>set</code>数据集的基于<code>set</code>的方法相比,对更大的<code>m</code>、<code>n</code>数据集的不利性能惩罚将增长并加速</p>
<p>随着<code>set</code>尺度的增大,初始边缘在更大尺度上变得更小。你知道吗</p>
<p><strong>1:3</strong>速度优势在<strong><code>1E+4</code></strong>尺度上有所下降,但部分<br/><strong>1:2</strong>速度优势在<strong><code>1E+6</code></strong>尺度上较小。你知道吗</p>
<p><strong>实际的代码执行使得任务O(m*n)/O(n^2)-复杂性的先验假设无法在体内得到证实。</strong></p>
<hr/>
<h2>如何测试?你知道吗</h2>
<pre><code>>>> import numpy as np
>>> from zmq import Stopwatch
>>> aStopWATCH = Stopwatch()
>>> l = [1, 1 , 1, 2, 3, 4, 5, 5]
>>> a = (1, 3)
>>> b = (3, 5)
>>> npL = np.array( l )
>>> npL
array([1, 1, 1, 2, 3, 4, 5, 5])
>>> npA = np.array( a )
>>> npA
array([1, 3])
>>> import numba # ______________________________was not me who has said QUICKLY
>>> @numba.jit # QUICKLY
... def getLinA( aListAsNumpyARRAY, aSetAsNumpyARRAY ): # << Kasramvd
... return aListAsNumpyARRAY[ np.in1d( aListAsNumpyARRAY,
aSetAsNumpyARRAY
)
]
>>>
>>> aStopWATCH.start();getLinA( npL, npA );aStopWATCH.stop()
array([1, 1, 1, 3])
113513L # runs a JIT-compiler on a first call ... pay 113,51 [ms]
>>> aStopWATCH.start();getLinA( npL, npA );aStopWATCH.stop()
array([1, 1, 1, 3])
653L # runs a pre-compiled code on 2nd+ ...... wow 0,65 [ms]
855L # runs a pre-compiled code on 2nd+ ...... wow 0,86 [ms]
857L # runs a pre-compiled code on 2nd+ ...... wow 0,86 [ms]
698L # runs a pre-compiled code on 2nd+ ...... wow 0,7 [ms]
690L # runs a pre-compiled code on 2nd+ ...... wow 0,7 [ms]
</code></pre>
<h2>提示:</h2>
<p>是的,没有免费的晚餐。<br/>
如您所见,一旦我们不得不为JIT编译付出代价。
然而,多亏了Travis OLIPHANT伟大的<strong><code>numba</code></strong>工具,没有人能阻止我们进行“pre mini call”(让JIT编译器完成它的职责)</p>
<pre><code>getLinA( npL[:2], npA[:2] )
</code></pre>
<p>下一步用已经编译好的</p>
<pre><code>getLinA( npL[:1E+9], npA[:1E+9] )
</code></pre>
<hr/>
<h2>与<code>Nathan Davis</code>交流想法引发的事后讨论</h2>
<pre><code>>>> def NathanListPROCESSOR( aList = [1,1,3,2,5,7,8,3,2,1],
setA = set( ( 1, 3 ) ),
setB = set( ( 2, 3 ) )
):
... accumulator = 0
... for item in aList:
... if item in setA:
... accumulator += 1
... elif item in setB:
... accumulator -= 1
... print accumulator
... return
...
>>> NathanListPROCESSOR() #_______________________EXPECTED: == 2
3 #_______________________O/P DEFINED: == 1
</code></pre>
<p><em>计时:请参考实际代码执行工件出现时计时结果的差异(缓存脏度的差异,)</em></p>
<pre><code>>>> aStopWATCH.start();NathanListPROCESSOR();aStopWATCH.stop()
3
1207L
491L
279L
350L
1478L
1172L
1698L
1488L
1449L
9688L
1466L
</code></pre>
<p>对于缩放到<strong><code>1E+4</code></strong>和<strong><code>1E+6</code></strong>大小的对象:</p>
<pre><code>>>> aStopWATCH.start();NathanListPROCESSOR( aLIST, setA, setB );aStopWATCH.stop()
-1
2582L
2673L
2529L
2524L
2888L
2693L
>>> aStopWATCH.start();getLISTinSET( npLIST, npSetA, npSetB );aStopWATCH.stop()
0
129983L
12068L
10699L
10930L
10857L
10999L
10954L
10994L
</code></pre>
<p>在<code>numba</code>的帮助下:</p>
<pre><code>>>> @numba.jit
... def numba_getLISTinSET( npList, npSetA, npSetB ):
... return ( len( npList[ np.in1d( npList, npSetA ) ] ) -
len( npList[ np.in1d( npList, npSetB ) ] )
)
>>> aStopWATCH.start();numba_getLISTinSET( npLIST, npSetA, npSetB );aStopWATCH.stop()
0
165320L
7047L
7328L
7378L
7898L
7519L
7556L
7277L
7296L
7292L
7303L
7302L
7426L
7369L
7307L
</code></pre>
<p>最后,<em>几乎</em>-<strong><code>1E+6</code></strong>缩放:</p>
<pre><code>>>> setA = generateSet( 1000000 )
>>> len( setA ) # the cost of uniqueness
235836
>>> setB = generateSet( 1000000 )
>>> npSetA = np.array( list( setA ), dtype = np.int )
>>> npSetB = np.array( list( setB ), dtype = np.int )
>>> aLIST = list( ( np.random.random( 1000000 * 1.1 + 10 ) * 1000000 * 10000 ).astype( np.int ) )[:1000000]
>>> len( aLIST )
1000000
>>> npLIST = np.array( aLIST, dtype = np.int )
#----------------------vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv-------------
#---------------------| |-------------
>>> aStopWATCH.start();numba_getLISTinSET( npLIST, npSetA, npSetB );aStopWATCH.stop()
6
406061L
403946L
409831L
409329L
408920L
#----------------------vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv-------------
#---------------------| |-------------
>>> aStopWATCH.start();NathanListPROCESSOR( aLIST, setA, setB );aStopWATCH.stop()
785334
200755L
196791L
195540L
196606L
202483L
197094L
199481L
196651L
200969L
198856L
202039L
200152L
202364L
</code></pre>