提高嵌套循环性能
我正在用Python做一个氩气液体的分子动力学模拟。目前有一个稳定的版本在运行,但当原子数量超过100个时,运行速度就变得很慢。我发现瓶颈在于下面这个嵌套的for循环。这是一个放在函数里的力计算,这个函数是从我的main.py脚本中调用的:
def computeForce(currentPositions):
potentialEnergy = 0
force = zeros((NUMBER_PARTICLES,3))
for iParticle in range(0,NUMBER_PARTICLES-1):
for jParticle in range(iParticle + 1, NUMBER_PARTICLES):
distance = currentPositions[iParticle] - currentPositions[jParticle]
distance = distance - BOX_LENGTH * (distance/BOX_LENGTH).round()
#note: this is so much faster than scipy.dot()
distanceSquared = distance[0]*distance[0] + distance[1]*distance[1] + distance[2]*distance[2]
if distanceSquared < CUT_OFF_RADIUS_SQUARED:
r2i = 1. / distanceSquared
r6i = r2i*r2i*r2i
lennardJones = 48. * r2i * r6i * (r6i - 0.5)
force[iParticle] += lennardJones*distance
force[jParticle] -= lennardJones*distance
potentialEnergy += 4.* r6i * (r6i - 1.) - CUT_OFF_ENERGY
return(force,potentialEnergy)
大写字母的变量是常量,定义在config.py文件中。“currentPositions”是一个3行,粒子数量列的矩阵。
我已经用scipy.weave实现了一个版本的嵌套for循环,这个灵感来自这个网站:http://www.scipy.org/PerformancePython。
不过,我不喜欢这样做失去了灵活性。我想要“向量化”这个for循环,但我对这个概念不是很理解。有没有人能给我一些提示或者推荐一个好的教程来教我这个?
3 个回答
为了让这个帖子更完整,我把我用C语言实现的代码也加上了。请注意,你需要导入weave和转换器才能让它正常工作。此外,目前weave只支持Python 2.7。再次感谢大家的帮助!这个版本的运行速度比向量化的版本快了10倍。
from scipy import weave
from scipy.weave import converters
def computeForceC(currentPositions):
code = """
using namespace blitz;
Array<double,1> distance(3);
double distanceSquared, r2i, r6i, lennardJones;
double potentialEnergy = 0.;
for( int iParticle = 0; iParticle < (NUMBER_PARTICLES - 1); iParticle++){
for( int jParticle = iParticle + 1; jParticle < NUMBER_PARTICLES; jParticle++){
distance(0) = currentPositions(iParticle,0)-currentPositions(jParticle,0);
distance(0) = distance(0) - BOX_LENGTH * round(distance(0)/BOX_LENGTH);
distance(1) = currentPositions(iParticle,1)-currentPositions(jParticle,1);
distance(1) = distance(1) - BOX_LENGTH * round(distance(1)/BOX_LENGTH);
distance(2) = currentPositions(iParticle,2)-currentPositions(jParticle,2);
distance(2) = distance(2) - BOX_LENGTH * round(distance(2)/BOX_LENGTH);
distanceSquared = distance(0)*distance(0) + distance(1)*distance(1) + distance(2)*distance(2);
if(distanceSquared < CUT_OFF_RADIUS_SQUARED){
r2i = 1./distanceSquared;
r6i = r2i * r2i * r2i;
lennardJones = 48. * r2i * r6i * (r6i - 0.5);
force(iParticle,0) += lennardJones*distance(0);
force(iParticle,1) += lennardJones*distance(1);
force(iParticle,2) += lennardJones*distance(2);
force(jParticle,0) -= lennardJones*distance(0);
force(jParticle,1) -= lennardJones*distance(1);
force(jParticle,2) -= lennardJones*distance(2);
potentialEnergy += 4.* r6i * (r6i - 1.)-CUT_OFF_ENERGY;
}
}//end inner for loop
}//end outer for loop
return_val = potentialEnergy;
"""
#args that are passed into weave.inline and created inside computeForce
#potentialEnergy = 0.
force = zeros((NUMBER_PARTICLES,3))
#all args
arguments = ['currentPositions','force','NUMBER_PARTICLES','CUT_OFF_RADIUS_SQUARED','BOX_LENGTH','CUT_OFF_ENERGY']
#evaluate stuff in code
potentialEnergy = weave.inline(code,arguments,type_converters = converters.blitz,compiler = 'gcc')
return force, potentialEnergy
用纯Python写一个分子动力学引擎会比较慢。我建议你可以看看Numba(http://numba.pydata.org/)或者Cython(http://cython.org/)。在Cython方面,我写过一个简单的Langevin动力学引擎,可能可以作为一个入门的例子:
https://bitbucket.org/joshua.adelman/pylangevin-integrator
另一个我非常喜欢的选择是使用OpenMM。它有一个Python的封装,可以让你把分子动力学引擎的各个部分组合在一起,实施自定义的力等等。它还可以针对GPU设备进行优化:
不过一般来说,有很多经过高度优化的分子动力学代码可供使用,除非你是出于某种教育目的,否则从头开始自己写一个并不太有意义。以下是一些主要的代码,举几个例子:
- NAMD - http://www.ks.uiuc.edu/Research/namd/(免费)
- Gromacs - http://www.gromacs.org/(免费)
- Amber - http://ambermd.org/
- LAMMPS - http://lammps.sandia.gov/(免费)
- HOOMD - http://codeblue.umich.edu/hoomd-blue/(免费)
- OpenMM - https://simtk.org/home/openmm(免费)
- Charmm - http://www.charmm.org/
下面是我对你代码的向量化版本。对于一个有1000个数据点的数据集,我的代码大约比原来的快50倍:
In [89]: xyz = 30 * np.random.uniform(size=(1000, 3))
In [90]: %timeit a0, b0 = computeForce(xyz)
1 loops, best of 3: 7.61 s per loop
In [91]: %timeit a, b = computeForceVector(xyz)
10 loops, best of 3: 139 ms per loop
代码如下:
from numpy import zeros
NUMBER_PARTICLES = 1000
BOX_LENGTH = 100
CUT_OFF_ENERGY = 1
CUT_OFF_RADIUS_SQUARED = 100
def computeForceVector(currentPositions):
potentialEnergy = 0
force = zeros((NUMBER_PARTICLES, 3))
for iParticle in range(0, NUMBER_PARTICLES - 1):
positionsJ = currentPositions[iParticle + 1:, :]
distance = currentPositions[iParticle, :] - positionsJ
distance = distance - BOX_LENGTH * (distance / BOX_LENGTH).round()
distanceSquared = (distance**2).sum(axis=1)
ind = distanceSquared < CUT_OFF_RADIUS_SQUARED
if ind.any():
r2i = 1. / distanceSquared[ind]
r6i = r2i * r2i * r2i
lennardJones = 48. * r2i * r6i * (r6i - 0.5)
ljdist = lennardJones[:, None] * distance[ind, :]
force[iParticle, :] += (ljdist).sum(axis=0)
force[iParticle+1:, :][ind, :] -= ljdist
potentialEnergy += (4.* r6i * (r6i - 1.) - CUT_OFF_ENERGY).sum()
return (force, potentialEnergy)
我还检查过,确保这个代码的结果和原来的是一致的。