Project Euler速度比较：C与Python、Erlang与Haskell

742 投票

17 回答

159573 浏览

提问于 2025-04-16 23:00

我从Project Euler上选了第12题作为编程练习，想看看我在C、Python、Erlang和Haskell这几种语言中的实现效果（肯定不是最优的）。为了让执行时间更长一些，我决定寻找第一个拥有超过1000个因子的三角形数字，而不是原题中提到的500个。

结果如下：

C语言：

lorenzo@enzo:~/erlang$ gcc -lm -o euler12.bin euler12.c
lorenzo@enzo:~/erlang$ time ./euler12.bin
842161320

real    0m11.074s
user    0m11.070s
sys 0m0.000s

Python：

lorenzo@enzo:~/erlang$ time ./euler12.py 
842161320

real    1m16.632s
user    1m16.370s
sys 0m0.250s

使用PyPy的Python：

lorenzo@enzo:~/Downloads/pypy-c-jit-43780-b590cf6de419-linux64/bin$ time ./pypy /home/lorenzo/erlang/euler12.py 
842161320

real    0m13.082s
user    0m13.050s
sys 0m0.020s

Erlang：

lorenzo@enzo:~/erlang$ erlc euler12.erl 
lorenzo@enzo:~/erlang$ time erl -s euler12 solve
Erlang R13B03 (erts-5.7.4) [source] [64-bit] [smp:4:4] [rq:4] [async-threads:0] [hipe] [kernel-poll:false]

Eshell V5.7.4  (abort with ^G)
1> 842161320

real    0m48.259s
user    0m48.070s
sys 0m0.020s

Haskell：

lorenzo@enzo:~/erlang$ ghc euler12.hs -o euler12.hsx
[1 of 1] Compiling Main             ( euler12.hs, euler12.o )
Linking euler12.hsx ...
lorenzo@enzo:~/erlang$ time ./euler12.hsx 
842161320

real    2m37.326s
user    2m37.240s
sys 0m0.080s

总结：

C语言：100%
Python：692%（使用PyPy时为118%）
Erlang：436%（感谢RichardC，提升到135%）
Haskell：1421%

我猜C语言有很大优势，因为它在计算时使用的是长整型，而其他三种语言使用的是任意长度的整数。而且C语言不需要先加载运行时（其他语言需要吗？）。

问题1： Erlang、Python和Haskell是否因为使用任意长度的整数而变慢？还是只要值小于MAXINT就不会？

问题2： 为什么Haskell这么慢？有没有编译器选项可以关闭限制，还是说是我实现的问题？（后者可能性很大，因为Haskell对我来说就像一本七个封印的书。）

问题3： 你能给我一些建议，如何在不改变我确定因子方式的情况下优化这些实现吗？任何形式的优化：更好看、更快、更“本地化”于语言。

编辑：

问题4： 我的函数式实现是否允许LCO（最后调用优化，也叫尾递归消除），从而避免在调用栈上添加不必要的帧？

我真的尽量在这四种语言中实现尽可能相似的算法，虽然我必须承认我对Haskell和Erlang的了解非常有限。

使用的源代码：

#include <stdio.h>
#include <math.h>

int factorCount (long n)
{
    double square = sqrt (n);
    int isquare = (int) square;
    int count = isquare == square ? -1 : 0;
    long candidate;
    for (candidate = 1; candidate <= isquare; candidate ++)
        if (0 == n % candidate) count += 2;
    return count;
}

int main ()
{
    long triangle = 1;
    int index = 1;
    while (factorCount (triangle) < 1001)
    {
        index ++;
        triangle += index;
    }
    printf ("%ld\n", triangle);
}

#! /usr/bin/env python3.2

import math

def factorCount (n):
    square = math.sqrt (n)
    isquare = int (square)
    count = -1 if isquare == square else 0
    for candidate in range (1, isquare + 1):
        if not n % candidate: count += 2
    return count

triangle = 1
index = 1
while factorCount (triangle) < 1001:
    index += 1
    triangle += index

print (triangle)

-module (euler12).
-compile (export_all).

factorCount (Number) -> factorCount (Number, math:sqrt (Number), 1, 0).

factorCount (_, Sqrt, Candidate, Count) when Candidate > Sqrt -> Count;

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

factorCount (Number, Sqrt, Candidate, Count) ->
    case Number rem Candidate of
        0 -> factorCount (Number, Sqrt, Candidate + 1, Count + 2);
        _ -> factorCount (Number, Sqrt, Candidate + 1, Count)
    end.

nextTriangle (Index, Triangle) ->
    Count = factorCount (Triangle),
    if
        Count > 1000 -> Triangle;
        true -> nextTriangle (Index + 1, Triangle + Index + 1)  
    end.

solve () ->
    io:format ("~p~n", [nextTriangle (1, 1) ] ),
    halt (0).

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' number sqrt candidate count
    | fromIntegral candidate > sqrt = count
    | number `mod` candidate == 0 = factorCount' number sqrt (candidate + 1) (count + 2)
    | otherwise = factorCount' number sqrt (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

性能比较编程语言函数式编程长整型运行时优化三角形数字因子计算尾递归消除

17 个回答

165

关于Python的优化，除了使用PyPy（它能在不改变你代码的情况下大幅提升速度），你还可以利用PyPy的翻译工具链来编译一个符合RPython标准的版本，或者使用Cython来构建一个扩展模块。这两种方法的速度都比我测试的C版本快，特别是Cython模块的速度几乎快了一倍。我还附上了C和PyPy的基准测试结果供参考：

C（使用 gcc -O3 -lm 编译）

% time ./euler12-c 
842161320

./euler12-c  11.95s 
 user 0.00s 
 system 99% 
 cpu 11.959 total

PyPy 1.5

% time pypy euler12.py
842161320
pypy euler12.py  
16.44s user 
0.01s system 
99% cpu 16.449 total

RPython（使用最新的PyPy版本，c2f583445aee）

% time ./euler12-rpython-c
842161320
./euler12-rpy-c  
10.54s user 0.00s 
system 99% 
cpu 10.540 total

Cython 0.15

% time python euler12-cython.py
842161320
python euler12-cython.py  
6.27s user 0.00s 
system 99% 
cpu 6.274 total

RPython版本有几个关键的变化。要把它转换成一个独立的程序，你需要定义你的target，在这个例子中就是main函数。这个函数需要接受sys.argv作为唯一参数，并且必须返回一个整数。你可以通过使用translate.py来进行转换，命令是% translate.py euler12-rpython.py，它会把代码转换成C并为你编译。

# euler12-rpython.py

import math, sys

def factorCount(n):
    square = math.sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in xrange(1, isquare + 1):
        if not n % candidate: count += 2
    return count

def main(argv):
    triangle = 1
    index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle
    return 0

if __name__ == '__main__':
    main(sys.argv)

def target(*args):
    return main, None

Cython版本被重写成一个扩展模块_euler12.pyx，我在一个普通的Python文件中导入并调用它。_euler12.pyx基本上和你的版本相同，只是增加了一些静态类型声明。setup.py包含了构建扩展所需的常规代码，使用python setup.py build_ext --inplace来构建。

# _euler12.pyx
from libc.math cimport sqrt

cdef int factorCount(int n):
    cdef int candidate, isquare, count
    cdef double square
    square = sqrt(n)
    isquare = int(square)
    count = -1 if isquare == square else 0
    for candidate in range(1, isquare + 1):
        if not n % candidate: count += 2
    return count

cpdef main():
    cdef int triangle = 1, index = 1
    while factorCount(triangle) < 1001:
        index += 1
        triangle += index
    print triangle

# euler12-cython.py
import _euler12
_euler12.main()

# setup.py
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

ext_modules = [Extension("_euler12", ["_euler12.pyx"])]

setup(
  name = 'Euler12-Cython',
  cmdclass = {'build_ext': build_ext},
  ext_modules = ext_modules
)

老实说，我对RPython和Cython的经验都不多，但结果让我感到惊喜。如果你在使用CPython，把计算密集型的代码写成Cython扩展模块似乎是优化程序的一个非常简单的方法。

回答于 2025-04-16 由 Python大师

分享举报

235

关于Erlang的实现，有一些问题。作为一个基准，我测量了你未修改的Erlang程序的执行时间是47.6秒，而C代码的执行时间是12.7秒。

(编辑：截至2021年Erlang/OTP版本24，Erlang已经有了自动的JIT编译器，之前的+native编译选项不再支持或需要。我保留下面的段落作为历史文档。关于export_all的评论在JIT生成良好代码的能力上仍然有一定的有效性。)

如果你想运行计算密集型的Erlang代码，首先要做的就是使用本地代码。使用erlc +native euler12编译后，时间降到了41.3秒。不过，这个速度提升（仅15%）比预期的要低，问题出在你使用了-compile(export_all)。这个选项在实验时很有用，但因为所有函数都可能被外部调用，导致本地编译器非常保守。（普通的BEAM模拟器受影响不大。）将这个声明替换为-export([solve/0]).，可以获得更好的速度提升：31.5秒（比基准快了近35%）。

但是代码本身有个问题：在factorCount循环的每次迭代中，你都要进行这个测试：

factorCount (_, Sqrt, Candidate, Count) when Candidate == Sqrt -> Count + 1;

C代码并没有这样做。一般来说，在不同实现之间进行公平比较可能会很棘手，尤其是当算法涉及数字计算时，因为你需要确保它们实际上在做同样的事情。在某个地方由于类型转换导致的轻微舍入误差，可能会让一个实现比另一个多进行很多次迭代，尽管最终结果是一样的。

为了消除这个可能的错误来源（并去掉每次迭代中的额外测试），我将factorCount函数重写如下，紧密模仿C代码：

factorCount (N) ->
    Sqrt = math:sqrt (N),
    ISqrt = trunc(Sqrt),
    if ISqrt == Sqrt -> factorCount (N, ISqrt, 1, -1);
       true          -> factorCount (N, ISqrt, 1, 0)
    end.

factorCount (_N, ISqrt, Candidate, Count) when Candidate > ISqrt -> Count;
factorCount ( N, ISqrt, Candidate, Count) ->
    case N rem Candidate of
        0 -> factorCount (N, ISqrt, Candidate + 1, Count + 2);
        _ -> factorCount (N, ISqrt, Candidate + 1, Count)
    end.

这个重写，没有export_all，并且进行了本地编译，给我带来了以下运行时间：

$ erlc +native euler12.erl
$ time erl -noshell -s euler12 solve
842161320

real    0m19.468s
user    0m19.450s
sys 0m0.010s

与C代码相比，这个结果还不错：

$ time ./a.out 
842161320

real    0m12.755s
user    0m12.730s
sys 0m0.020s

考虑到Erlang并不特别适合编写数字代码，在这样的程序中比C慢50%其实已经相当不错了。

最后，关于你的问题：

问题1：Erlang、Python和Haskell因为使用任意长度的整数而降低速度吗？只要值小于MAXINT就不会吗？

是的，确实会有一些影响。在Erlang中，没有办法指定“使用32/64位的算术运算并进行溢出处理”，所以除非编译器能证明你的整数有某些限制（通常它做不到），否则它必须检查所有计算，以确定这些计算是否可以适应一个单独的标记字，或者是否需要将它们转换为堆分配的大整数。即使在运行时实际上从未使用大整数，这些检查仍然需要进行。另一方面，这意味着你知道算法不会因为意外的整数溢出而失败，如果你突然给它更大的输入。

问题4：我的函数式实现是否允许最后调用优化，从而避免在调用栈上添加不必要的帧？

是的，你的Erlang代码在最后调用优化方面是正确的。

回答于 2025-04-16 由 Python大师

分享举报

858

在一台运行着 GHC 7.0.3、gcc 4.4.6 和 Linux 2.6.29 的 x86_64 Core2 Duo（2.5GHz）机器上，使用 ghc -O2 -fllvm -fforce-recomp 编译 Haskell 代码，使用 gcc -O3 -lm 编译 C 代码。

你的 C 程序运行了 8.4 秒（比你的运行快，可能是因为用了 -O3 优化）
Haskell 的解决方案运行了 36 秒（因为用了 -O2 标志）
你的 factorCount' 代码没有明确类型，默认使用了 Integer（感谢 Daniel 指正我的错误！）。如果给它一个明确的类型签名（这本来就是标准做法），用 Int，时间就变成11.1 秒。
在 factorCount' 中，你不必要地调用了 fromIntegral。修复这个问题后时间没有变化（编译器很聪明，真是幸运）。
你用了 mod，但 rem 更快且足够用。这样时间变成8.5 秒。
factorCount' 一直在传递两个不变的参数（number 和 sqrt）。通过一个工人/包装器转换，我们得到了：

 $ time ./so
 842161320  

 real    0m7.954s  
 user    0m7.944s  
 sys     0m0.004s

没错，结果是7.95 秒。这比 C 的解决方案快了半秒。如果不使用 -fllvm 标志，我的时间仍然是 8.182 秒，所以 NCG 后端在这种情况下表现也不错。

结论：Haskell 真棒。

最终代码

factorCount number = factorCount' number isquare 1 0 - (fromEnum $ square == fromIntegral isquare)
    where square = sqrt $ fromIntegral number
          isquare = floor square

factorCount' :: Int -> Int -> Int -> Int -> Int
factorCount' number sqrt candidate0 count0 = go candidate0 count0
  where
  go candidate count
    | candidate > sqrt = count
    | number `rem` candidate == 0 = go (candidate + 1) (count + 2)
    | otherwise = go (candidate + 1) count

nextTriangle index triangle
    | factorCount triangle > 1000 = triangle
    | otherwise = nextTriangle (index + 1) (triangle + index + 1)

main = print $ nextTriangle 1 1

编辑：既然我们探讨了这些，接下来回答问题

问题 1：Erlang、Python 和 Haskell 使用任意长度整数会变慢吗？还是只要值小于 MAXINT 就不会？

在 Haskell 中，使用 Integer 比 Int 慢，但慢多少取决于具体的计算。幸运的是（对于 64 位机器来说），Int 是足够的。为了可移植性，你可能应该把我的代码改成使用 Int64 或 Word64（C 语言也有 long 类型）。

问题 2：为什么 Haskell 这么慢？有没有编译器标志可以关闭限制，还是说是我的实现问题？（后者可能性很大，因为 Haskell 对我来说就像一本七个封印的书。）

问题 3：你能给我一些提示，如何在不改变我确定因子的方式下优化这些实现吗？任何形式的优化：更好看、更快、更“本地化”的语言风格。

这是我之前的回答。答案是：

0) 使用 -O2 进行优化
1) 尽可能使用快速（特别是：可以解包的）类型
2) 用 rem 而不是 mod（这是一个常被遗忘的优化）
3) 工人/包装器转换（可能是最常见的优化）。

问题 4：我的函数实现是否允许 LCO，从而避免在调用栈上添加不必要的帧？

是的，这不是问题。做得好，很高兴你考虑到了这一点。

回答于 2025-04-16 由 Python大师

分享举报

Project Euler速度比较：C与Python、Erlang与Haskell

编辑：

17 个回答

撰写回答