Python与C++间的异常速度差异

40 投票

17 回答

13234 浏览

数据工程师

提问于 2025-04-15 13:34

我最近写了一个简单的算法，用来计算快乐数，使用的是python语言。这个程序允许你设置一个上限，然后它会找出所有小于这个上限的快乐数。为了比较速度，我决定把我知道的最直接的算法从python翻译成c++。

结果让我惊讶的是，c++版本的运行速度明显比python版本慢。通过对发现前10,000个快乐数的执行时间进行准确的速度测试，发现python程序平均运行时间为0.59秒，而c++版本平均需要8.5秒。

我认为这个速度差异的原因在于，我在c++版本中需要为一些计算写辅助函数（比如判断一个元素是否在列表/数组/向量中），而这些功能在python语言中已经内置了。

首先，这是否是造成如此巨大速度差异的真正原因？其次，我该如何修改c++版本，使其运行速度比python版本更快（我认为应该是这样的）？

这两段代码和速度测试的结果在这里：Python版本，C++版本。谢谢大家的帮助。

#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <windows.h>

using namespace std;

bool inVector(int inQuestion, vector<int> known);
int sum(vector<int> given);
int pow(int given, int power);
void calcMain(int upperBound);

int main()
{
    while(true)
    {
        int upperBound;
        cout << "Pick an upper bound: ";
        cin >> upperBound;
        long start, end;
        start = GetTickCount();
        calcMain(upperBound);
        end = GetTickCount();
        double seconds = (double)(end-start) / 1000.0;
        cout << seconds << " seconds." << endl << endl;
    }
    return 0;
}

void calcMain(int upperBound)
{
    vector<int> known;
    for(int i = 0; i <= upperBound; i++)
    {
        bool next = false;
        int current = i;
        vector<int> history;
        while(!next)
        {
            char* buffer = new char[10];
            itoa(current, buffer, 10);
            string digits = buffer;
            delete buffer;
            vector<int> squares;
            for(int j = 0; j < digits.size(); j++)
            {
                char charDigit = digits[j];
                int digit = atoi(&charDigit);
                int square = pow(digit, 2);
                squares.push_back(square);
            }
            int squaresum = sum(squares);
            current = squaresum;
            if(inVector(current, history))
            {
                next = true;
                if(current == 1)
                {
                    known.push_back(i);
                    //cout << i << "\t";
                }
            }
            history.push_back(current);
        }
    }
    //cout << "\n\n";
}

bool inVector(int inQuestion, vector<int> known)
{
    for(vector<int>::iterator it = known.begin(); it != known.end(); it++)
        if(*it == inQuestion)
            return true;
    return false;
}

int sum(vector<int> given)
{
    int sum = 0;
    for(vector<int>::iterator it = given.begin(); it != given.end(); it++)
        sum += *it;
    return sum;
}

int pow(int given, int power)
{
    int original = given;
    int current = given;
    for(int i = 0; i < power-1; i++)
        current *= original;
    return current;
}

#!/usr/bin/env python

import timeit

upperBound = 0

def calcMain():
    known = []
    for i in range(0,upperBound+1):
        next = False
        current = i
        history = []
        while not next:
            digits = str(current)
            squares = [pow(int(digit), 2) for digit in digits]
            squaresum = sum(squares)
            current = squaresum
            if current in history:
                next = True
                if current == 1:
                    known.append(i)
                    ##print i, "\t",
            history.append(current)
    ##print "\nend"

while True:    
    upperBound = input("Pick an upper bound: ")
    result = timeit.Timer(calcMain).timeit(1)
    print result, "seconds.\n"

内置函数算法优化编程语言比较辅助函数速度测试快乐数 C++性能运行时间分析

17 个回答

看起来你在把向量（vector）按值传递给其他函数。这会导致程序变慢，因为在传递之前，程序会先把你的向量完整复制一遍。为了避免这种情况，应该传递一个常量引用，而不是复制一份。所以，不要这样写：

int sum(vector<int> given)

而是要这样写：

int sum(const vector<int>& given)

这样做后，你就不能再使用vector::iterator了，因为它不是常量的。你需要用vector::const_iterator来替代。

你也可以传递非常量引用，但在这种情况下，你根本不需要修改这个参数。

回答于 2025-04-15 由 Python大师

分享举报

这里有一个全新、速度更快的版本，详细内容可以查看另一个回答，所以这个回答已经不再适用。

我对你的算法进行了重写，增加了缓存功能，这样每当找到一个快乐数或不快乐数时，就会保存结果。我还尽量让代码更符合Python的风格，比如创建了两个独立的函数 digits() 和 happy()。抱歉使用了Python 3，但我也想展示一些它的实用功能。

这个版本快得多。运行时间为1.7秒，比你原来的程序快了10倍，后者需要18秒（不过，我的MacBook比较旧，速度也慢 :)）

#!/usr/bin/env python3

from timeit import Timer
from itertools import count

print_numbers = False
upperBound = 10**5  # Default value, can be overidden by user.


def digits(x:'nonnegative number') -> "yields number's digits":
    if not (x >= 0): raise ValueError('Number should be nonnegative')
    while x:
        yield x % 10
        x //= 10


def happy(number, known = {1}, happies = {1}) -> 'True/None':
    '''This function tells if the number is happy or not, caching results.

    It uses two static variables, parameters known and happies; the
    first one contains known happy and unhappy numbers; the second 
    contains only happy ones.

    If you want, you can pass your own known and happies arguments. If
    you do, you should keep the assumption commented out on the 1 line.

    '''

#        assert 1 in known and happies <= known  # <= is expensive

    if number in known:
        return number in happies

    history = set()
    while True:
        history.add(number)
        number = sum(x**2 for x in digits(number))
        if number in known or number in history:
            break

    known.update(history)
    if number in happies:
        happies.update(history)
        return True


def calcMain():
    happies = {x for x in range(upperBound) if happy(x) }
    if print_numbers:
        print(happies)


if __name__ == '__main__':
    upperBound = eval(
            input("Pick an upper bound [default {0}]: "
                    .format(upperBound)).strip()
            or repr(upperBound))
    result = Timer(calcMain).timeit(1)
    print ('This computation took {0} seconds'.format(result))

回答于 2025-04-15 由 Python大师

分享举报

160

对于10万个元素，Python代码运行了6.9秒，而C++的原始代码则超过了37秒。

我对你的代码做了一些基本的优化，结果让C++的代码速度提升了100倍，现在处理10万个元素只需0.06秒。这比原来的C++代码快了617倍。

最重要的是要在发布模式下编译代码，并开启所有优化。在调试模式下，这段代码的速度慢得多。

接下来，我会解释我做的优化。

把所有的向量声明移到循环外面，使用clear()操作来清空它们，这比每次都调用构造函数要快得多。
把调用pow(value, 2)替换成直接相乘：value * value。
我没有单独创建一个平方的向量，而是直接用一个整数来计算总和。
避免所有字符串操作，因为它们比整数操作慢得多。比如，可以通过不断除以10并取余数来计算每个数字的平方，而不是把数字转换成字符串再把每个字符转回整数。
避免所有向量的复制，首先是用引用传递代替值传递，最后完全去掉了辅助函数。
去掉了一些临时变量。
还有很多小细节我可能忘了。你可以把你的代码和我的代码并排对比，看看我具体做了哪些改动。

可能还可以通过使用预分配的数组而不是向量来进一步优化代码，但这会需要更多的工作，我就留给读者自己去尝试了。:P

这是优化后的代码：

#include <iostream>
#include <vector>
#include <string>
#include <ctime>
#include <algorithm>
#include <windows.h>

using namespace std;

void calcMain(int upperBound, vector<int>& known);

int main()
{
    while(true)
    {
        vector<int> results;
        int upperBound;
        cout << "Pick an upper bound: ";
        cin >> upperBound;
        long start, end;
        start = GetTickCount();
        calcMain(upperBound, results);
        end = GetTickCount();
        for (size_t i = 0; i < results.size(); ++i) {
            cout << results[i] << ", ";
        }
        cout << endl;
        double seconds = (double)(end-start) / 1000.0;
        cout << seconds << " seconds." << endl << endl;
    }
    return 0;
}

void calcMain(int upperBound, vector<int>& known)
{
    vector<int> history;
    for(int i = 0; i <= upperBound; i++)
    {
        int current = i;
        history.clear();
        while(true)
        {
                int temp = current;
                int sum = 0;
                while (temp > 0) {
                    sum += (temp % 10) * (temp % 10);
                    temp /= 10;
                }
                current = sum;
                if(find(history.begin(), history.end(), current) != history.end())
                {
                        if(current == 1)
                        {
                                known.push_back(i);
                        }
                        break;
                }
                history.push_back(current);
        }
    }
}

回答于 2025-04-15 由 Python大师

分享举报

Python与C++间的异常速度差异

17 个回答

撰写回答