Python C extension: multithreading and random numbers

Question

I've implemented a work queue pattern in C (inside a Python extension), and I'm disappointed with its performance.

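I have a simulation containing a collection of particles (call them "elements"). I measure the time needed to complete all the calculations and record it along with the number of particles involved. I'm running the code on a quad-core hyperthreaded i7, so I expected performance to improve (that is, the time to fall) as the number of threads increased, with the best result at around 8 threads. Instead, the fastest implementation uses no worker threads at all (functions are executed directly rather than being added to the queue), and every additional worker thread makes the code slower (each new thread adds more than the entire run time of the unthreaded version!). A quick look at my processor usage shows that Python never exceeds about 130% CPU, no matter how many threads are running. The machine has plenty of headroom beyond that; total system usage is around 200%.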
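For reference, here is a minimal sketch of how wall-clock timing of this kind can be taken in C. It is not the code used for the measurements above: `elapsed_seconds`, `time_wall_clock`, and the `run` callback are hypothetical names, and it assumes a platform that provides POSIX `clock_gettime` with `CLOCK_MONOTONIC`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Difference between two timestamps, in seconds. */
double elapsed_seconds(struct timespec start, struct timespec end) {
  assert(end.tv_sec >= start.tv_sec);
  return (double)(end.tv_sec - start.tv_sec)
       + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Wall-clock time taken by one call to run(arg).
 * `run` is a hypothetical stand-in for "process every work item". */
double time_wall_clock(void (*run)(void*), void* arg) {
  struct timespec start, end;
  clock_gettime(CLOCK_MONOTONIC, &start);
  run(arg);
  clock_gettime(CLOCK_MONOTONIC, &end);
  return elapsed_seconds(start, end);
}
```

Wall-clock time is the figure that should fall as threads are added. CPU time summed over all threads can rise even when the wall-clock time improves, so the two are worth recording separately.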

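In my queue implementation (shown below), I need to pick an item from the queue at random, because executing each work item requires locking two elements, and similar elements tend to sit close together in the queue. I therefore want the threads to choose random indices and work on different parts of the queue, to reduce mutex clashes.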

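I've been told that my initial attempt, which used rand(), would be slow because my random number generation wasn't thread-safe (does that sound right? I'm not sure...)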

I've also tried implementations using random() and drand48_r (unfortunately the latter doesn't seem to be available on OS X), but neither improved the numbers.
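To illustrate the thread-safety concern: one portable way to keep workers from sharing generator state is to give each thread its own small generator. The sketch below uses a xorshift64 generator, which is a swapped-in technique and not something from the code in this question. The names `worker_rng_t`, `worker_rng_seed`, `worker_rng_next`, and `worker_rng_index` are hypothetical, and it assumes each worker seeds its own instance from something thread-specific, such as its index.

```c
#include <assert.h>
#include <stdint.h>

/* Per-thread generator state: each worker owns one, so no locking is needed. */
typedef struct {
  uint64_t state; /* must never be zero for xorshift */
} worker_rng_t;

/* Seed from a thread-specific value (for example the worker's index). */
void worker_rng_seed(worker_rng_t* rng, uint64_t seed) {
  /* Multiply by an odd constant so small consecutive seeds diverge. */
  rng->state = (seed + 1) * 0x9E3779B97F4A7C15ULL;
  if (rng->state == 0) {
    rng->state = 1; /* only reachable when seed + 1 wraps to zero */
  }
}

/* One xorshift64 step (shift triple 13, 7, 17). */
uint64_t worker_rng_next(worker_rng_t* rng) {
  uint64_t x = rng->state;
  x ^= x << 13;
  x ^= x >> 7;
  x ^= x << 17;
  rng->state = x;
  return x;
}

/* Index in [0, count). The modulo introduces a slight bias, which should be
 * acceptable when the goal is only to spread threads across the queue. */
int worker_rng_index(worker_rng_t* rng, int count) {
  assert(count > 0);
  return (int)(worker_rng_next(rng) % (uint64_t)count);
}
```

In a worker function, this would mean declaring a `worker_rng_t` as a local variable, seeding it once before the main loop, and calling `worker_rng_index(&rng, count)` wherever a random index is needed. Because the state lives on each thread's own stack, no two threads touch the same generator.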

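Perhaps someone else can tell me what is causing the problem? The code (the worker function) is below; if you think any of the queue-add functions or constructors would also be useful, please let me know.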

void* worker_thread_function(void* untyped_queue) {

  queue_t* queue = (queue_t*)untyped_queue;
  int success = 0;
  int rand_id;
  long int temp;
  work_item_t* work_to_do = NULL;
  int work_items_completed = 0;

  while (1) {
    if (pthread_mutex_lock(queue->mutex)) {

      // error case, try again:
      continue;
    }

    while (!success) {

      if (queue->queue->count == 0) {

        pthread_mutex_unlock(queue->mutex);
        break;
      }

      // choose a random item from the work queue, in order to avoid clashing element mutexes.
      rand_id = random() % queue->queue->count;

      if (!pthread_mutex_trylock(((work_item_t*)queue->queue->items[rand_id])->mutex)) {

        // obtain mutex locks on both elements for the work item.
        work_to_do = (work_item_t*)queue->queue->items[rand_id];

        if (!pthread_mutex_trylock(((element_t*)work_to_do->element_1)->mutex)){ 
          if (!pthread_mutex_trylock(((element_t*)work_to_do->element_2)->mutex)) {

            success = 1;
          } else {

            // only locked element_1 and work item:
            pthread_mutex_unlock(((element_t*)work_to_do->element_1)->mutex);
            pthread_mutex_unlock(work_to_do->mutex);
            work_to_do = NULL;
          }
        } else {

          // couldn't lock element_1, didn't even try 2:
          pthread_mutex_unlock(work_to_do->mutex);
          work_to_do = NULL;
        }
      }
    }

    if (work_to_do == NULL) {
      if (queue->queue->count == 0 && queue->exit_flag) {

        break;
      } else {

        continue;
      }
    }

    queue_remove_work_item(queue, rand_id, NULL, 1);
    pthread_mutex_unlock(work_to_do->mutex);

    pthread_mutex_unlock(queue->mutex);

    // At this point, we have mutex locks for the two elements in question, and a
    // work item no longer visible to any other threads. we have also unlocked the main
    // shared queue, and are free to perform the work on the elements.
    execute_function(
      work_to_do->interaction_function,
      (element_t*)work_to_do->element_1,
      (element_t*)work_to_do->element_2,
      (simulation_parameters_t*)work_to_do->params
    );

    // now finished, we should unlock both the elements:
    pthread_mutex_unlock(((element_t*)work_to_do->element_1)->mutex);
    pthread_mutex_unlock(((element_t*)work_to_do->element_2)->mutex);

    // and release the work_item RAM:
    work_item_destroy((void*)work_to_do);
    work_to_do = NULL;

    work_items_completed++;
    success = 0;
  }
  return NULL;
}

performance multithreading thread-safety c mutex cpu-usage random work-queue
