Implementing a truly parallel Python C extension

Published 2024-04-20 02:00:03


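I have written a Python C extension, and it works fine. Now, to make it run more efficiently, I need to write a multithreaded/parallel version of the same extension.

Can you tell me how to write Python C extension code that runs on multiple cores at the same time?

I have been stuck on this for more than a day. Please help.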


Tags: code, version, cores
1 answer
User
#1 · posted 2024-04-20 02:00:03

Maybe too late, but hopefully this helps someone :)

The simplest way to run a C extension in parallel is to use the OpenMP API. From Wikipedia:

OpenMP (Open Multi-Processing) is an application programming interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran, on most platforms, processor architectures and operating systems.

For example, consider this piece of code:

int i;
for (i=0;i<10;i++)
{
    printf("%d ",i);
}

Result:

0 1 2 3 4 5 6 7 8 9

We can make it parallel by placing the #pragma omp parallel for compiler directive right before the for statement:

int i;
#pragma omp parallel for
for (i=0;i<10;i++)
{
    printf("%d ",i);
}

Result (iterations run on different threads, so the order can change from run to run):

0 1 5 8 9 2 6 4 3 7
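
The scrambled order is harmless when iterations only print, but a loop that accumulates into a shared variable needs more care. A minimal sketch of the usual approach, the OpenMP reduction clause, follows. The helper name sum_squares is hypothetical and not part of the original answer. If the file is compiled without -fopenmp the pragma is simply ignored and the loop runs serially with the same result.

```c
/* Hypothetical helper for illustration: sums i*i for i in [0, n). */
long sum_squares(int n)
{
    long total = 0;
    int i;
    /* Each thread accumulates into its own private copy of total;
       OpenMP adds the private copies together when the loop ends. */
    #pragma omp parallel for reduction(+:total)
    for (i = 0; i < n; i++)
    {
        total += (long)i * i;
    }
    return total;
}
```

Because integer addition gives the same sum in any order, sum_squares(10) comes out the same no matter how the iterations are distributed across threads.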

To enable OpenMP in gcc, you need to pass the -fopenmp flag at compile time. Example:

gcc -fPIC -Wall -O3 costFunction.c -o costFunction.so -shared -fopenmp
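
If the flag is forgotten, gcc ignores the pragmas and the code silently runs on one thread. One way to check is the standard _OPENMP macro, which the compiler defines only when OpenMP is enabled. This is a rough sketch; the helper names openmp_enabled and max_threads are made up for illustration.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: 1 if this file was compiled with OpenMP enabled, else 0. */
int openmp_enabled(void)
{
#ifdef _OPENMP
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical helper: upper bound on the threads a parallel region may use. */
int max_threads(void)
{
#ifdef _OPENMP
    return omp_get_max_threads();
#else
    return 1; /* no OpenMP: everything runs on the calling thread */
#endif
}
```

Calling these once from the extension's init code (or while debugging) shows whether the build really picked up -fopenmp.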

You can learn OpenMP from HERE.

There are other ways, such as pthreads, but they are very low-level.

OpenMP vs. pthreads comparison (the example, taken from HERE, is written in C++):

Serial C++ code:

void sum_st(int *A, int *B, int *C){
   int end = 10000000;
   for(int i = 0; i < end; i++)
    A[i] = B[i] + C[i];
}

pthread solution:

#include <pthread.h>
#include <stdlib.h>

struct params {
  int *A;
  int *B;
  int *C;
  int tid;
  int size;
  int nthreads;
};

void *compute_parallel(void *_p){
  params *p      = (params*) _p;
  int tid        = p->tid;
  int chunk_size = (p->size / p->nthreads);
  int start      = tid * chunk_size;
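  // the last thread also takes the leftover elements when size is not divisible by nthreads
  int end        = (tid == p->nthreads - 1) ? p->size : start + chunk_size;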
  for(int i = start; i < end; i++)     p->A[i] = p->B[i] + p->C[i];
  return 0;
}

void sum_mt(int *A, int *B, int *C){
  const int nthreads = 4; // constant, so the array below has a fixed size in standard C++
  int size = 10000000;
  pthread_t threads[nthreads]; //array to hold thread information
  params *thread_params = (params*) malloc(nthreads * sizeof(params));

  for(int i = 0; i < nthreads; i++){
    thread_params[i].A        = A;
    thread_params[i].B        = B;
    thread_params[i].C        = C;
    thread_params[i].tid      = i;
    thread_params[i].size     = size;
    thread_params[i].nthreads = nthreads;
    pthread_create(&threads[i], NULL, compute_parallel, (void*) &thread_params[i]);
  }

  for(int i = 0; i < nthreads; i++){
    pthread_join(threads[i], NULL);
  }
  free(thread_params);

}

OpenMP solution:

#pragma omp parallel for
for(int i = 0; i < 10000000; i++)
  A[i] = B[i] + C[i];
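
For a direct comparison with sum_st and sum_mt, the OpenMP loop can be wrapped in a function of its own. This is only a sketch written in plain C; the name sum_omp and the explicit length parameter n are assumptions, not part of the original example.

```c
/* Hypothetical wrapper: element-wise A[i] = B[i] + C[i] for n elements. */
void sum_omp(int *A, const int *B, const int *C, int n)
{
    int i;
    /* Every iteration writes a different A[i] and only reads B and C,
       so the iterations are independent and need no locking. */
    #pragma omp parallel for
    for (i = 0; i < n; i++)
    {
        A[i] = B[i] + C[i];
    }
}
```

Splitting the index range, starting the threads and joining them, which the pthread version does by hand, is all left to the OpenMP runtime here.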
