Optimizing multiple for loops to use multithreading and/or the GPU

Question

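I wrote this Python code to perform the kinematic synthesis of a four-bar linkage using the Shu Radcliffe method. As you can see, it contains a few nested loops, and in the future there may be 3 or 4 of them. Right now the code takes about 40 minutes to run on a workstation with 16 CPUs (32 threads), and CPU usage stays very low while Python is running.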

The code uses the numpy and math libraries.

Is there a way to run these loops on multiple CPUs and/or on the GPU (CUDA) from Python?

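from math import cos, sin
from numpy import array, linalg, amax, amin

# x_1, y_1, x_2, y_2, x_3, y_3, X_0_A, Y_0_A, X_0_B, Y_0_B and the two
# angle ranges are assumed to be defined earlier in the script.
quad_gra = []  # accepted linkages (assumed to start empty)

for th_12 in th_12_range:
    for th_13 in th_13_range: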
        r_2=x_2-x_1*cos(th_12)+y_1*sin(th_12)
        r_3=x_3-x_1*cos(th_13)+y_1*sin(th_13)

        s_2=y_2-x_1*sin(th_12)-y_1*cos(th_12)
        s_3=y_3-x_1*sin(th_13)-y_1*cos(th_13)

        c_X_1_2_A=r_2*cos(th_12)+s_2*sin(th_12)-X_0_A*cos(th_12)\
        -Y_0_A*sin(th_12)+X_0_A
        c_Y_1_2_A=s_2*cos(th_12)-r_2*sin(th_12)+X_0_A*sin(th_12)\
        -Y_0_A*cos(th_12)+Y_0_A

        c_X_1_3_A=r_3*cos(th_13)+s_3*sin(th_13)-X_0_A*cos(th_13)\
        -Y_0_A*sin(th_13)+X_0_A
        c_Y_1_3_A=s_3*cos(th_13)-r_3*sin(th_13)+X_0_A*sin(th_13)\
        -Y_0_A*cos(th_13)+Y_0_A

        noto_2_A=r_2*X_0_A+s_2*Y_0_A-0.5*(r_2**2+s_2**2)
        noto_3_A=r_3*X_0_A+s_3*Y_0_A-0.5*(r_3**2+s_3**2)

        coeff_A = array ([[c_X_1_2_A,c_Y_1_2_A],[c_X_1_3_A,c_Y_1_3_A]])

        v_noti_A = array ([noto_2_A,noto_3_A])

        A=linalg.solve(coeff_A,v_noti_A)

        c_X_1_2_B=r_2*cos(th_12)+s_2*sin(th_12)-X_0_B\
        *cos(th_12)-Y_0_B*sin(th_12)+X_0_B

        c_Y_1_2_B=s_2*cos(th_12)-r_2*sin(th_12)+X_0_B\
        *sin(th_12)-Y_0_B*cos(th_12)+Y_0_B
        c_X_1_3_B=r_3*cos(th_13)+s_3*sin(th_13)-X_0_B*cos(th_13)\
        -Y_0_B*sin(th_13)+X_0_B
        c_Y_1_3_B=s_3*cos(th_13)-r_3*sin(th_13)+X_0_B*sin(th_13)\
        -Y_0_B*cos(th_13)+Y_0_B

        noto_2_B=r_2*X_0_B+s_2*Y_0_B-0.5*(r_2**2+s_2**2)
        noto_3_B=r_3*X_0_B+s_3*Y_0_B-0.5*(r_3**2+s_3**2)

        coeff_B = array ([[c_X_1_2_B,c_Y_1_2_B],[c_X_1_3_B,c_Y_1_3_B]])
        v_noti_B = array ([noto_2_B,noto_3_B])

        B=linalg.solve(coeff_B,v_noti_B)

        AC_i=((A[0]-x_1)**2+(A[1]-y_1)**2)**0.5
        BC_i=((B[0]-x_1)**2+(B[1]-y_1)**2)**0.5

        r1_i=((X_0_A-A[0])**2+(Y_0_A-A[1])**2)**0.5
        r2_i=((A[0]-B[0])**2+(A[1]-B[1])**2)**0.5
        r3_i=((B[0]-X_0_B)**2+(B[1]-Y_0_B)**2)**0.5
        r4_i=((X_0_A-X_0_B)**2+(Y_0_A-Y_0_B)**2)**0.5

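        # Use the link lengths computed just above (the *_i names)
        r=array([r1_i,r2_i,r3_i,r4_i])

        g_1=amax(r)+amin(r)
        g_2=sum(r)-g_1

        # Grashof condition: shortest + longest <= sum of the other two
        if g_1<=g_2:
            # `amin(r) == (r1 or r3)` only ever compared against r1;
            # the intent is assumed to be "shortest link is r1 or r3".
            if amin(r)==r1_i or amin(r)==r3_i:
                # th_12_t, th_13_t, AC, BC were undefined here; the loop
                # variables and the *_i values are assumed to be meant.
                quad_iesimo=[r1_i,r2_i,r3_i,r4_i,th_12,th_13,AC_i,BC_i]
                quad_gra.append(quad_iesimo)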
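
One possible direction, offered as a sketch rather than a drop-in replacement: each coefficient row in the loop depends on only one of the two angles, so the rows can be computed once per angle with NumPy array operations and broadcast over the whole (th_12, th_13) grid. `np.linalg.solve` accepts stacked matrices, so all the 2x2 systems can then be solved in one batched call instead of one Python-level call per pair. The function names `displacement_rows`, `solve_pivots` and `grashof_candidates`, and the tuple arguments `p1`, `p2`, `p3`, `pivot_A`, `pivot_B`, are hypothetical and not part of the original code. The shortest-link test assumes the intent was "the shortest link is r1 or r3".

```python
import numpy as np


def displacement_rows(theta, x_j, y_j, x_1, y_1, X_0, Y_0):
    """Coefficient row (c_X, c_Y) and right-hand side for one displaced
    position, evaluated for every angle in `theta` at once."""
    theta = np.asarray(theta, dtype=float)
    c, s = np.cos(theta), np.sin(theta)
    r = x_j - x_1 * c + y_1 * s
    t = y_j - x_1 * s - y_1 * c  # named s_2 / s_3 in the loop version
    c_X = r * c + t * s - X_0 * c - Y_0 * s + X_0
    c_Y = t * c - r * s + X_0 * s - Y_0 * c + Y_0
    rhs = r * X_0 + t * Y_0 - 0.5 * (r**2 + t**2)
    return c_X, c_Y, rhs


def solve_pivots(th_12, th_13, p1, p2, p3, pivot):
    """Solve the 2x2 system for every (th_12, th_13) pair.
    Returns an array of shape (len(th_12), len(th_13), 2)."""
    cX2, cY2, n2 = displacement_rows(th_12, *p2, *p1, *pivot)  # depends on th_12 only
    cX3, cY3, n3 = displacement_rows(th_13, *p3, *p1, *pivot)  # depends on th_13 only
    n12, n13 = cX2.shape[0], cX3.shape[0]
    coeff = np.empty((n12, n13, 2, 2))
    coeff[..., 0, 0] = cX2[:, None]
    coeff[..., 0, 1] = cY2[:, None]
    coeff[..., 1, 0] = cX3[None, :]
    coeff[..., 1, 1] = cY3[None, :]
    rhs = np.empty((n12, n13, 2, 1))
    rhs[..., 0, 0] = n2[:, None]
    rhs[..., 1, 0] = n3[None, :]
    # One batched call solves every stacked 2x2 system.
    return np.linalg.solve(coeff, rhs)[..., 0]


def grashof_candidates(th_12, th_13, p1, p2, p3, pivot_A, pivot_B):
    """Rows of [r1, r2, r3, r4, th_12, th_13, AC, BC] for the grid points
    that pass the same two tests as the loop version."""
    th_12 = np.asarray(th_12, dtype=float)
    th_13 = np.asarray(th_13, dtype=float)
    A = solve_pivots(th_12, th_13, p1, p2, p3, pivot_A)
    B = solve_pivots(th_12, th_13, p1, p2, p3, pivot_B)
    P1, PA, PB = (np.asarray(p, dtype=float) for p in (p1, pivot_A, pivot_B))

    AC = np.linalg.norm(A - P1, axis=-1)
    BC = np.linalg.norm(B - P1, axis=-1)
    r1 = np.linalg.norm(A - PA, axis=-1)
    r2 = np.linalg.norm(A - B, axis=-1)
    r3 = np.linalg.norm(B - PB, axis=-1)
    r4 = np.full_like(r1, np.linalg.norm(PA - PB))

    r = np.stack([r1, r2, r3, r4], axis=-1)
    shortest, longest = r.min(axis=-1), r.max(axis=-1)
    g_1 = shortest + longest
    g_2 = r.sum(axis=-1) - g_1
    # Grashof test, plus "shortest link is r1 or r3" (assumed intent).
    keep = (g_1 <= g_2) & ((shortest == r1) | (shortest == r3))

    i, j = np.nonzero(keep)
    return np.column_stack([r1[i, j], r2[i, j], r3[i, j], r4[i, j],
                            th_12[i], th_13[j], AC[i, j], BC[i, j]])
```

Like the loop, this raises `LinAlgError` if any grid point produces a singular system. Memory use grows with the grid size, so a very fine grid or a third loop variable may need to be processed in chunks. The chunks are independent of one another, which would also make them candidates for separate worker processes or a GPU array library, though neither is tested here.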

Tags: code optimization, multithreading, numpy, GPU computing, kinematic synthesis, Shu Radcliffe method

