优化带状系数矩阵的 A*x = B 解法

5 投票

4 回答

15018 浏览

提问于 2025-04-18 03:02

我有一组方程，形式是 A*x = B，其中 [A] 是一个三对角矩阵。通过使用 Numpy 的求解器 numpy.linalg.solve，我可以求解这个方程组，找到 x。

下面是我如何构建三对角矩阵 [A] 和 {B} 向量，并求解 x 的示例：

# Solve system of equations with a tridiagonal coefficient matrix
# uses numpy.linalg.solve

# use Python 3 print function
from __future__ import print_function
from __future__ import division

# modules
import numpy as np
import time

ti = time.clock()

#---- Build [A] array and {B} column vector

m = 1000   # size of array, make this 8000 to see time benefits

A = np.zeros((m, m))     # pre-allocate [A] array
B = np.zeros((m, 1))     # pre-allocate {B} column vector

A[0, 0] = 1
A[0, 1] = 2
B[0, 0] = 1

for i in range(1, m-1):
    A[i, i-1] = 7   # node-1
    A[i, i] = 8     # node
    A[i, i+1] = 9   # node+1
    B[i, 0] = 2

A[m-1, m-2] = 3
A[m-1, m-1] = 4
B[m-1, 0] = 3

print('A \n', A)
print('B \n', B)

#---- Solve using numpy.linalg.solve

x = np.linalg.solve(A, B)     # solve A*x = B for x

print('x \n', x)

#---- Elapsed time for each approach

print('NUMPY time', time.clock()-ti, 'seconds')

所以我想问的有两个方面：

因为我处理的是一个三对角矩阵 [A]，也叫带状矩阵，有没有比使用 numpy.linalg.solve 更有效的方法来解决这个方程组？
还有，是否有更好的方法来创建这个三对角矩阵，而不是用 for-loop？

根据 time.clock() 函数，上面的示例在 Linux 上运行大约需要 0.08 秒。

numpy.linalg.solve 函数运行得很好，但我想找一种方法，利用 [A] 的三对角形式，希望能进一步加快求解速度，然后将这种方法应用到更复杂的例子中。

numpy 计算效率优化算法数值求解线性方程组矩阵构建三对角矩阵带状矩阵

4 个回答

这可能会对你有帮助。这里有一个叫做 creates_tridiagonal 的函数，它可以用来创建三对角矩阵。另外还有一个函数，可以把矩阵转换成对角线排序的形式，这样就能满足 SciPy 的 solve_banded 函数的要求。

import numpy as np    

def lu_decomp3(a):
    """
    c,d,e = lu_decomp3(a).
    LU decomposition of tridiagonal matrix a = [c\d\e]. On output
    {c},{d} and {e} are the diagonals of the decomposed matrix a.
    """
    n = np.diagonal(a).size
    assert(np.all(a.shape ==(n,n))) # check if square matrix

    d = np.copy(np.diagonal(a)) # without copy (assignment destination is read-only) error is raised 
    e = np.copy(np.diagonal(a, 1))
    c = np.copy(np.diagonal(a, -1)) 

    for k in range(1,n):
        lam = c[k-1]/d[k-1]
        d[k] = d[k] - lam*e[k-1]
        c[k-1] = lam
    return c,d,e

def lu_solve3(c,d,e,b):
    """
    x = lu_solve(c,d,e,b).
    Solves [c\d\e]{x} = {b}, where {c}, {d} and {e} are the
    vectors returned from lu_decomp3.
    """
    n = len(d)
    y = np.zeros_like(b)

    y[0] = b[0]
    for k in range(1,n): 
        y[k] = b[k] - c[k-1]*y[k-1]

    x = np.zeros_like(b)
    x[n-1] = y[n-1]/d[n-1] # there is no x[n] out of range
    for k in range(n-2,-1,-1):
        x[k] = (y[k] - e[k]*x[k+1])/d[k]
    return x

from scipy.sparse import diags
def create_tridiagonal(size = 4):
    diag = np.random.randn(size)*100
    diag_pos1 = np.random.randn(size-1)*10
    diag_neg1 = np.random.randn(size-1)*10

    a = diags([diag_neg1, diag, diag_pos1], offsets=[-1, 0, 1],shape=(size,size)).todense()
    return a

a = create_tridiagonal(4)
b = np.random.randn(4)*10

print('matrix a is\n = {} \n\n and vector b is \n {}'.format(a, b))

c, d, e = lu_decomp3(a)
x = lu_solve3(c, d, e, b)

print("x from our function is {}".format(x))

print("check is answer correct ({})".format(np.allclose(np.dot(a, x), b)))


## Test Scipy
from scipy.linalg import solve_banded

def diagonal_form(a, upper = 1, lower= 1):
    """
    a is a numpy square matrix
    this function converts a square matrix to diagonal ordered form
    returned matrix in ab shape which can be used directly for scipy.linalg.solve_banded
    """
    n = a.shape[1]
    assert(np.all(a.shape ==(n,n)))

    ab = np.zeros((2*n-1, n))

    for i in range(n):
        ab[i,(n-1)-i:] = np.diagonal(a,(n-1)-i)

    for i in range(n-1): 
        ab[(2*n-2)-i,:i+1] = np.diagonal(a,i-(n-1))


    mid_row_inx = int(ab.shape[0]/2)
    upper_rows = [mid_row_inx - i for i in range(1, upper+1)]
    upper_rows.reverse()
    upper_rows.append(mid_row_inx)
    lower_rows = [mid_row_inx + i for i in range(1, lower+1)]
    keep_rows = upper_rows+lower_rows
    ab = ab[keep_rows,:]


    return ab

ab = diagonal_form(a, upper=1, lower=1) # for tridiagonal matrix upper and lower = 1

x_sp = solve_banded((1,1), ab, b)
print("is our answer the same as scipy answer ({})".format(np.allclose(x, x_sp)))

回答于 2025-04-18 由 Python大师

分享举报

你可以使用 scipy.linalg.solveh_banded 这个工具。

补充说明: 你不能使用上面提到的工具，因为你的矩阵并不是对称的，我之前以为是的。不过，正如评论中提到的，托马斯算法对于这个问题非常合适。

a =       [7] * ( m - 2 ) + [3]
b = [1] + [8] * ( m - 2 ) + [4]
c = [2] + [9] * ( m - 2 )
d = [1] + [2] * ( m - 2 ) + [3]

# This is taken directly from the Wikipedia page also cited above
# this overwrites b and d
def TDMASolve(a, b, c, d):
    n = len(d) # n is the numbers of rows, a and c has length n-1
    for i in xrange(n-1):
        d[i+1] -= 1. * d[i] * a[i] / b[i]
        b[i+1] -= 1. * c[i] * a[i] / b[i]
    for i in reversed(xrange(n-1)):
        d[i] -= d[i+1] * c[i] / b[i+1]
    return [d[i] / b[i] for i in xrange(n)]

这段代码没有经过优化，也没有使用 np，但如果我（或者这里的其他朋友）有时间的话，我会把它改得更好。目前它在 m=10000 的情况下运行大约需要 10 毫秒。

回答于 2025-04-18 由 Python大师

分享举报

有一种叫做 scipy.sparse.dia_matrix 的矩阵类型，它属于 scipy.sparse，能够很好地表示你的矩阵结构。这个矩阵会存储三个数组，分别对应“位置”0（对角线）、1（上方）和-1（下方）。使用这种类型的矩阵，你可以尝试用 scipy.sparse.linalg.lsqr 来求解。如果你的问题有精确解，它会找到这个解；如果没有，它会找到一个最小二乘解。

from scipy import sparse
A_sparse = sparse.dia_matrix(A)
ret_values = sparse.linalg.lsqr(A_sparse, C)
x = ret_values[0]

不过，这种方法在利用三对角结构方面可能不是最优的，理论上可能还有更快的解决方案。这个转换的好处是，它能把矩阵乘法的开销降到最低：只使用这三个带状部分。结合迭代求解器 lsqr，应该能提高速度。

注意：我并不建议使用 scipy.sparse.linalg.spsolve，因为它会把你的矩阵转换成 csr 格式。不过，把 lsqr 替换成 spsolve 也是值得尝试的，特别是因为 spsolve 可以绑定 UMFPACK，你可以查看相关的 spsolve 文档。另外，看看这个关于 UMFPACK 的 StackOverflow 问题和回答也可能会很有帮助。

回答于 2025-04-18 由 Python大师

分享举报

这里有两个立刻可以提升性能的方法：(1) 不要使用循环，(2) 使用 scipy.linalg.solve_banded()。

我会把代码写得更像这样：

import scipy.linalg as la

# Create arrays and set values
ab = np.zeros((3,m))
b = 2*ones(m)
ab[0] = 9
ab[1] = 8
ab[2] = 7

# Fix end points
ab[0,1] = 2
ab[1,0] = 1
ab[1,-1] = 4
ab[2,-2] = 3
b[0] = 1
b[-1] = 3

return la.solve_banded ((1,1),ab,b)

虽然可能还有更优雅的方式来构建这个矩阵，但这样做也能达到效果。

在 ipython 中使用 %timeit 测试时，原始代码在 m=1000 的情况下花了 112 毫秒。而这段代码在 m=10,000 时只用了 2.94 毫秒，虽然问题规模大了十倍，但速度却快了将近一百倍！我没有耐心等原始代码在 m=10,000 时的结果。原始代码大部分时间可能花在构建数组上，但我没有测试这一点。无论如何，对于大数组来说，只存储矩阵中的非零值要高效得多。

回答于 2025-04-18 由 Python大师

分享举报

优化带状系数矩阵的 A*x = B 解法

4 个回答

撰写回答