How do I manually dequantize one layer's output and requantize it for the next layer?

0 votes
1 answer
43 views
Asked 2025-04-12 09:55

I'm working on a school project where I need to quantize every layer of a model by hand. Concretely, I want to implement the following pipeline manually:

quantized activations + quantized weights A → layer A → quantized output → dequantized output → requantized output + quantized weights B → layer B → …

I know PyTorch already has built-in quantization, but it only supports int8. I want to vary the quantization from 16 bits down to 2 bits and compare the resulting accuracy.
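For example, to see the precision loss at each bit width, I run a quantize/dequantize round trip like this (a sketch using symmetric quantization; all names are mine):

```python
import numpy as np

def quantize_dequantize(x, bit):
    # Symmetric round trip: quantize to `bit` bits, then map back to floats
    scale = (2 ** (bit - 1) - 1) / np.max(np.abs(x))
    return np.round(x * scale) / scale

x = np.array([1.0, 2.0, 3.0, 4.0])
for bit in (16, 8, 4, 2):
    err = np.max(np.abs(quantize_dequantize(x, bit) - x))
    print(f"{bit:2d}-bit max round-trip error: {err:.6f}")
```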

The problem I've run into is that after quantization, a layer's output becomes extremely large (at 16 bits), and I don't know how to dequantize it back. When quantizing I used the same min and max across both the activations and the weights. Here is an example:

Activation = [1,2,3,4]
Weight = [5,6,7,8]
Min and max across activation and weight = 1, 8
Expected, non-quantized output = 70

Quantize with bit = 16
Quantized activation = [-32768, -23406, -14044, -4681]
Quantized weight = [4681, 14043, 23405, 32767]
Quantized output = -964159613
Dequantize output with min = 1, max = 8 = -102980
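In code, my scheme looks roughly like this (a sketch of the affine min/max quantization described above; the numbers match the example up to rounding):

```python
import numpy as np

activation = np.array([1.0, 2.0, 3.0, 4.0])
weight = np.array([5.0, 6.0, 7.0, 8.0])

# Affine quantization of [min_val, max_val] onto the signed 16-bit range
bit = 16
qmin, qmax = -(2 ** (bit - 1)), 2 ** (bit - 1) - 1   # -32768, 32767
min_val, max_val = 1.0, 8.0
scale = (max_val - min_val) / (qmax - qmin)

def quantize(x):
    return np.round((x - min_val) / scale).astype(np.int64) + qmin

def dequantize(q):
    return (q - qmin) * scale + min_val

qa, qw = quantize(activation), quantize(weight)

# Naive: dequantize the raw integer product with the original parameters
wrong = np.sum(qa * qw) * scale + min_val            # huge negative number, not 70

# Correct: dequantize each factor first, then multiply
right = np.sum(dequantize(qa) * dequantize(qw))      # close to 70
print(wrong, right)
```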

This result makes sense to me: the output is a sum of activation-weight products, so the magnification of the activations and the weights multiplies together. It also follows that a single dequantization with the original min and max still leaves the output far too large.

How does PyTorch handle dequantization? I tried to find the quantization code in PyTorch but couldn't locate it. How can I dequantize the output?

1 Answer

0

I think the formula you are using to compute the dequantized output is the problem.

import numpy as np

# Original values
activation = np.array([1, 2, 3, 4])
weight = np.array([5, 6, 7, 8])

# Quantization parameters
bit = 16  # Desired bit precision
min_val = min(np.min(activation), np.min(weight))
max_val = max(np.max(activation), np.max(weight))

# Symmetric scale: map the largest magnitude onto the signed-integer maximum
scale_factor = (2 ** (bit - 1) - 1) / max(abs(min_val), abs(max_val))

# Quantize activation and weight values
quantized_activation = np.round(activation * scale_factor).astype(np.int16)
quantized_weight = np.round(weight * scale_factor).astype(np.int16)

# Dequantize activation and weight values
dequantized_activation = quantized_activation / scale_factor
dequantized_weight = quantized_weight / scale_factor

# Print values
print("Original activation:", activation)
print("Original weight:", weight)
print("Minimum value:", min_val)
print("Maximum value:", max_val)
print("Scale factor:", scale_factor)
print("Quantized activation:", quantized_activation)
print("Quantized weight:", quantized_weight)
print("Dequantized activation:", dequantized_activation)
print("Dequantized weight:", dequantized_weight)

---------------------------------------------------------

Original activation: [1 2 3 4]
Original weight: [5 6 7 8]
Minimum value: 1
Maximum value: 8
Scale factor: 4095.875
Quantized activation: [ 4096  8192 12288 16384]
Quantized weight: [20479 24575 28671 32767]
Dequantized activation: [1.00003052 2.00006104 3.00009156 4.00012207]
Dequantized weight: [4.99990844 5.99993896 6.99996948 8.        ]

Compute the output:

output = np.sum(dequantized_activation * dequantized_weight)
print("Dequantized output:", output) # 70.00183110125477
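To actually stay in the integer domain, as your pipeline requires: with symmetric quantization the integer accumulator carries the combined scale s_a * s_w, so you can dequantize the accumulated output by dividing by that product, then requantize it with the next layer's own scale. A sketch (using one scale per tensor rather than a shared min/max; the variable names are illustrative, not a PyTorch API):

```python
import numpy as np

activation = np.array([1.0, 2.0, 3.0, 4.0])
weight = np.array([5.0, 6.0, 7.0, 8.0])
bit = 16

# Symmetric quantization with one scale per tensor
def make_scale(x):
    return (2 ** (bit - 1) - 1) / np.max(np.abs(x))

s_a, s_w = make_scale(activation), make_scale(weight)
# int64 accumulator: an int16 * int16 product would overflow int16
qa = np.round(activation * s_a).astype(np.int64)
qw = np.round(weight * s_w).astype(np.int64)

# Integer-only "layer": the accumulator carries a scale of s_a * s_w
acc = np.sum(qa * qw)

# Dequantize the output by the *product* of the two scales
output = acc / (s_a * s_w)
print("Dequantized output:", output)  # close to 70

# Requantize for the next layer with that layer's own scale
s_next = make_scale(np.array([output]))
q_next = np.round(output * s_next).astype(np.int64)
print("Requantized for next layer:", q_next, "scale:", s_next)
```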
