How do I manually dequantize one layer's output and requantize it for the next layer?
I'm working on a school project where I need to quantize a model manually, layer by layer. Concretely, I want to implement by hand:
quantized activation with quantized weight A - layer A - quantized output - dequantized output - requantized output with quantized weight B - layer B - ...
I know PyTorch already has built-in quantization, but it only supports int8. I want to sweep the quantization from 16 bits down to 2 bits and compare the resulting accuracy.
The problem I'm running into is that after quantization, a layer's output becomes huge (at 16 bits), and I don't know how to dequantize it back. I used the same min and max across both the activations and the weights when quantizing. Here is an example:
Activation = [1,2,3,4]
Weight = [5,6,7,8]
Min and max across activation and weight = 1, 8
Expected, non-quantized output = 70
Quantize with bit = 16
Quantized activation = [-32768, -23406, -14044, -4681]
Quantized weight = [4681, 14043, 23405, 32767]
Quantized output = -964159613
Dequantize output with min = 1, max = 8 = -102980
This calculation makes sense to me: the output is a product of the activations and the weights, so their magnification factors multiply together as well. It also seems reasonable, then, that a single dequantization with the original min and max still leaves the output far too large.
How does PyTorch handle dequantization? I tried to find the quantization code in PyTorch but couldn't locate it. How can I dequantize the output?
1 Answer
I think the formula you are using to compute the dequantized output may be wrong. If you instead quantize symmetrically (scale only, no zero point), each quantized value is just the original value times a single scale factor, and dequantization is a plain division by that factor:
import numpy as np
# Original values
activation = np.array([1, 2, 3, 4])
weight = np.array([5, 6, 7, 8])
# Quantization parameters
bit = 16 # Desired bit precision
min_val = min(np.min(activation), np.min(weight))
max_val = max(np.max(activation), np.max(weight))
# Calculate scale factor
scale_factor = (2 ** (bit - 1) - 1) / max(abs(min_val), abs(max_val))
# Quantize activation and weight values
quantized_activation = np.round(activation * scale_factor).astype(np.int16)
quantized_weight = np.round(weight * scale_factor).astype(np.int16)
# Dequantize activation and weight values
dequantized_activation = quantized_activation / scale_factor
dequantized_weight = quantized_weight / scale_factor
# Print values
print("Original activation:", activation)
print("Original weight:", weight)
print("Minimum value:", min_val)
print("Maximum value:", max_val)
print("Scale factor:", scale_factor)
print("Quantized activation:", quantized_activation)
print("Quantized weight:", quantized_weight)
print("Dequantized activation:", dequantized_activation)
print("Dequantized weight:", dequantized_weight)
---------------------------------------------------------
Original activation: [1 2 3 4]
Original weight: [5 6 7 8]
Minimum value: 1
Maximum value: 8
Scale factor: 4095.875
Quantized activation: [ 4096 8192 12288 16384]
Quantized weight: [20479 24575 28671 32767]
Dequantized activation: [1.00003052 2.00006104 3.00009156 4.00012207]
Dequantized weight: [4.99990844 5.99993896 6.99996948 8. ]
Computing the output:
output = np.sum(dequantized_activation * dequantized_weight)
print("Dequantized output:", output) # 70.00183110125477
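You can also stay in the integer domain for the multiply and dequantize the accumulated result directly: since each operand carries one factor of `scale_factor`, the product carries `scale_factor**2`, so dividing the integer accumulator by the squared scale recovers the real-valued output. A minimal sketch of the dequantize-then-requantize step for the next layer (the `next_scale` choice here is illustrative, not PyTorch's):

```python
import numpy as np

activation = np.array([1, 2, 3, 4])
weight = np.array([5, 6, 7, 8])

bit = 16
# Symmetric per-tensor scale shared by activation and weight (8 = max |value|).
scale = (2 ** (bit - 1) - 1) / 8.0

# Quantize; use int64 so the accumulated products cannot overflow.
q_act = np.round(activation * scale).astype(np.int64)
q_w = np.round(weight * scale).astype(np.int64)

# Integer-domain multiply-accumulate: the result carries scale**2.
q_out = np.sum(q_act * q_w)

# Dequantize the accumulator: divide once per scale factor in the product.
deq_out = q_out / (scale * scale)
print("Dequantized output:", deq_out)  # ~70.0

# Requantize for the next layer with that layer's own scale
# (here derived from the single output value, just for illustration).
next_scale = (2 ** (bit - 1) - 1) / abs(deq_out)
q_next = np.round(deq_out * next_scale).astype(np.int64)
print("Requantized for next layer:", q_next)
```

This matches the intuition in the question: the magnification factors of the two operands multiply, so the output must be divided by both scales (here `scale * scale`), not dequantized with the original min/max once.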