Python: compactly and reversibly encode a large integer as base64 or base16, with variable or fixed length

Posted 2024-06-10 16:38:49


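I want to compactly encode an unsigned or signed integer with an arbitrary number of bits into a base64, base32, or base16 (hexadecimal) representation. The output will ultimately be used as a string that serves as a filename, but that should be beside the point. I am using the latest Python 3.

This works, but it is far from compact: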

>>> import base64, sys
>>> i: int = 2**62 - 3  # Can be signed or unsigned.
>>> b64: bytes =  base64.b64encode(str(i).encode()) # Not a compact encoding.
>>> len(b64), sys.getsizeof(b64)
(28, 61)

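There is a prior question, now closed, whose answers deal strictly with inefficient representations. Note again that we do not want to use any strings or unnecessarily long byte sequences in this exercise. This question is therefore not a duplicate of that one.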


1 Answer
User
Answer 1 · Posted 2024-06-10 16:38:49

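This answer is motivated in part by various comments by Erik A, such as the one on this answer. The integer is first compactly converted to bytes, and the bytes are then encoded in the chosen base.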

from typing import Callable, Optional
import base64

class IntBaseEncoder:
    """Reversibly encode an unsigned or signed integer into a customizable encoding of a variable or fixed length."""
    # Ref: https://stackoverflow.com/a/54152763/
    def __init__(self, encoding: str, *, bits: Optional[int] = None, signed: bool = False):
        """
        :param encoding: Name of encoding from base64 module, e.g. b64, urlsafe_b64, b32, b16, etc.
        :param bits: Max bit length of int which is to be encoded. If specified, the encoding is of a fixed length,
        otherwise of a variable length.
        :param signed: If True, integers are considered signed, otherwise unsigned.
        """
        self._decoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}decode')
        self._encoder: Callable[[bytes], bytes] = getattr(base64, f'{encoding}encode')
        self.signed: bool = signed
        self.bytes_length: Optional[int] = bits and self._bytes_length(2 ** bits - 1)

    def _bytes_length(self, i: int) -> int:
        # Round the bit count up to whole bytes; a signed value needs one extra bit for the sign.
        return (i.bit_length() + 7 + self.signed) // 8

    def encode(self, i: int) -> bytes:
        length = self.bytes_length or self._bytes_length(i)
        i_bytes = i.to_bytes(length, byteorder='big', signed=self.signed)
        return self._encoder(i_bytes)

    def decode(self, b64: bytes) -> int:
        i_bytes = self._decoder(b64)
        return int.from_bytes(i_bytes, byteorder='big', signed=self.signed)

# Tests:
import unittest

class TestIntBaseEncoder(unittest.TestCase):

    ENCODINGS = ('b85', 'b64', 'urlsafe_b64', 'b32', 'b16')

    def test_unsigned_with_variable_length(self):
        for encoding in self.ENCODINGS:
            encoder = IntBaseEncoder(encoding)
            previous_length = 0
            for i in range(1234):
                encoded = encoder.encode(i)
                self.assertGreaterEqual(len(encoded), previous_length)
                self.assertEqual(i, encoder.decode(encoded))
                previous_length = len(encoded)  # Without this update the length check is vacuous.

    def test_signed_with_variable_length(self):
        for encoding in self.ENCODINGS:
            encoder = IntBaseEncoder(encoding, signed=True)
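            previous_length = 0
            # Visit values by increasing magnitude so the encoded length can only grow.
            for i in sorted(range(-1234, 1234), key=abs):
                encoded = encoder.encode(i)
                self.assertGreaterEqual(len(encoded), previous_length)
                self.assertEqual(i, encoder.decode(encoded))
                previous_length = len(encoded)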

    def test_unsigned_with_fixed_length(self):
        for encoding in self.ENCODINGS:
            for maxint in range(257):
                encoder = IntBaseEncoder(encoding, bits=maxint.bit_length())
                maxlen = len(encoder.encode(maxint))
                for i in range(maxint + 1):
                    encoded = encoder.encode(i)
                    self.assertEqual(len(encoded), maxlen)
                    self.assertEqual(i, encoder.decode(encoded))

    def test_signed_with_fixed_length(self):
        for encoding in self.ENCODINGS:
            for maxint in range(257):
                encoder = IntBaseEncoder(encoding, bits=maxint.bit_length(), signed=True)
                maxlen = len(encoder.encode(maxint))
                for i in range(-maxint, maxint + 1):
                    encoded = encoder.encode(i)
                    self.assertEqual(len(encoded), maxlen)
                    self.assertEqual(i, encoder.decode(encoded))

if __name__ == '__main__':
    unittest.main()
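
To make the gain in compactness concrete, here is a minimal standalone sketch of the same two steps (integer to bytes, bytes to base64) next to the string-based approach from the question. The helper names `int_to_b64` and `b64_to_int` are hypothetical and are not part of the class above.

```python
import base64


def int_to_b64(i: int, signed: bool = False) -> bytes:
    """Pack an int into the fewest whole bytes the formula allows, then base64-encode it."""
    # Same byte-length formula as the class: one extra bit when the value is signed.
    length = (i.bit_length() + 7 + signed) // 8
    return base64.b64encode(i.to_bytes(length, byteorder='big', signed=signed))


def b64_to_int(b: bytes, signed: bool = False) -> int:
    """Reverse int_to_b64."""
    return int.from_bytes(base64.b64decode(b), byteorder='big', signed=signed)


i = 2**62 - 3
via_str = base64.b64encode(str(i).encode())  # 19 decimal digits in, 28 characters out
via_bytes = int_to_b64(i)                    # 8 raw bytes in, 12 characters out
print(len(via_str), len(via_bytes))          # 28 12
```

The saving comes from the first step: 62 bits fit in 8 bytes, whereas the decimal string needs 19 bytes before base64 adds its own overhead of one third.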

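If the output is used as a filename, the standard 'b64' alphabet is risky because it contains '/'. Initializing the encoder with a filename-safe encoding such as 'urlsafe_b64', or even 'b16', is the safer choice.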
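
The filename concern can be seen with a value whose bytes are all `0xff`. This is only an illustrative sketch using the standard `base64` module directly; the chosen integer is an arbitrary example.

```python
import base64

i = 2**24 - 1                             # packs into the three bytes ff ff ff
raw = i.to_bytes(3, byteorder='big')

standard = base64.b64encode(raw)          # b'////'   '/' is a path separator
urlsafe = base64.urlsafe_b64encode(raw)   # b'____'   '+' and '/' become '-' and '_'
hexadecimal = base64.b16encode(raw)       # b'FFFFFF' digits and A-F only
print(standard, urlsafe, hexadecimal)
```

Base64 output is also case-sensitive, so on a case-insensitive file system two distinct names could collide. That may be one reason to prefer the longer but single-case 'b16' output.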

Usage example:

[The original code block was not preserved.]
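
The following is a minimal sketch of how the class might be used, not the author's original example. The class body is a condensed copy of the one in the answer, repeated only so the snippet runs on its own; the variable names `enc`, `token`, `fixed`, and `signed_enc` are illustrative.

```python
import base64
from typing import Optional


class IntBaseEncoder:
    # Condensed copy of the class from the answer.
    def __init__(self, encoding: str, *, bits: Optional[int] = None, signed: bool = False):
        self._decoder = getattr(base64, f'{encoding}decode')
        self._encoder = getattr(base64, f'{encoding}encode')
        self.signed = signed
        self.bytes_length = bits and self._bytes_length(2 ** bits - 1)

    def _bytes_length(self, i: int) -> int:
        return (i.bit_length() + 7 + self.signed) // 8

    def encode(self, i: int) -> bytes:
        length = self.bytes_length or self._bytes_length(i)
        return self._encoder(i.to_bytes(length, byteorder='big', signed=self.signed))

    def decode(self, b: bytes) -> int:
        return int.from_bytes(self._decoder(b), byteorder='big', signed=self.signed)


# Variable length, unsigned, filename-safe alphabet.
enc = IntBaseEncoder('urlsafe_b64')
token = enc.encode(2**62 - 3)
print(len(token), enc.decode(token) == 2**62 - 3)

# Fixed length: every value of up to 64 bits yields 16 hexadecimal characters.
fixed = IntBaseEncoder('b16', bits=64)
print(fixed.encode(1), fixed.encode(2**64 - 1))

# Signed values round-trip when signed=True is passed.
signed_enc = IntBaseEncoder('b32', signed=True)
print(signed_enc.decode(signed_enc.encode(-1234)))
```

With `bits` set, a value that needs more than the configured number of bytes makes `int.to_bytes` raise `OverflowError`, so the fixed-length mode also acts as a range check.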
