Python encoding function: output cannot be decoded

1 vote
4 answers
591 views
Asked 2025-04-16 00:42

I wrote this Python code to convert objects into a string of ones and zeros, but decoding fails because the data cannot be unpickled. Here is the code:

def encode(obj):
    'convert an object to ones and zeros'
    def tobin(str):
        rstr = ''
        for f in str:
            if f == "0": rstr += "0000"
            elif f == "1": rstr += "0001"
            elif f == "2": rstr += "0010"
            elif f == "3": rstr += "0100"
            elif f == "4": rstr += "1000"
            elif f == "5": rstr += "1001"
            elif f == "6": rstr += "1010"
            elif f == "7": rstr += "1100"
            elif f == "8": rstr += "1101"
            elif f == "9": rstr += "1110"
            else: rstr += f
        return rstr
    import pickle, StringIO
    f = StringIO.StringIO()
    pickle.dump(obj, f)
    data = f.getvalue()
    import base64
    return tobin(base64.b16encode(base64.b16encode(data)))
def decode(data):
    def unbin(data):
        rstr = ''
        for f in data:
            if f == "0000": rstr += "0"
            elif f == "0001": rstr += "1"
            elif f == "0010": rstr += "2"
            elif f == "0100": rstr += "3"
            elif f == "1000": rstr += "4"
            elif f == "1001": rstr += "5"
            elif f == "1010": rstr += "6"
            elif f == "1100": rstr += "7"
            elif f == "1101": rstr += "8"
            elif f == "1110": rstr += "9"
        return rstr
    import base64
    ndata = base64.b16decode(base64.b16decode(unbin(data)))
    import pickle, StringIO
    f = StringIO.StringIO(ndata)
    obj = pickle.load(f)
    return obj

4 Answers

0

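By the way, base64.b16encode(base64.b16encode(data)) is essentially the same as data.encode('hex').encode('hex'); the only difference is letter case, since b16encode emits uppercase hex digits and the hex codec emits lowercase ones. There is also a simpler and faster way to do the digit conversion: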

def tobin(numStr):
    return ''.join(("0000","0001","0010","0100","1000","1001","1010","1100","1101","1110")[int(c)] for c in numStr)
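Both points can be checked quickly. This is only a sketch, written for Python 3, where str.encode('hex') no longer exists, so binascii.hexlify stands in for it (that substitution, and the names CODES, upper and lower, are my assumptions, not part of the question's code). It shows that the two double encodings differ only through the case of the intermediate hex string, and that both produce digit-only output, which is why int(c) is safe in tobin:

```python
import base64
import binascii

# Same lookup table as the one-line tobin above.
CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")

def tobin(num_str):
    # Map each decimal digit character to its 4-character code.
    return "".join(CODES[int(c)] for c in num_str)

data = b"K\xff"
# b16encode yields uppercase hex ("4BFF"), hexlify lowercase ("4bff").
upper = base64.b16encode(base64.b16encode(data)).decode("ascii")
lower = binascii.hexlify(binascii.hexlify(data)).decode("ascii")

print(upper)                             # 34424646
print(lower)                             # 34626666
print(upper.isdigit(), lower.isdigit())  # True True
print(len(tobin(upper)))                 # 32, i.e. 16 characters per input byte
```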

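The overall idea of this encoding, complicated as it looks, is not a very good one. First, it provides little in the way of encryption, because each digit of the hex dump always maps to the same 8-character string of 0s and 1s: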

>>> hexd = '0123456789ABCDEF'
>>> s = hexd.encode('hex')
>>> s
'30313233343536373839414243444546'
>>> s=''.join(["0000","0001","0010","0100","1000","1001","1010","1100","1101","1110"][int(c)] for c in s)
>>> s
'01000000010000010100001001000100010010000100100101001010010011000100110101001110100000011000001010000100100010001000100110001010'
>>> for i in range(0,len(s),8):
...     print hexd[i/8], s[i:i+8], chr(int(s[i:i+8],2))
... 
0 01000000 @
1 01000001 A
2 01000010 B
3 01000100 D
4 01001000 H
5 01001001 I
6 01001010 J
7 01001100 L
8 01001101 M
9 01001110 N
A 10000001 
B 10000010 ‚
C 10000100 „
D 10001000 ˆ
E 10001001 ‰
F 10001010 Š

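Second, this method makes the serialized object 16 times larger! Even if you pack it by turning every 8 '0'/'1' characters into one byte (for example with chr(int(encoded[i:i+8],2))), the result is still twice the size of the original pickled data.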
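The size figures are easy to confirm. Here is a rough Python 3 sketch; the helper names encode_bits and pack are made up for illustration, and pickle.dumps replaces the StringIO plumbing from the question:

```python
import base64
import pickle

CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")

def encode_bits(obj):
    # Pickle, hex-encode twice, then expand each digit to 4 characters.
    data = pickle.dumps(obj)
    digits = base64.b16encode(base64.b16encode(data)).decode("ascii")
    return data, "".join(CODES[int(c)] for c in digits)

def pack(bits):
    # Pack every 8 '0'/'1' characters into a single byte.
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

raw, bits = encode_bits({"a": [1, 2, 3]})
packed = pack(bits)

print(len(bits) // len(raw))    # 16
print(len(packed) // len(raw))  # 2
```

Each input byte becomes 2 hex digits, then 4 decimal digits, then 16 characters of '0'/'1'; packing 8 characters per byte brings that back down to 2 bytes per input byte.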

1

I think I have a better solution. It should be more secure, since it "encrypts" everything rather than just the digits:

MAGIC = 0x15 # CHOOSE ANY TWO HEX DIGITS YOU LIKE

# THANKS TO NAS BANOV FOR THE FOLLOWING:
unbin = tobin = lambda s: ''.join(chr(ord(c) ^ MAGIC) for c in s)
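To show how that would plug into a full round trip, here is a minimal Python 3 sketch working on bytes instead of Python 2 strings. The names xor_bytes, encode and decode are hypothetical; only MAGIC and the XOR idea come from the snippet above. Because XOR with a fixed key undoes itself, one function serves both directions. Note that a one-byte key is trivially recoverable, so this obfuscates rather than encrypts.

```python
import pickle

MAGIC = 0x15  # same arbitrary key byte as above

def xor_bytes(data, key=MAGIC):
    # XOR every byte with the key; applying it twice restores the input.
    return bytes(b ^ key for b in data)

def encode(obj):
    # Pickle first, then scramble the pickled bytes.
    return xor_bytes(pickle.dumps(obj))

def decode(blob):
    # Unscramble, then unpickle.
    return pickle.loads(xor_bytes(blob))

original = {"name": "spam", "values": [1, 2, 3]}
blob = encode(original)

print(decode(blob) == original)                  # True
print(len(blob) == len(pickle.dumps(original)))  # True: no size increase
```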

2

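I think there are several problems here, but one of them is that when decoding, your unbin() function needs to consume 4 characters at a time rather than one character at a time as it does now.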
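In the question's unbin, the loop variable f is a single character, so comparisons like f == "0000" are never true and the result is always an empty string. One possible way to fix it, sketched here in Python 3 (CODES and DIGITS are names I made up, and pickle.dumps/pickle.loads replace the StringIO handling), is to slice the input in fixed-width groups of 4 and look each group up in a reverse table:

```python
import base64
import pickle

CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")
# Reverse table: 4-character code -> decimal digit character.
DIGITS = {code: str(i) for i, code in enumerate(CODES)}

def tobin(digits):
    return "".join(CODES[int(c)] for c in digits)

def unbin(bits):
    # Read the string in groups of 4 characters, not one at a time.
    return "".join(DIGITS[bits[i:i + 4]] for i in range(0, len(bits), 4))

def encode(obj):
    data = pickle.dumps(obj)
    digits = base64.b16encode(base64.b16encode(data)).decode("ascii")
    return tobin(digits)

def decode(bits):
    digits = unbin(bits)
    return pickle.loads(base64.b16decode(base64.b16decode(digits)))

value = [1, "two", 3.0]
print(decode(encode(value)) == value)  # True
```

A dictionary lookup also raises KeyError on an unknown group instead of silently dropping it, which makes corrupted input easier to spot.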
