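Python encode function's output can't be decoded
I wrote this Python code to turn arbitrary objects into a string of ones and zeros, but decoding fails because the data can't be unpickled. Here is the code: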
def encode(obj):
    'convert an object to ones and zeros'
    def tobin(str):
        rstr = ''
        for f in str:
            if f == "0": rstr += "0000"
            elif f == "1": rstr += "0001"
            elif f == "2": rstr += "0010"
            elif f == "3": rstr += "0100"
            elif f == "4": rstr += "1000"
            elif f == "5": rstr += "1001"
            elif f == "6": rstr += "1010"
            elif f == "7": rstr += "1100"
            elif f == "8": rstr += "1101"
            elif f == "9": rstr += "1110"
            else: rstr += f
        return rstr
    import pickle, StringIO
    f = StringIO.StringIO()
    pickle.dump(obj, f)
    data = f.getvalue()
    import base64
    return tobin(base64.b16encode(base64.b16encode(data)))
def decode(data):
    def unbin(data):
        rstr = ''
        for f in data:
            if f == "0000": rstr += "0"
            elif f == "0001": rstr += "1"
            elif f == "0010": rstr += "2"
            elif f == "0100": rstr += "3"
            elif f == "1000": rstr += "4"
            elif f == "1001": rstr += "5"
            elif f == "1010": rstr += "6"
            elif f == "1100": rstr += "7"
            elif f == "1101": rstr += "8"
            elif f == "1110": rstr += "9"
        return rstr
    import base64
    ndata = base64.b16decode(base64.b16decode(unbin(data)))
    import pickle, StringIO
    f = StringIO.StringIO(ndata)
    obj = pickle.load(f)
    return obj
Answers

Answer 1
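By the way, base64.b16encode(base64.b16encode(data))
does essentially the same thing as data.encode('hex').encode('hex') in Python 2.
The only difference is letter case: b16encode emits uppercase hex digits while the 'hex' codec emits lowercase ones, so after the second pass the letters A-F come out as 41-46 in one case and 61-66 in the other. There is also a simpler, faster way to do the digit conversion: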
def tobin(numStr):
    return ''.join(("0000","0001","0010","0100","1000","1001","1010","1100","1101","1110")[int(c)] for c in numStr)
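The same table-driven idea, sketched for Python 3 (CODES is an illustrative name for the answer's tuple). Because every code is exactly four characters long and the ten codes are all distinct, the output can be cut back into fixed 4-character chunks without ambiguity, which is what any matching decoder has to rely on:

```python
# The ten 4-character codes, indexed by decimal digit (same order as the answer's tuple).
CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")

def tobin(num_str):
    """Replace each decimal digit with its fixed 4-character code."""
    return "".join(CODES[int(c)] for c in num_str)

# Every code has the same length and no two codes are equal,
# so an encoded string splits back into digits unambiguously.
assert all(len(code) == 4 for code in CODES)
assert len(set(CODES)) == len(CODES)

print(tobin("3041"))  # 0100000010000001
```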
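The overall idea of this encoding looks elaborate, but it isn't very good. First, it provides very little in the way of encryption, because each hex digit of the dump always maps to the same 8-character string of 0s and 1s: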
>>> hexd = '0123456789ABCDEF'
>>> s = hexd.encode('hex')
>>> s
'30313233343536373839414243444546'
>>> s=''.join(["0000","0001","0010","0100","1000","1001","1010","1100","1101","1110"][int(c)] for c in s)
>>> s
'01000000010000010100001001000100010010000100100101001010010011000100110101001110100000011000001010000100100010001000100110001010'
>>> for i in range(0,len(s),8):
...     print hexd[i/8], s[i:i+8], chr(int(s[i:i+8],2))
...
0 01000000 @
1 01000001 A
2 01000010 B
3 01000100 D
4 01001000 H
5 01001001 I
6 01001010 J
7 01001100 L
8 01001101 M
9 01001110 N
A 10000001
B 10000010 ‚
C 10000100 „
D 10001000 ˆ
E 10001001 ‰
F 10001010 Š
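Second, this approach makes the serialized object 16 times larger! Even if you packed each group of 8 '0'/'1' characters back into a byte (for example with chr(int(encoded[i:i+8],2))), the result would still be twice the size of the original pickled data.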
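A rough Python 3 check of that size arithmetic: each pickled byte becomes 2 hex digits, then 4 after the second hex pass, and each digit becomes 4 characters, so 2 × 2 × 4 = 16 characters per original byte; packing 8 characters into one byte leaves 16 / 8 = 2 bytes. The encode() here is an assumed Python 3 port of the question's function, pack() is a hypothetical helper built from the chr(int(...)) idea, and the sample object is arbitrary:

```python
import base64
import pickle

CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")

def encode(obj):
    """Python 3 port of the question's encode(): pickle, hex-encode twice, map digits."""
    digits = base64.b16encode(base64.b16encode(pickle.dumps(obj))).decode("ascii")
    # Double hex encoding yields only decimal digits, so int(c) never fails here.
    return "".join(CODES[int(c)] for c in digits)

def pack(bits):
    """Hypothetical helper: squeeze each 8 '0'/'1' characters into one byte."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))

obj = {"name": "example", "values": list(range(10))}  # arbitrary sample object
raw = pickle.dumps(obj)
bits = encode(obj)

print(len(bits) / len(raw))        # 16.0
print(len(pack(bits)) / len(raw))  # 2.0
```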
Answer 2
I think I have a better solution. It should be more secure, because it "encrypts" everything, not just the digits:
MAGIC = 0x15 # CHOOSE ANY TWO HEX DIGITS YOU LIKE
# THANKS TO NAS BANOV FOR THE FOLLOWING:
unbin = tobin = lambda s: ''.join(chr(ord(c) ^ MAGIC) for c in s)
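A minimal Python 3 round-trip check of the XOR idea. It is written for bytes, since pickle.dumps returns bytes in Python 3 (the original lambda works on Python 2 str); xor_bytes is an illustrative name. XOR with a fixed value is its own inverse, which is why the answer can bind tobin and unbin to the same function. With a single-byte key, each input byte value still always maps to the same output byte, so this is obfuscation rather than real encryption:

```python
import pickle

MAGIC = 0x15  # single-byte key, same value as in the answer

def xor_bytes(data: bytes) -> bytes:
    """XOR every byte with MAGIC; applying it twice restores the input."""
    return bytes(b ^ MAGIC for b in data)

# As in the answer, one function serves as both encoder and decoder.
tobin = unbin = xor_bytes

payload = pickle.dumps([1, 2, 3])
scrambled = tobin(payload)
assert scrambled != payload                          # every byte changes, since MAGIC != 0
assert pickle.loads(unbin(scrambled)) == [1, 2, 3]   # second XOR undoes the first
```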
Answer 3
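I think there are several problems here, but one of them is that when decoding, your unbin() function needs to consume the input 4 characters at a time, not one character at a time as it does now. Iterating over a string yields single characters, so f never equals any of the 4-character codes and unbin() returns an empty string.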
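A sketch of a fixed encode/decode pair along those lines, written for Python 3 (the question's code is Python 2, so pickle.dumps/pickle.loads stand in for the StringIO round trip). The DIGIT_FOR_CODE reverse table and the length check are illustrative additions, not part of the original code:

```python
import base64
import pickle

CODES = ("0000", "0001", "0010", "0100", "1000",
         "1001", "1010", "1100", "1101", "1110")
# Reverse lookup table: 4-character code -> digit character.
DIGIT_FOR_CODE = {code: str(d) for d, code in enumerate(CODES)}

def encode(obj):
    """Pickle obj, hex-encode it twice, then map each digit to 4 characters."""
    digits = base64.b16encode(base64.b16encode(pickle.dumps(obj))).decode("ascii")
    return "".join(CODES[int(c)] for c in digits)

def decode(bits):
    """Inverse of encode(): read the bit string 4 characters at a time."""
    if len(bits) % 4:
        raise ValueError("encoded length must be a multiple of 4")
    digits = "".join(DIGIT_FOR_CODE[bits[i:i + 4]]
                     for i in range(0, len(bits), 4))
    # b16decode accepts an ASCII str; undo the two hex passes, then unpickle.
    return pickle.loads(base64.b16decode(base64.b16decode(digits)))

original = {"a": 1, "b": [2.5, "three"]}
assert decode(encode(original)) == original
```

Stepping through the input with range(0, len(bits), 4) is the whole fix; everything else mirrors the question's pipeline in reverse.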