Unescaping/unquoting binary strings in (extended) URL encoding in Python

User

Reply 1 · edited 2024-05-26 11:53:59

The strings sadly come in the extended URL-encoding form, e.g. "%u616f"

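By the way, this has nothing to do with URL encoding. It is an ad hoc format produced by the JavaScript escape() function and by almost nothing else. If you can, the best course is to change the JavaScript to use the encodeURIComponent function instead. That gives you a proper, standard URL-encoded UTF-8 string.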
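If the JavaScript side can be switched to encodeURIComponent, decoding in Python needs no custom code. The following is a minimal Python 3 sketch; the sample string is an assumption, chosen to be what encodeURIComponent would produce for "hello é 慯".

```python
from urllib.parse import unquote, unquote_to_bytes

# Assumed sample input: the output of encodeURIComponent("hello é 慯").
encoded = "hello%20%C3%A9%20%E6%85%AF"

# unquote() interprets the percent-escapes as UTF-8 by default.
text = unquote(encoded)

# unquote_to_bytes() returns the underlying bytes without decoding them.
raw = unquote_to_bytes(encoded)

print(text)
print(raw)
```

Note that neither function understands the non-standard %uXXXX form, which is why the rest of this thread resorts to custom decoding.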

e.g. "%u616f". I want to store them in a file that then contains the raw binary values, eg. 0x61 0x6f here.

Are you sure 0x61 0x6f (the letters "ao") is the byte stream you want to store? That implies UTF-16BE encoding; are you treating all your strings that way?
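To make the encoding question concrete, here is a small Python 3 sketch comparing the bytes that the single character U+616F turns into under three encodings. The variable names are illustrative only.

```python
ch = "\u616f"  # the character 慯, written as %u616f in the escape() format

as_utf16_be = ch.encode("utf-16-be")  # high byte first: 0x61 0x6f
as_utf16_le = ch.encode("utf-16-le")  # low byte first: 0x6f 0x61
as_utf8 = ch.encode("utf-8")          # three bytes in UTF-8

print(as_utf16_be, as_utf16_le, as_utf8)
```

Only the UTF-16BE form yields the bytes 0x61 0x6f that the question asks for.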

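Normally you would convert the input to Unicode and then write it out in an appropriate encoding such as UTF-8 or UTF-16LE. Here is a quick way to do that in Python 2, relying on the hack of getting Python to read '%u1234' as the string-escape form u'\u1234':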

>>> ex= 'hello %e9 %u616f'
>>> ex.replace('%u', r'\u').replace('%', r'\x').decode('unicode-escape')
u'hello \xe9 \u616f'

>>> print _
hello é 慯

>>> _.encode('utf-8')
'hello \xc3\xa9 \xe6\x85\xaf'
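The session above is Python 2, where str has a decode method. A rough Python 3 equivalent of the same replace trick might look like the sketch below. The function name unescape_via_codec is hypothetical, and the sketch assumes the input is pure ASCII and contains no literal backslashes (those would be interpreted as escapes too).

```python
def unescape_via_codec(s):
    """Turn %XX and %uXXXX escapes into characters via the unicode_escape codec.

    Assumes ASCII-only input without literal backslashes.
    """
    # Rewrite %uXXXX as \uXXXX first, then the remaining %XX as \xXX.
    escaped = s.replace("%u", r"\u").replace("%", r"\x")
    # In Python 3 the codec works on bytes, so go through ASCII.
    return escaped.encode("ascii").decode("unicode_escape")

text = unescape_via_codec("hello %e9 %u616f")
utf8_bytes = text.encode("utf-8")

print(text)
print(utf8_bytes)
```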

User

Reply 2 · edited 2024-05-26 11:53:59

I guess you will have to write the decoder yourself. Here is an implementation to get you started:

def decode(file):
    while True:
        c = file.read(1)
        if c == "":
            # End of file
            break
        if c != "%":
            # Not an escape sequence
            yield c
            continue
        c = file.read(1)
        if c != "u":
            # One hex-byte
            yield chr(int(c + file.read(1), 16))
            continue
        # Two hex-bytes
        yield chr(int(file.read(2), 16))
        yield chr(int(file.read(2), 16))

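Usage: pass an open input file to decode() and write the characters it yields to an output file opened in binary mode.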
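A Python 3 sketch of this decoder together with a usage example follows. It is a port under stated assumptions, not the poster's code: in Python 3 the generator has to yield bytes objects, the in-memory streams stand in for real files, and characters outside the escapes are assumed to fit in Latin-1.

```python
import io

def decode(stream):
    """Yield raw bytes for a text stream containing %XX and %uXXXX escapes."""
    while True:
        c = stream.read(1)
        if c == "":
            # End of input
            break
        if c != "%":
            # Not an escape sequence; assumes the character fits in one byte
            yield c.encode("latin-1")
            continue
        c = stream.read(1)
        if c != "u":
            # One hex byte: %XX
            yield bytes([int(c + stream.read(1), 16)])
            continue
        # Two hex bytes: %uXXXX, emitted high byte first
        yield bytes.fromhex(stream.read(4))

# Illustrative in-memory streams; real code would open files instead,
# with the output opened in binary mode ("wb").
source = io.StringIO("%u616f %41")
sink = io.BytesIO()
sink.writelines(decode(source))
raw = sink.getvalue()
print(raw)
```

Like the original, this sketch does no error handling for truncated or malformed escapes.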

User

Reply 3 · edited 2024-05-26 11:53:59

Here is a regex-based approach:

import re

# The replacement function converts each two-digit hex group to a
# character and concatenates the two results.
repfunc = lambda m: chr(int(m.group(1), 16)) + chr(int(m.group(2), 16))

# The last argument is the text you want to convert.
result = re.sub(r'%u(..)(..)', repfunc, '%u616f')
print result

This gives:

ao
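The snippet above is Python 2, where chr() returns a one-byte string. In Python 3 the same idea can be applied to bytes directly, as in the sketch below. The function name percent_u_to_bytes is hypothetical, and the pattern is tightened to accept hex digits only, which is a small change from the original '..' groups.

```python
import re

def percent_u_to_bytes(data):
    """Replace each %uXXXX escape in a bytes object with its two raw bytes."""
    return re.sub(
        rb"%u([0-9a-fA-F]{4})",
        # The four hex digits become two bytes, high byte first.
        lambda m: bytes.fromhex(m.group(1).decode("ascii")),
        data,
    )

result = percent_u_to_bytes(b"%u616f")
print(result)
```

Plain %XX escapes are left untouched here; they would need a second substitution if the input contains them.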
