Why do extra escape characters appear when I insert Unicode characters into an sqlite3 database with Python 2.7?

Question

I queried an API and got back JSON containing the following value:

{
    ...
    "Attribute" : "Some W\u00e9irdness", 
    ...
}

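(The correct value, of course, is 'Some Wéirdness'.)

I put that value, along with some other things, into a list of fields that I want to add to an sqlite3 database. The list looks like this: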

[None, 203, None, None, True, u'W\xe9irdness', None, u'Some', None, None, u'Some W\xe9irdness', None, u'Some W\xe9irdness', None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]

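I notice that we have already gone from \u00e9 to \xe9. I'm not sure yet why that is, but I was hoping that it's fine and that it's just a different Unicode encoding.

Before trying to insert into the sqlite table, I "stringatize" the list (see the function below) and turn it into a tuple: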
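For what it's worth, \u00e9 and \xe9 are two spellings of the same code point, U+00E9; Python 2's repr() simply prefers the shorter \xNN form for code points below 0x100. A minimal sketch to confirm this, where the JSON literal is a made-up stand-in for the real API response:

```python
import json

# Hypothetical stand-in for the API response. In the JSON text the
# character is spelled as the six-character escape \u00e9.
raw_json = '{"Attribute": "Some W\\u00e9irdness"}'
decoded = json.loads(raw_json)["Attribute"]

# After decoding there is a single character, U+00E9, whichever way
# the literal on the right-hand side is spelled.
same_as_u_escape = decoded == u"Some W\u00e9irdness"
same_as_x_escape = decoded == u"Some W\xe9irdness"
code_point = ord(decoded[6])

print(same_as_u_escape, same_as_x_escape, hex(code_point))
```

So the change in spelling at this step is only cosmetic; the decoded string itself holds the right character.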

('', '203', '', '', 'True', 'W\xe9irdness', '', 'Some', '', '', 'Some W\xe9irdness', '', 'Some W\xe9irdness', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')

Then I do the insert:

my_tuple = tuple(val for val in my_utils.stringatize(my_list))

sql = "INSERT OR REPLACE INTO roster VALUES %s" % repr(my_tuple)

cur.execute(sql)

When I later retrieve the data with a SELECT statement, the value has an extra escape character (a backslash):

u'Some W\\xe9irdness'

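First, I already know that I'm not supposed to use string interpolation with sqlite. However, I couldn't figure out how to do it with question marks when the number of fields per record can change, and I want the code to be flexible so that I don't have to come back and add question marks every time a field is added. (If you know a better way, I'd love to hear it, but that may be a separate topic.)

To troubleshoot, I printed the formatted INSERT statement and saw only a single backslash: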
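One possible way to keep the question marks flexible is to build the placeholder list from the length of the row, so that only the "?" characters are interpolated and the values themselves are passed as parameters. This is a sketch under assumptions: the three-column roster table and the sample row are hypothetical stand-ins for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical three-column table standing in for the real roster table.
conn.execute("CREATE TABLE roster (id TEXT, first TEXT, full_name TEXT)")

row = (u"203", u"Some", u"Some W\xe9irdness")

# One "?" per value, so the statement follows the length of the row.
placeholders = ", ".join("?" * len(row))
sql = "INSERT OR REPLACE INTO roster VALUES (%s)" % placeholders

# The values travel separately from the SQL text; no repr() involved.
conn.execute(sql, row)

fetched = conn.execute("SELECT full_name FROM roster").fetchone()[0]
```

With this shape, adding a field only changes the length of the row; the statement text adapts on its own, and unicode values can be passed without being encoded first.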

INSERT OR REPLACE INTO roster VALUES ('', '203', '', '', 'True', 'W\xe9irdness', '', 'Some', '', '', 'Some W\xe9irdness', '', 'Some W\xe9irdness', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')

That looks the same as it does in my list above, so I'm confused. Perhaps this is being interpreted as a string that needs escaping, with xe9 treated as plain ASCII text. Here is the stringatize function I use to prepare the list for insertion:

import datetime

def stringatize(cell_list, encoding='raw_unicode_escape', delete_quotes=False):
    """
    Converts every 'cell' in a 'row' (generally something extracted from
    a spreadsheet) to a unicode, then returns the list of cells (with all
    strings now, of course).
    """

    stringatized_list = []

    for cell in cell_list:
        # datetime.datetime is a subclass of datetime.date, so test it first.
        if isinstance(cell, datetime.datetime):
            new = cell.strftime("%Y-%m-%dT%H:%M:%S")
        elif isinstance(cell, datetime.date):
            new = cell.strftime("%Y-%m-%d")
        elif isinstance(cell, datetime.time):
            new = cell.strftime("%H:%M:%S")
        elif isinstance(cell, (int, long)):  # Python 2; also matches bool
            new = str(cell)
        elif isinstance(cell, float):
            new = "%.2f" % cell
        elif cell is None:
            new = ""
        else:
            new = cell

        if delete_quotes:
            new = new.replace("\"", "")

        # In Python 2, .encode() returns a byte str, not a unicode object.
        my_unicode = new.encode(encoding)
        stringatized_list.append(my_unicode)

    return stringatized_list
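The suspicion above, that \xe9 in the printed SQL is just four ASCII characters, can be tested in isolation. In Python 2, repr() of a byte string renders the byte 0xE9 as backslash, x, e, 9, and SQLite string literals do not process backslash escapes. The sketch below writes that SQL text by hand to mimic what repr() produced; the demo table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (val TEXT)")

# Hand-written SQL text mimicking the repr() output: inside the SQL
# literal there is a real backslash followed by the letters x, e, 9.
sql_text = u"INSERT INTO demo VALUES ('Some W\\xe9irdness')"
conn.execute(sql_text)

# SQLite stores the four characters as-is; it has no \x escape.
stored = conn.execute("SELECT val FROM demo").fetchone()[0]
print(repr(stored))
```

The value that comes back contains a literal backslash, which Python then displays doubled in its repr. That matches the u'Some W\\xe9irdness' result, and it is consistent with the single backslash seen when the SQL is printed.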

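I'd appreciate any ideas you can give me. My goal is to eventually get this value into an Excel sheet; Excel supports Unicode, so it should display the value correctly.

Edit: In response to @CL's question, I tried removing the 'encode' line from my stringatize function.

It now ends like this: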

    #my_unicode = new.encode(encoding)
    my_unicode = new

    stringatized_list.append(my_unicode)

return stringatized_list

The new SQL looks like this (followed by the traceback I get when I try to execute it):

INSERT OR REPLACE INTO roster VALUES ('', u'203', u'', u'', 'True', u'W\xe9irdness', '', u'Some', '', '', u'Some W\xe9irdness', '', u'Some W\xe9irdness', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '')

Traceback (most recent call last):
  File "test.py", line 80, in <module>
    my_call
  File redacted.py, line 102, in my_function
    cur.execute(sql)
sqlite3.OperationalError: near "'203'": syntax error

I do intend to convert that number to a string. I suspect this has to do with my repr(my_tuple) call, with u'' no longer meaning unicode.
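That suspicion can be checked without the rest of the code. The u prefix is Python literal syntax that repr() copies into the string; SQLite has no such prefix, so it reads u as an identifier and then trips over the quoted value that follows. A minimal reproduction, using a hypothetical two-column demo table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE demo (a TEXT, b TEXT)")

# SQL text shaped like the failing statement: a Python-style u prefix
# in front of a quoted literal.
sql = "INSERT INTO demo VALUES ('', u'203')"

try:
    conn.execute(sql)
    error_message = None
except sqlite3.OperationalError as exc:
    # SQLite rejects the statement at the token after the stray u.
    error_message = str(exc)

print(error_message)
```

This fails at the same token as the traceback above, which suggests the error comes from the repr()-generated SQL text rather than from the data itself.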

