Unicode string support in Java and Python


Asked 2025-04-17 18:28

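I have an Android app that reads SMS messages and sends them to a Google App Engine server. Some users have reported that messages in certain languages are not displayed correctly.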

        // Execute query
        cursor = context.getContentResolver().query(
                SMS_PROVIDER_URI,
                SMS_QUERY_FIELDS,
                "date >= " + startDate.getTime(),  // selection - get messages >= startDate
                null,                              // selectionArgs
                "date ASC");                       // order - get oldest messages first

        // Iterate results
        if (cursor != null && cursor.moveToFirst()) {

            // read through all the sms and create a list
            do {
                String sender              = cursor.getString(0);
                String message             = cursor.getString(2);
                boolean isIncomingMessage  = cursor.getString(3).contains("1");
                Date date                  = new Date(cursor.getLong(1));

                String contactName = ContactLookup.lookup(context, sender);

                smsList.add(new SMSMessageInfo(sender, contactName,
                        message, isIncomingMessage, date));

            } while (cursor.moveToNext());
        }

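The message variable contains SMS text in many different languages. How can I support these languages? I also need to send these messages to my server, which is written in Python. How should I handle these Unicode characters on the server?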

1 Answer

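In Python 2.7 there are two string types: str (the standard string, made of bytes) and unicode (made of Unicode characters, usually written with a u prefix, e.g. u"foo"). You convert between them with methods on the instances: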

        u"blä".encode('utf8') → "bl\xc3\xa4"  # from unicode to str
        "bl\xc3\xa4".decode('utf8') → u"blä"  # from str to unicode
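In Python 3 the two types are renamed: text is str (the old unicode) and raw data is bytes (the old str), and plain string literals are already Unicode. A minimal Python 3 sketch of the same round trip, assuming UTF-8 as the encoding used on the wire:

```python
# Python 3: str holds Unicode text, bytes holds raw encoded data.
text = "blä"

# str -> bytes: encode as UTF-8; "ä" becomes the two bytes C3 A4.
data = text.encode("utf-8")
print(data)                   # b'bl\xc3\xa4'
print(len(text), len(data))   # 3 4

# bytes -> str: decode with the same encoding to get the text back.
restored = data.decode("utf-8")
print(restored == text)       # True
```

The length difference (3 characters, 4 bytes) is a quick way to tell whether a value is still encoded or already decoded.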

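Conversions often happen implicitly. For example, if you add a str and a unicode, the str is automatically decoded to unicode (using the ascii codec by default) before the two are concatenated.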
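Python 3 removed this implicit conversion between text and bytes, so the same mistakes fail loudly instead of depending on the ASCII default. A minimal Python 3 sketch, assuming the incoming bytes were encoded as UTF-8 by the sender:

```python
raw = b"bl\xc3\xa4"  # UTF-8 bytes, e.g. as received over the network

# 1. Mixing str and bytes raises TypeError instead of silently decoding.
try:
    _ = "SMS: " + raw
    mixing_failed = False
except TypeError:
    mixing_failed = True

# 2. Decoding as ASCII (Python 2's implicit default) fails on byte 0xC3.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# 3. Decoding explicitly with the encoding the sender used works.
combined = "SMS: " + raw.decode("utf-8")  # "SMS: blä"
```

The design point is to decode once, at the boundary where bytes enter the program, and work only with text after that.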

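Conversely, when you print a unicode instance, it is first converted to str, and the encoding used for that depends on the environment you print to (often ASCII as well).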
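To avoid depending on the environment's encoding, you can choose the output encoding explicitly. A minimal Python 3 sketch, using an in-memory stream as a stand-in for a file or socket (the sample message is illustrative):

```python
import io

message = "Привет, blä"  # non-ASCII text, e.g. the body of an SMS

# Encoding to ASCII fails outright; errors="replace" shows what would be lost.
try:
    message.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
lossy = message.encode("ascii", errors="replace")  # b'??????, bl?'

# Wrap a binary stream in a text stream with an explicit encoding, so the
# bytes produced by print() do not depend on the environment's default.
buffer = io.BytesIO()
writer = io.TextIOWrapper(buffer, encoding="utf-8", newline="\n")
print(message, file=writer)
writer.flush()
encoded = buffer.getvalue()  # UTF-8 bytes of the message plus "\n"
```

For a real console, Python 3.7+ also provides sys.stdout.reconfigure(encoding="utf-8") for the same purpose.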

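These automatic conversions are a common source of errors, especially when a conversion fails. If you catch exceptions too broadly, such problems can go unnoticed, and the result is that some features silently stop working.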
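On the server, one way to follow this advice is to decode the request body explicitly and catch only the specific exception you expect. A minimal Python 3 sketch, assuming (hypothetically) that the Android client sends each SMS as a UTF-8 JSON object with a "message" field; decode_sms_payload is an illustrative name, not part of any framework:

```python
import json
import logging

logger = logging.getLogger(__name__)


def decode_sms_payload(raw: bytes) -> dict:
    """Decode a request body assumed to be UTF-8 JSON (hypothetical format).

    Only UnicodeDecodeError is caught, so unrelated bugs (and malformed
    JSON, which raises json.JSONDecodeError) still surface.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Log the problem instead of hiding it behind a bare `except:`,
        # then continue with U+FFFD replacement characters.
        logger.warning("payload is not valid UTF-8: %s", exc)
        text = raw.decode("utf-8", errors="replace")
    return json.loads(text)


good = decode_sms_payload('{"message": "blä"}'.encode("utf-8"))
bad = decode_sms_payload(b'{"message": "bl\xe4"}')  # Latin-1 byte, not UTF-8
```

Replacing invalid bytes is a judgment call; rejecting the request with an error may be preferable if corrupted messages must never be stored.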

Answered 2025-04-17 by Python大师
