Java: encodings that differ from ASCII even for letters

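Is there any character encoding that is reasonably common on consumer devices (as opposed to mainframes) and that maps the characters A-Za-z0-9 differently from ASCII?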

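I'm currently thinking about a Java application, so I wonder whether there is a chance that a casual user of some Java software, in some country, might end up reporting, by way of ^{}, that ^{} returns something different from ^{}. I'm trying to work out whether I need to address compatibility problems that could arise from different behavior in this respect.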

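I know that, historically, EBCDIC is the prime example of an ASCII-incompatible encoding. But is it used on recent consumer devices, or only on IBM mainframes and vintage computers? Does the legacy of EBCDIC live on in the common encodings of some countries?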

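I also know that UTF-16 is ASCII-incompatible, and that it is fairly common on Windows to encode files this way. But as far as I know, that always concerns file content only, not the default application locale. Can users configure their Windows machine to use UTF-16 as the system code page without breaking at least half of their applications?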
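To make the UTF-16 incompatibility concrete, here is a minimal sketch that prints the bytes of a two-character string under US-ASCII and both UTF-16 byte orders. The class name `Utf16VersusAscii` and the helper `hex` are made up for illustration; only standard `java.nio.charset` constants are used.

```java
import java.nio.charset.StandardCharsets;

public class Utf16VersusAscii {
    // Renders a byte array as space-separated, two-digit uppercase hex values.
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String s = "A1";
        // US-ASCII uses one byte per character.
        System.out.println("US-ASCII: " + hex(s.getBytes(StandardCharsets.US_ASCII)));
        // UTF-16 uses two bytes per character here, so the byte sequences differ.
        System.out.println("UTF-16LE: " + hex(s.getBytes(StandardCharsets.UTF_16LE)));
        System.out.println("UTF-16BE: " + hex(s.getBytes(StandardCharsets.UTF_16BE)));
    }
}
```

The ASCII bytes are `41 31`, while UTF-16LE gives `41 00 31 00` and UTF-16BE gives `00 41 00 31`, so any code that assumes one byte per letter or digit would misread UTF-16 data.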

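As far as I know, all the pre-Unicode multi-byte encodings used in Asia still map the ASCII range 00-7F to something ASCII-compatible, at least for letters and digits. Is there any Asian encoding still in use that needs more than one byte for every one of its code points? Or one on any other continent?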


1 answer

  1. # Answer 1

    Here is a simple program to find out. Whether the charsets that fail are common enough to matter is up to you.

    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;
    
    public class EncodingTest {
        public static void main(String[] args) {
            // Letters and digits only; for these characters UTF-8 yields the same bytes as ASCII.
            String s = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
            byte[] b = s.getBytes(StandardCharsets.UTF_8); // reference bytes
            // Compare every charset available in this JVM against the reference.
            for (Charset cs : Charset.availableCharsets().values()) {
                try {
                    byte[] b2 = s.getBytes(cs);
                    if (!Arrays.equals(b, b2)) {
                        System.out.println(cs.displayName() + " doesn't give the same result");
                    }
                }
                // Charsets that cannot encode at all end up here.
                catch (Exception e) {
                    System.out.println(cs.displayName() + " throws an exception");
                }
            }
        }
    }
    

    The result on my machine is:

    IBM-Thai doesn't give the same result
    IBM01140 doesn't give the same result
    IBM01141 doesn't give the same result
    IBM01142 doesn't give the same result
    IBM01143 doesn't give the same result
    IBM01144 doesn't give the same result
    IBM01145 doesn't give the same result
    IBM01146 doesn't give the same result
    IBM01147 doesn't give the same result
    IBM01148 doesn't give the same result
    IBM01149 doesn't give the same result
    IBM037 doesn't give the same result
    IBM1026 doesn't give the same result
    IBM1047 doesn't give the same result
    IBM273 doesn't give the same result
    IBM277 doesn't give the same result
    IBM278 doesn't give the same result
    IBM280 doesn't give the same result
    IBM284 doesn't give the same result
    IBM285 doesn't give the same result
    IBM290 doesn't give the same result
    IBM297 doesn't give the same result
    IBM420 doesn't give the same result
    IBM424 doesn't give the same result
    IBM500 doesn't give the same result
    IBM870 doesn't give the same result
    IBM871 doesn't give the same result
    IBM918 doesn't give the same result
    ISO-2022-CN throws an exception
    JIS_X0212-1990 doesn't give the same result
    UTF-16 doesn't give the same result
    UTF-16BE doesn't give the same result
    UTF-16LE doesn't give the same result
    UTF-32 doesn't give the same result
    UTF-32BE doesn't give the same result
    UTF-32LE doesn't give the same result
    x-IBM1025 doesn't give the same result
    x-IBM1097 doesn't give the same result
    x-IBM1112 doesn't give the same result
    x-IBM1122 doesn't give the same result
    x-IBM1123 doesn't give the same result
    x-IBM1364 doesn't give the same result
    x-IBM300 doesn't give the same result
    x-IBM833 doesn't give the same result
    x-IBM834 doesn't give the same result
    x-IBM875 doesn't give the same result
    x-IBM930 doesn't give the same result
    x-IBM933 doesn't give the same result
    x-IBM935 doesn't give the same result
    x-IBM937 doesn't give the same result
    x-IBM939 doesn't give the same result
    x-JIS0208 doesn't give the same result
    x-JISAutoDetect throws an exception
    x-MacDingbat doesn't give the same result
    x-MacSymbol doesn't give the same result
    x-UTF-16LE-BOM doesn't give the same result
    X-UTF-32BE-BOM doesn't give the same result
    X-UTF-32LE-BOM doesn't give the same result
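
    If the concern is only the charset a particular user's JVM actually defaults to, the same comparison can be narrowed to `Charset.defaultCharset()`. The following is a minimal sketch under that assumption; the class name `DefaultCharsetCheck` and the method `encodesAlnumLikeAscii` are hypothetical names introduced here, not part of the program above.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.util.Arrays;

    public class DefaultCharsetCheck {
        static final String ALNUM =
                "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";

        // Returns true if the charset encodes A-Z, a-z and 0-9 to the same bytes as US-ASCII.
        public static boolean encodesAlnumLikeAscii(Charset cs) {
            if (!cs.canEncode()) {
                return false; // charsets that do not support encoding cannot match
            }
            byte[] expected = ALNUM.getBytes(StandardCharsets.US_ASCII);
            return Arrays.equals(expected, ALNUM.getBytes(cs));
        }

        public static void main(String[] args) {
            // The default charset depends on the platform, locale and JVM settings.
            Charset cs = Charset.defaultCharset();
            System.out.println(cs.name() + " matches ASCII for letters and digits: "
                    + encodesAlnumLikeAscii(cs));
        }
    }
    ```

    Checking `canEncode()` first avoids relying on an exception for charsets that only support decoding. An application could run such a check at startup and log a warning, although passing an explicit charset to `getBytes` sidesteps the question entirely.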