
utf-8 - Converting UTF-8 to ISO-8859-1 in Java: how to keep it as a single byte

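I am trying to convert a string encoded in Java in UTF-8 to ISO-8859-1. For example, in the string "âabcd", ISO-8859-1 represents 'â' as E2. In UTF-8 it is represented as two bytes (C3 A2, I believe). When I call getBytes(encoding) and then create a new string from the bytes in ISO-8859-1 encoding, I get two different characters: Ã¢. Is there any other way to do this so that the character stays the same, i.e. âabcd?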


6 answers

  1. # Answer 1

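    If you're dealing with character encodings other than UTF-16, you shouldn't be using java.lang.String or the char primitive; you should only be using byte[] arrays or ByteBuffer objects. Then you can use java.nio.charset.Charset to convert between encodings: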

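    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;

    Charset utf8charset = Charset.forName("UTF-8");
    Charset iso88591charset = Charset.forName("ISO-8859-1");
    
    // "â" encoded as UTF-8 (C3 A2)
    ByteBuffer inputBuffer = ByteBuffer.wrap(new byte[]{(byte)0xC3, (byte)0xA2});
    
    // decode UTF-8
    CharBuffer data = utf8charset.decode(inputBuffer);
    
    // encode ISO-8859-1
    ByteBuffer outputBuffer = iso88591charset.encode(data);

    // copy only the encoded bytes: array() returns the whole backing array,
    // which may be longer than the data the encoder actually wrote
    byte[] outputData = new byte[outputBuffer.remaining()];
    outputBuffer.get(outputData);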
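A minimal self-contained version of the conversion above, wrapped in a method so it can be run and reused. The class name `Utf8ToLatin1Demo` and the method name `utf8ToIso88591` are hypothetical. It uses `StandardCharsets` (Java 7+) instead of `Charset.forName`, and copies only the bytes between the buffer's position and limit rather than calling `array()`.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

class Utf8ToLatin1Demo {

    // Hypothetical helper: decode UTF-8 bytes, then re-encode them as ISO-8859-1.
    // Charset.encode replaces characters outside ISO-8859-1 with '?'.
    static byte[] utf8ToIso88591(byte[] utf8) {
        CharBuffer chars = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(utf8));
        ByteBuffer latin1 = StandardCharsets.ISO_8859_1.encode(chars);
        // Copy only position..limit, not the whole backing array
        byte[] out = new byte[latin1.remaining()];
        latin1.get(out);
        return out;
    }

    public static void main(String[] args) {
        // "âabcd" in UTF-8: C3 A2 61 62 63 64
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA2, 0x61, 0x62, 0x63, 0x64 };
        for (byte b : utf8ToIso88591(utf8)) {
            System.out.printf("%02X ", b);
        }
        System.out.println(); // prints: E2 61 62 63 64
    }
}
```

The single 'â' comes out as the one byte E2, which is what the question asks for; the ASCII letters are unchanged because both encodings agree on them.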
    
  2. # Answer 2

    To convert a whole file (note that this answer targets ISO-8859-15, not ISO-8859-1):

    import java.io.BufferedOutputStream;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    public class FRomUtf8ToIso {
        static File input = new File("C:/Users/admin/Desktop/pippo.txt");
        static File output = new File("C:/Users/admin/Desktop/ciccio.txt");

        public static void main(String[] args) throws IOException {
            // Read the input explicitly as UTF-8 (FileReader would use the platform default)
            // and write raw bytes (FileWriter would re-encode with the platform default).
            try (BufferedReader br = new BufferedReader(
                         new InputStreamReader(new FileInputStream(input), StandardCharsets.UTF_8));
                 OutputStream out = new BufferedOutputStream(new FileOutputStream(output))) {

                String sCurrentLine;
                int i = 0;
                while ((sCurrentLine = br.readLine()) != null) {
                    byte[] isoB = encode(sCurrentLine.getBytes(StandardCharsets.UTF_8));
                    out.write(isoB);
                    out.write('\n');
                    System.out.println(i++);
                }
            }
        }

        static byte[] encode(byte[] arr) {
            Charset utf8charset = StandardCharsets.UTF_8;
            Charset iso885915charset = Charset.forName("ISO-8859-15");

            ByteBuffer inputBuffer = ByteBuffer.wrap(arr);

            // decode UTF-8
            CharBuffer data = utf8charset.decode(inputBuffer);

            // encode ISO-8859-15 (unmappable characters become '?')
            ByteBuffer outputBuffer = iso885915charset.encode(data);

            // copy only the bytes actually produced, not the whole backing array
            byte[] outputData = new byte[outputBuffer.remaining()];
            outputBuffer.get(outputData);

            return outputData;
        }
    }
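A sketch of an alternative that lets the reader and writer do the decoding and encoding themselves, assuming Java 8+ (`Files.newBufferedReader` / `Files.newBufferedWriter`). The class name `TranscodeFile` and method name `transcode` are hypothetical. It copies characters rather than lines, so the original line endings are kept, and the decoder/encoder these `Files` methods create report errors, so malformed or unmappable input throws instead of silently becoming `?`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class TranscodeFile {

    // Hypothetical helper: stream a UTF-8 text file into an ISO-8859-15 file.
    static void transcode(Path in, Path out) throws IOException {
        Charset target = Charset.forName("ISO-8859-15");
        try (BufferedReader reader = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(out, target)) {
            char[] buf = new char[8192];
            int n;
            // Copy decoded characters as-is; the writer encodes them to ISO-8859-15
            while ((n = reader.read(buf)) != -1) {
                writer.write(buf, 0, n);
            }
        }
    }
}
```

For example, `TranscodeFile.transcode(Paths.get("C:/Users/admin/Desktop/pippo.txt"), Paths.get("C:/Users/admin/Desktop/ciccio.txt"))` would convert the same files as the answer above.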
    
  3. # Answer 3

    This is what I needed:

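    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;

    // Converts from fromCharsetName to UTF-8
    public static byte[] encode(byte[] arr, String fromCharsetName) {
        return encode(arr, Charset.forName(fromCharsetName), Charset.forName("UTF-8"));
    }
    
    public static byte[] encode(byte[] arr, String fromCharsetName, String targetCharsetName) {
        return encode(arr, Charset.forName(fromCharsetName), Charset.forName(targetCharsetName));
    }
    
    public static byte[] encode(byte[] arr, Charset sourceCharset, Charset targetCharset) {
    
        ByteBuffer inputBuffer = ByteBuffer.wrap( arr );
    
        // decode from the source charset (malformed input is replaced, not reported)
        CharBuffer data = sourceCharset.decode(inputBuffer);
    
        // encode to the target charset (unmappable characters are replaced, not reported)
        ByteBuffer outputBuffer = targetCharset.encode(data);

        // copy only the encoded bytes; array() may be longer than the data
        byte[] outputData = new byte[outputBuffer.remaining()];
        outputBuffer.get(outputData);
    
        return outputData;
    }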
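One thing to be aware of with helpers built on `Charset.decode` / `Charset.encode`: they never fail, they substitute replacement characters (typically U+FFFD when decoding, `?` when encoding). If silent data loss is unacceptable, a stricter variant can use a `CharsetDecoder` / `CharsetEncoder` configured with `CodingErrorAction.REPORT`. This is only a sketch; `StrictTranscoder` and `transcodeStrict` are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

class StrictTranscoder {

    // Hypothetical strict variant of encode(arr, sourceCharset, targetCharset):
    // throws CharacterCodingException instead of substituting replacement characters.
    static byte[] transcodeStrict(byte[] arr, Charset source, Charset target)
            throws CharacterCodingException {
        CharsetDecoder decoder = source.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharsetEncoder encoder = target.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);

        CharBuffer chars = decoder.decode(ByteBuffer.wrap(arr));
        ByteBuffer bytes = encoder.encode(chars);

        // Copy only position..limit of the result
        byte[] out = new byte[bytes.remaining()];
        bytes.get(out);
        return out;
    }
}
```

With this, converting "â" from UTF-8 to ISO-8859-1 still yields the single byte E2, while a character ISO-8859-1 cannot represent, such as '€', raises an `UnmappableCharacterException` (a subclass of `CharacterCodingException`).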
    
  4. # Answer 4

    Start with a set of bytes that encode a string in UTF-8, create a String from that data, then get the bytes that encode the string in a different encoding:

        byte[] utf8bytes = { (byte)0xc3, (byte)0xa2, 0x61, 0x62, 0x63, 0x64 };
        Charset utf8charset = Charset.forName("UTF-8");
        Charset iso88591charset = Charset.forName("ISO-8859-1");
    
        String string = new String ( utf8bytes, utf8charset );
    
        System.out.println(string);
    
        // "When I do a getbytes(encoding) and "
        byte[] iso88591bytes = string.getBytes(iso88591charset);
    
        for ( byte b : iso88591bytes )
            System.out.printf("%02x ", b);
    
        System.out.println();
    
        // "then create a new string with the bytes in ISO-8859-1 encoding"
        String string2 = new String ( iso88591bytes, iso88591charset );
    
        // "I get a two different chars"
        System.out.println(string2);
    

    This correctly outputs the string and the ISO-8859-1 bytes:

    âabcd 
    e2 61 62 63 64 
    âabcd
    

    So your byte array was not paired with the correct encoding:

        String failString = new String ( utf8bytes, iso88591charset );
    
        System.out.println(failString);
    

    outputs

    Ã¢abcd
    

    (Or you simply wrote the UTF-8 bytes to a file and are reading them as ISO-8859-1 somewhere else.)
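If you already have a garbled string like `Ã¢abcd` (UTF-8 bytes decoded as ISO-8859-1), it can often be repaired: ISO-8859-1 maps every byte value 0x00-0xFF to the code point with the same value, so encoding the garbled string back to ISO-8859-1 recovers the original bytes exactly. A sketch with a hypothetical helper name (`repairLatin1Mojibake`); it assumes the text was garbled exactly once, by decoding UTF-8 bytes as ISO-8859-1.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRepair {

    // Hypothetical helper: undo "UTF-8 bytes decoded as ISO-8859-1".
    // getBytes(ISO_8859_1) yields the original bytes, which are then decoded as UTF-8.
    static String repairLatin1Mojibake(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf8bytes = { (byte) 0xc3, (byte) 0xa2, 0x61, 0x62, 0x63, 0x64 };
        String failString = new String(utf8bytes, StandardCharsets.ISO_8859_1);
        String repaired = repairLatin1Mojibake(failString);
        System.out.println(failString.length());             // 6
        System.out.println(repaired.length());               // 5
        System.out.println(repaired.equals("\u00E2abcd"));   // true
    }
}
```

The better fix is still to decode with the right charset in the first place; this only helps when the wrong decoding has already happened.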

  5. # Answer 5

    If the string was decoded with the correct encoding, you don't need to do anything more to get its bytes in another encoding:

    public static void main(String[] args) throws Exception {
        printBytes("â");
        System.out.println(
                new String(new byte[] { (byte) 0xE2 }, "ISO-8859-1"));
        System.out.println(
                new String(new byte[] { (byte) 0xC3, (byte) 0xA2 }, "UTF-8"));
    }
    
    private static void printBytes(String str) {
        System.out.println("Bytes in " + str + " with ISO-8859-1");
        for (byte b : str.getBytes(StandardCharsets.ISO_8859_1)) {
            System.out.printf("%3X", b);
        }
        System.out.println();
        System.out.println("Bytes in " + str + " with UTF-8");
        for (byte b : str.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%3X", b);
        }
        System.out.println();
    }
    

    Output:

    Bytes in â with ISO-8859-1
     E2
    Bytes in â with UTF-8
     C3 A2
    â
    â
    
  6. # Answer 6

    byte[] iso88591Data = theString.getBytes("ISO-8859-1");
    

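    This will do the trick. From your description, it sounds like you're trying to "store an ISO-8859-1 string". A Java String is always a sequence of UTF-16 code units, and there is no way to change that encoding.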

    What you can do, though, is get the bytes that represent it in some other encoding (using the .getBytes() method, as shown above).
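To illustrate the point: a `String` holds characters, not bytes in a chosen encoding, so the same text decoded from ISO-8859-1 bytes and from UTF-8 bytes gives equal strings, and bytes only reappear when you call `getBytes` with a charset. A small sketch; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class StringHasNoByteEncoding {

    public static void main(String[] args) {
        // The same text "âabcd" arriving as bytes in two different encodings
        byte[] latin1 = { (byte) 0xE2, 0x61, 0x62, 0x63, 0x64 };
        byte[] utf8 = { (byte) 0xC3, (byte) 0xA2, 0x61, 0x62, 0x63, 0x64 };

        // Decoding each with its own charset yields equal String objects
        String fromLatin1 = new String(latin1, StandardCharsets.ISO_8859_1);
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        System.out.println(fromLatin1.equals(fromUtf8)); // true
        System.out.println(fromLatin1.length());         // 5

        // Encoding the UTF-8-decoded string as ISO-8859-1 gives back the single-byte form
        byte[] reencoded = fromUtf8.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(reencoded, latin1)); // true
    }
}
```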