How to read a ZIP file containing multiple encodings in Java

4 months, 2 weeks ago | Questions & Answers | 104

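In my Java web application, when I upload a ZIP file (thread dumps), I receive it as an InputStream in the servlet. I use the Zip4j library to decompress the file and then write the contents to a file. The ZIP file contains content in multiple encodings (UTF-8, windows-1252, ISO-8859-1, ISO-8859-2, IBM424_rtl). When I open the output file, I see characters like these: Mac OS X 2 € ² ATTR ² ˜

Below is sample code. Can you tell me how to fix this?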

// Using Zip4j library to uncompress ZIP format 
ZipInputStream zis = new ZipInputStream(iStream);

FileOutputStream zos = new FileOutputStream("output_file.txt");
ByteArrayOutputStream out = new ByteArrayOutputStream();
        
LocalFileHeader localFileHeader = zis.getNextEntry();
while (localFileHeader != null) {
            
    if(localFileHeader.isDirectory()) {
            
        localFileHeader = zis.getNextEntry();
        continue;
    }
            
    IOUtils.copy(zis, out);
    localFileHeader = zis.getNextEntry();
}
        
InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(out.toByteArray()));
BufferedReader reader = new BufferedReader(isr);
        
String str;
while ((str = reader.readLine()) != null) {
    
    // This is a custom method that returns the charset of the input string using the Apache Tika library
    String encoding = CharsetDetector.detectCharset(str);
            
    zos.write(str.getBytes(encoding));
    zos.write("\n".getBytes());
}
          
isr.close();
reader.close();
zos.close();
zis.close();

// Method is used to detect charset
public static String detectCharset(String text) throws IOException {
    
    org.apache.tika.parser.txt.CharsetDetector detector = new org.apache.tika.parser.txt.CharsetDetector();
    detector.setText(text.getBytes());
    String charset = detector.detect().getName();
    
    return charset;
}

Note: I am running the application on a Windows machine.

Thanks in advance.
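
One possible direction, offered as a minimal sketch rather than a definitive fix: in the code above, `new InputStreamReader(...)` without a charset decodes every byte with the platform default charset, and `detectCharset` then runs on bytes re-encoded from an already decoded String, so the original encoding information is gone before detection happens. The sketch below instead works per entry on the raw bytes, decodes them once, and writes everything out as UTF-8. It also skips `__MACOSX/` and `._*` entries, because output such as "Mac OS X ... ATTR" resembles the binary metadata files that macOS adds to archives. Assumptions to note: it uses `java.util.zip` from the JDK in place of Zip4j, and a simple "strict UTF-8, otherwise windows-1252" heuristic in place of Tika; the class name `ZipToUtf8` and the `FALLBACK` charset are hypothetical choices for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipToUtf8 {

    // Assumption: entries that are not valid UTF-8 are treated as windows-1252.
    // A real detector (for example Tika, fed the raw bytes) could replace this.
    private static final Charset FALLBACK = Charset.forName("windows-1252");

    /** Decodes raw bytes: strict UTF-8 first, otherwise the fallback charset. */
    public static String decode(byte[] raw) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(raw, FALLBACK);
        }
    }

    /** True for directories and macOS metadata entries (__MACOSX/, ._name). */
    public static boolean shouldSkip(ZipEntry entry) {
        String name = entry.getName();
        String baseName = name.substring(name.lastIndexOf('/') + 1);
        return entry.isDirectory()
                || name.startsWith("__MACOSX/")
                || baseName.startsWith("._");
    }

    /** Reads the bytes of the current entry only (read returns -1 at entry end). */
    private static byte[] readEntry(ZipInputStream zis) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = zis.read(chunk)) != -1) {
            buffer.write(chunk, 0, n);
        }
        return buffer.toByteArray();
    }

    /** Writes every text entry of the ZIP to out as UTF-8; closes both streams. */
    public static void convert(InputStream zipStream, OutputStream out) throws IOException {
        try (ZipInputStream zis = new ZipInputStream(zipStream);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (shouldSkip(entry)) {
                    continue;
                }
                // Decode each entry separately, from its original bytes.
                writer.write(decode(readEntry(zis)));
                writer.write('\n');
            }
        }
    }
}
```

In the servlet this might be called as `ZipToUtf8.convert(iStream, new FileOutputStream("output_file.txt"))`. The output file then has a single known encoding (UTF-8), so it should be opened as UTF-8 in the editor. Detecting per entry rather than per line is a design choice: charset detectors generally need more than one short line of input to be reliable.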

0 answers