How to read a ZIP file containing multiple encodings in Java
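In my Java web application, when I upload a ZIP file (thread dumps), I get an InputStream in the servlet. I use the Zip4j library to uncompress the file and then write the result to a file. The ZIP file contains content in multiple encodings (UTF-8, windows-1252, ISO-8859-1, ISO-8859-2, IBM424_rtl). When I open the output file, I see characters like this: Mac OS X 2 € ² ATTR ² ˜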
Below is sample code. Can you tell me how to fix this?
// Using Zip4j library to uncompress ZIP format
ZipInputStream zis = new ZipInputStream(iStream);
FileOutputStream zos = new FileOutputStream("output_file.txt");
ByteArrayOutputStream out = new ByteArrayOutputStream();
LocalFileHeader localFileHeader = zis.getNextEntry();
while (localFileHeader != null) {
    if (localFileHeader.isDirectory()) {
        localFileHeader = zis.getNextEntry();
        continue;
    }
    IOUtils.copy(zis, out);
    localFileHeader = zis.getNextEntry();
}
InputStreamReader isr = new InputStreamReader(new ByteArrayInputStream(out.toByteArray()));
BufferedReader reader = new BufferedReader(isr);
String str;
while ((str = reader.readLine()) != null) {
    // This is a custom method that returns the charset of the input string using the Apache Tika library
    String encoding = CharsetDetector.detectCharset(str);
    zos.write(str.getBytes(encoding));
    zos.write("\n".getBytes());
}
isr.close();
reader.close();
zos.close();
zis.close();
// Method used to detect the charset
public static String detectCharset(String text) throws IOException {
    org.apache.tika.parser.txt.CharsetDetector detector = new org.apache.tika.parser.txt.CharsetDetector();
    detector.setText(text.getBytes());
    String charset = detector.detect().getName();
    return charset;
}
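One thing that may contribute to the problem: detectCharset receives a String that InputStreamReader has already decoded with the platform default charset, and text.getBytes() re-encodes it with that same default, so the detector never sees the original bytes. The sketch below uses only the JDK to show that such a decode/encode round trip is not always lossless. The names CharsetRoundTrip, roundTrip, and survivesRoundTrip are hypothetical and not part of the original code, and the sample bytes are an assumed ISO-8859-2 fragment chosen for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {

    // Decodes raw bytes with the assumed charset, then encodes the String back with the same charset.
    public static byte[] roundTrip(byte[] raw, Charset assumed) {
        return new String(raw, assumed).getBytes(assumed);
    }

    // True only if the round trip reproduces the original bytes exactly.
    public static boolean survivesRoundTrip(byte[] raw, Charset assumed) {
        return Arrays.equals(raw, roundTrip(raw, assumed));
    }

    public static void main(String[] args) {
        // The word "żółw" encoded as ISO-8859-2 (illustrative sample, not from the original archive).
        byte[] latin2 = {(byte) 0xBF, (byte) 0xF3, (byte) 0xB3, 0x77};

        // These bytes are not valid UTF-8, so decoding substitutes U+FFFD and the originals are lost.
        System.out.println(survivesRoundTrip(latin2, StandardCharsets.UTF_8));      // prints "false"

        // ISO-8859-1 maps every byte value to a character, so the bytes survive,
        // even though the decoded text is not the intended one.
        System.out.println(survivesRoundTrip(latin2, StandardCharsets.ISO_8859_1)); // prints "true"
    }
}
```

If this is what is happening, running the detector on each entry's raw byte[] (the Tika CharsetDetector used above accepts a byte[] in setText) and writing those bytes out unchanged would avoid the round trip. Note that the default charset depends on the JVM: it is typically windows-1252 on a Western European Windows system before JDK 18, and UTF-8 from JDK 18 onward.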
Note: I am running the application on a Windows machine.
Thanks in advance.
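The "Mac OS X" and "ATTR" fragments in the sample output resemble the header of the AppleDouble metadata files (entries such as __MACOSX/._name) that macOS adds to ZIP archives. Those files are binary, not text. This is only a guess from the sample, since the post does not list the archive's entries. A minimal sketch of reading entries one at a time, skipping such metadata and keeping each entry's bytes separate instead of concatenating everything into one buffer, could look like the following. It swaps in the JDK's java.util.zip.ZipInputStream for Zip4j so that it runs without extra dependencies, and ZipEntryExtractor, isMacMetadata, and readEntries are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class ZipEntryExtractor {

    // Hypothetical helper: true for entries that look like macOS AppleDouble metadata.
    public static boolean isMacMetadata(String entryName) {
        String baseName = entryName.substring(entryName.lastIndexOf('/') + 1);
        return entryName.startsWith("__MACOSX/") || baseName.startsWith("._");
    }

    // Returns entry name -> raw entry bytes, in archive order, without decoding anything.
    public static Map<String, byte[]> readEntries(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zis = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zis.getNextEntry()) != null) {
                if (entry.isDirectory() || isMacMetadata(entry.getName())) {
                    continue; // getNextEntry() closes the skipped entry
                }
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = zis.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                entries.put(entry.getName(), buffer.toByteArray());
            }
        }
        return entries;
    }
}
```

Each byte[] in the returned map can then be written to its own output file as-is, or handed to a charset detector separately, so that an entry in one encoding does not influence how another entry is interpreted.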
0 answers