
Reading a plain text file in Java

In Java, there seem to be several different ways to read and write file data.

I want to read ASCII data from a file. What are the possible ways, and how do they differ?


6 answers

  1. # Answer 1

    Here is another way to do it without external libraries:

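    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    
    public String readFile(String filename)
    {
        String content = null;
        File file = new File(filename); // For example, foo.txt
        // try-with-resources closes the reader even if reading fails
        try (FileReader reader = new FileReader(file)) {
            // file.length() counts bytes; for ASCII that equals the number of chars
            char[] chars = new char[(int) file.length()];
            // read() may return fewer chars than requested, so keep reading until full or EOF
            int offset = 0;
            int n;
            while (offset < chars.length
                    && (n = reader.read(chars, offset, chars.length - offset)) != -1) {
                offset += n;
            }
            content = new String(chars, 0, offset);
        } catch (IOException e) {
            e.printStackTrace();
        }
        return content;
    }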
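
    A `FileReader` created without a charset argument decodes with the platform default charset, and `file.length()` counts bytes rather than characters; the two only coincide for single-byte encodings such as ASCII. A minimal sketch that names the charset explicitly and reads in fixed-size chunks instead, so it does not depend on the file length (the class name `AsciiFileReader` is hypothetical):

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    // Hypothetical helper class, for illustration only
    final class AsciiFileReader {

        // Reads the whole file as US-ASCII text without relying on its byte length.
        // The reader throws an IOException if the file contains bytes that are not valid ASCII.
        static String readAscii(Path path) throws IOException {
            StringBuilder sb = new StringBuilder();
            char[] chunk = new char[8192];
            try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.US_ASCII)) {
                int n;
                // read() returns the number of chars read, or -1 at end of file
                while ((n = reader.read(chunk)) != -1) {
                    sb.append(chunk, 0, n);
                }
            }
            return sb.toString();
        }
    }
    ```

    Failing loudly on non-ASCII input is a deliberate choice here; decoding with `StandardCharsets.UTF_8` instead would also accept plain ASCII files.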
    
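  2. # Answer 2

    I had to benchmark the different approaches. I'll comment on my findings below, but in short, the fastest single-threaded approach is to use a plain old BufferedInputStream over a FileInputStream. If many files must be read, three threads cut the total execution time to roughly half, but adding more threads progressively degrades performance, until twenty threads take three times longer to finish than a single thread.

    Assume you must read a file and do something meaningful with its contents. In the examples here, lines are read from a log and the lines containing a value above a certain threshold are counted. So I assume the Java 8 one-liner Files.lines(Paths.get("/path/to/file.txt")).map(line -> line.split(";")) is not an option.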

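    I tested on Java 1.8 and Windows 7, with both SSD and HDD drives.

    I wrote six different implementations:

    rawParse: uses a BufferedInputStream over a FileInputStream and then cuts lines by reading byte by byte. This outperformed every other single-threaded approach, but it can be very inconvenient for non-ASCII files.

    lineReaderParse: uses a BufferedReader over a FileReader, reads line by line, and splits each line by calling String.split(). This is approximately 20% slower than rawParse.

    lineReaderParseParallel: the same as lineReaderParse, but using several threads. This is the fastest option in all cases.

    nioFilesParse: uses java.nio.file.Files.lines().

    nioAsyncParse: uses an AsynchronousFileChannel with a completion handler and a thread pool.

    nioMemoryMappedParse: uses a memory-mapped file. This is really a bad idea: it yields execution times more than twice as long as any other implementation.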

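    These are the average times for reading 204 files of 4 MB each on a quad-core i7 with an SSD drive. The files are generated on the fly to avoid disk caching.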

    rawParse                11.10 sec
    lineReaderParse         13.86 sec
    lineReaderParseParallel  6.00 sec
    nioFilesParse           13.52 sec
    nioAsyncParse           16.06 sec
    nioMemoryMappedParse    37.68 sec
    

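    The difference between running on an SSD and on an HDD was smaller than I expected, with the SSD only about 15% faster. This may be because the files were generated on an unfragmented HDD and read sequentially, so the spinning drive could perform nearly like an SSD.

    I was surprised by the low performance of the nioAsyncParse implementation. Either I implemented something the wrong way, or the multi-threaded implementation using NIO and a completion handler performs the same as (or even worse than) a single-threaded implementation using the java.io API. Moreover, asynchronous parsing with a CompletionHandler takes many more lines of code and is trickier to implement correctly than a straightforward implementation on the old streams.

    Below is the code for the implementations (nioMemoryMappedParse is only in the full source linked at the end). That full source also contains the class holding all of them, plus a parameterizable main() method that lets you vary the number of files, the file size, and the degree of concurrency. Note that file sizes vary by plus or minus 20%, to avoid any effect caused by all files being exactly the same size.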
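
    The ±20% size variation could be drawn along the lines below. This is an assumption about how a file generator might pick sizes, not code taken from the linked benchmark; `FileSizes` and `jitteredSize` are hypothetical names:

    ```java
    import java.util.Random;

    // Hypothetical helper: picks a file size within about +/-20% of a base size
    final class FileSizes {

        // Returns a size roughly uniform between 0.8 * baseSize and 1.2 * baseSize
        static long jitteredSize(long baseSize, Random random) {
            double factor = 0.8 + 0.4 * random.nextDouble(); // nextDouble() is in [0.0, 1.0)
            return (long) (baseSize * factor);
        }
    }
    ```

    Passing in a seeded `Random` makes a benchmark run reproducible while still avoiding identical file sizes.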

    rawParse

    // overrunCount, filenamePreffix, fmt (a date format) and DL0 (the field delimiter)
    // are fields of the enclosing benchmark class; see the full source linked below
    public void rawParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
        overrunCount = 0;
        // Lines end with '\n'; fields inside a line are split on DL0 in doSomethingWithRawLine
        final int dl = (int) '\n';
        StringBuffer lineBuffer = new StringBuffer(1024);
        for (int f=0; f<numberOfFiles; f++) {
            File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
            // try-with-resources closes the streams even if parsing throws
            try (BufferedInputStream bin = new BufferedInputStream(new FileInputStream(fl))) {
                int character;
                while((character=bin.read())!=-1) {
                    if (character==dl) {
    
                        // Here is where something is done with each line
                        doSomethingWithRawLine(lineBuffer.toString());
                        lineBuffer.setLength(0);
                    }
                    else {
                        // Casting a byte to char is only correct for single-byte (ASCII) text
                        lineBuffer.append((char) character);
                    }
                }
            }
            // Process a last line that has no trailing '\n'
            if (lineBuffer.length()>0) {
                doSomethingWithRawLine(lineBuffer.toString());
                lineBuffer.setLength(0);
            }
        }
    }
    
    public final void doSomethingWithRawLine(String line) throws ParseException {
        // What to do for each line
        int fieldNumber = 0;
        final int len = line.length();
        StringBuffer fieldBuffer = new StringBuffer(256);
        // Go one position past the end so that the last field, which has no
        // trailing delimiter, is processed too
        for (int charPos=0; charPos<=len; charPos++) {
            if (charPos==len || line.charAt(charPos)==DL0) {
                String fieldValue = fieldBuffer.toString();
                if (fieldValue.length()>0) {
                    switch (fieldNumber) {
                        case 0:
                            Date dt = fmt.parse(fieldValue);
                            fieldNumber++;
                            break;
                        case 1:
                            double d = Double.parseDouble(fieldValue);
                            fieldNumber++;
                            break;
                        case 2:
                            int t = Integer.parseInt(fieldValue);
                            fieldNumber++;
                            break;
                        case 3:
                            if (fieldValue.equals("overrun"))
                                overrunCount++;
                            break;
                    }
                }
                fieldBuffer.setLength(0);
            }
            else {
                fieldBuffer.append(line.charAt(charPos));
            }
        }
    }
    

    lineReaderParse

    public void lineReaderParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
        String line;
        for (int f=0; f<numberOfFiles; f++) {
            File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
            // Closing the BufferedReader also closes the wrapped FileReader,
            // even if doSomethingWithLine throws
            try (BufferedReader brd = new BufferedReader(new FileReader(fl))) {
                while ((line=brd.readLine())!=null)
                    doSomethingWithLine(line);
            }
        }
    }
    
    public final void doSomethingWithLine(String line) throws ParseException {
        // Example of what to do for each line
        String[] fields = line.split(";");
        Date dt = fmt.parse(fields[0]);
        double d = Double.parseDouble(fields[1]);
        int t = Integer.parseInt(fields[2]);
        if (fields[3].equals("overrun"))
            overrunCount++;
    }
    

    lineReaderParseParallel

    public void lineReaderParseParallel(final String targetDir, final int numberOfFiles, final int degreeOfParallelism) throws IOException, ParseException, InterruptedException {
        Thread[] pool = new Thread[degreeOfParallelism];
        int batchSize = numberOfFiles / degreeOfParallelism;
        for (int b=0; b<degreeOfParallelism; b++) {
            int from = b*batchSize;
            // The last thread also takes the remaining files when numberOfFiles
            // is not a multiple of degreeOfParallelism
            int to = (b==degreeOfParallelism-1) ? numberOfFiles : from+batchSize;
            pool[b] = new LineReaderParseThread(targetDir, from, to);
            pool[b].start();
        }
        // Wait for all threads to finish
        for (int b=0; b<degreeOfParallelism; b++)
            pool[b].join();
    }
    
    class LineReaderParseThread extends Thread {
    
        private String targetDir;
        private int fileFrom;
        private int fileTo;
        private DateFormat fmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        private int overrunCounter = 0;
    
        public LineReaderParseThread(String targetDir, int fileFrom, int fileTo) {
            this.targetDir = targetDir;
            this.fileFrom = fileFrom;
            this.fileTo = fileTo;
        }
    
        private void doSomethingWithTheLine(String line) throws ParseException {
            String[] fields = line.split(DL);
            Date dt = fmt.parse(fields[0]);
            double d = Double.parseDouble(fields[1]);
            int t = Integer.parseInt(fields[2]);
            if (fields[3].equals("overrun"))
                overrunCounter++;
        }
    
        @Override
        public void run() {
            String line;
            for (int f=fileFrom; f<fileTo; f++) {
                File fl = new File(targetDir+filenamePreffix+String.valueOf(f)+".txt");
                // Closing the BufferedReader also closes the wrapped FileReader
                try (BufferedReader brd = new BufferedReader(new FileReader(fl))) {
                    while ((line=brd.readLine())!=null) {
                        doSomethingWithTheLine(line);
                    }
                } catch (IOException | ParseException e) {
                    // Report the failure instead of silently skipping the file
                    e.printStackTrace();
                }
            }
        }
    }
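
    The manual split into fixed batches of files can also be expressed with an `ExecutorService`, which hands out one file per task and makes it easy to add up the per-task results. A minimal sketch, assuming the `;`-separated line format used above with `overrun` in the fourth field; this is a swapped-in technique, not the benchmarked code, and `ParallelOverrunCounter` and `countOverruns` are hypothetical names:

    ```java
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    // Hypothetical example: counts "overrun" lines across many files with a fixed thread pool
    final class ParallelOverrunCounter {

        static long countOverruns(List<Path> files, int threads)
                throws InterruptedException, ExecutionException {
            ExecutorService pool = Executors.newFixedThreadPool(threads);
            try {
                List<Future<Long>> results = new ArrayList<>();
                for (Path file : files) {
                    // One task per file: an idle thread picks up the next file, so no manual batching
                    Callable<Long> task = () -> countInFile(file);
                    results.add(pool.submit(task));
                }
                long total = 0;
                for (Future<Long> result : results) {
                    // get() waits for the task and rethrows its failure wrapped in ExecutionException
                    total += result.get();
                }
                return total;
            } finally {
                pool.shutdown();
            }
        }

        // Counts lines whose fourth ';'-separated field is "overrun" (assumed format)
        private static long countInFile(Path file) throws IOException {
            long count = 0;
            try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.US_ASCII)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] fields = line.split(";");
                    if (fields.length > 3 && fields[3].equals("overrun")) {
                        count++;
                    }
                }
            }
            return count;
        }
    }
    ```

    Unlike the thread-per-batch version, a failure in one file surfaces as an `ExecutionException` from `get()` instead of being lost inside a thread.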
    

    nioFilesParse

    public void nioFilesParse(final String targetDir, final int numberOfFiles) throws IOException, ParseException {
        Consumer<String> action = new LineConsumer();
        for (int f=0; f<numberOfFiles; f++) {
            Path ph = Paths.get(targetDir+filenamePreffix+String.valueOf(f)+".txt");
            // The stream keeps the file open until it is closed, so use try-with-resources
            try (Stream<String> lines = Files.lines(ph)) {
                lines.forEach(action);
            }
        }
    }
    
    
    class LineConsumer implements Consumer<String> {
    
        @Override
        public void accept(String line) {
    
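            // What to do for each line
            String[] fields = line.split(DL);
            // fields[0..3] are read below, so skip lines with fewer than four fields
            if (fields.length>3) {
                try {
                    Date dt = fmt.parse(fields[0]);
                }
                catch (ParseException e) {
                    // accept() cannot throw checked exceptions, so bad dates are ignored here
                }
                double d = Double.parseDouble(fields[1]);
                int t = Integer.parseInt(fields[2]);
                if (fields[3].equals("overrun"))
                    overrunCount++;
            }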
        }
    }
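
    With try-with-resources and stream operations, the `Files.lines` approach can be written more compactly. A sketch that only counts lines whose fourth `;`-separated field is `overrun` (the field layout is assumed from the examples above, and `StreamOverrunCounter` is a hypothetical name); unlike the benchmark, it skips the date and number parsing:

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.stream.Stream;

    // Hypothetical example: counts "overrun" lines with Files.lines
    final class StreamOverrunCounter {

        static long countOverruns(Path file) throws IOException {
            // try-with-resources closes the stream, which releases the open file
            try (Stream<String> lines = Files.lines(file, StandardCharsets.US_ASCII)) {
                return lines
                        .map(line -> line.split(";"))
                        .filter(fields -> fields.length > 3 && fields[3].equals("overrun"))
                        .count();
            }
        }
    }
    ```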
    

    nioAsyncParse

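    // consumerThreads is a java.util.concurrent.Semaphore field of the enclosing
    // benchmark class, assumed to be created with numberOfThreads permits
    public void nioAsyncParse(final String targetDir, final int numberOfFiles, final int numberOfThreads, final int bufferSize) throws IOException, ParseException, InterruptedException {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(numberOfThreads);
        ConcurrentLinkedQueue<ByteBuffer> byteBuffers = new ConcurrentLinkedQueue<ByteBuffer>();
    
        // Pre-allocate one reusable buffer per thread
        for (int b=0; b<numberOfThreads; b++)
            byteBuffers.add(ByteBuffer.allocate(bufferSize));
    
        for (int f=0; f<numberOfFiles; f++) {
            // Block until a permit is free, so at most numberOfThreads files are in flight
            consumerThreads.acquire();
            String fileName = targetDir+filenamePreffix+String.valueOf(f)+".txt";
            AsynchronousFileChannel channel = AsynchronousFileChannel.open(Paths.get(fileName), EnumSet.of(StandardOpenOption.READ), pool);
            BufferConsumer consumer = new BufferConsumer(byteBuffers, fileName, bufferSize);
            channel.read(consumer.buffer(), 0L, channel, consumer);
        }
        // Wait until every in-flight file has released its permit, then restore the permits
        consumerThreads.acquire(numberOfThreads);
        consumerThreads.release(numberOfThreads);
        // All channels are closed by now; let the pool threads terminate
        pool.shutdown();
    }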
    
    
    class BufferConsumer implements CompletionHandler<Integer, AsynchronousFileChannel> {
    
            private ConcurrentLinkedQueue<ByteBuffer> buffers;
            private ByteBuffer bytes;
            private String file;
            private StringBuffer chars;
            private int limit;
            private long position;
            private DateFormat frmt = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    
            public BufferConsumer(ConcurrentLinkedQueue<ByteBuffer> byteBuffers, String fileName, int bufferSize) {
                buffers = byteBuffers;
                bytes = buffers.poll();
                if (bytes==null)
                    bytes = ByteBuffer.allocate(bufferSize);
    
                file = fileName;
                chars = new StringBuffer(bufferSize);
                limit = bufferSize;
                position = 0L;
            }
    
            public ByteBuffer buffer() {
                return bytes;
            }
    
            @Override
            public synchronized void completed(Integer result, AsynchronousFileChannel channel) {
    
                if (result!=-1) {
                    bytes.flip();
                    final int len = bytes.limit();
                    int i = 0;
                    try {
                        for (i = 0; i < len; i++) {
                            byte by = bytes.get();
                            if (by=='\n') {
                                // ***
                                // The code used to process the line goes here
                                chars.setLength(0);
                            }
                            else {
                                    chars.append((char) by);
                            }
                        }
                    }
                    catch (Exception x) {
                        System.out.println(
                            "Caught exception " + x.getClass().getName() + " " + x.getMessage() +
                            " i=" + String.valueOf(i) + ", limit=" + String.valueOf(len) +
                            ", position="+String.valueOf(position));
                    }
    
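                    if (len==limit) {
                        // The buffer was filled, so there may be more data: read the next chunk
                        bytes.clear();
                        position += len;
                        channel.read(bytes, position, channel, this);
                    }
                    else {
                        // A short read is taken to mean end of file
                        finish(channel);
                    }
                }
                else {
                    // -1 means end of file
                    finish(channel);
                }
            }
    
            @Override
            public void failed(Throwable e, AsynchronousFileChannel channel) {
                // Release the permit on failure too; otherwise nioAsyncParse waits forever
                System.out.println("Failed to read " + file + ": " + e);
                finish(channel);
            }
    
            // Closes the channel, releases the permit and returns the buffer to the pool
            private void finish(AsynchronousFileChannel channel) {
                try {
                    channel.close();
                }
                catch (IOException e) {
                }
                consumerThreads.release();
                bytes.clear();
                buffers.add(bytes);
            }
    }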
    

    Full runnable implementation of all cases:

    https://github.com/sergiomt/javaiobenchmark/blob/master/FileReadBenchmark.java
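
    The nioMemoryMappedParse code is not reproduced in this answer. As a rough illustration of the technique only (not the benchmarked code), the sketch below maps a file with `FileChannel.map` and counts lines by scanning the mapped bytes for `'\n'`. `MappedLineCounter` is a hypothetical name, and the sketch assumes the file fits in a single mapping, i.e. is smaller than 2 GB:

    ```java
    import java.io.IOException;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Path;
    import java.nio.file.StandardOpenOption;

    // Hypothetical example: counts lines by scanning a memory-mapped file
    final class MappedLineCounter {

        static long countLines(Path file) throws IOException {
            try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
                long size = channel.size();
                if (size == 0) {
                    return 0;
                }
                // A single mapping is limited to Integer.MAX_VALUE bytes
                MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_ONLY, 0, size);
                long lines = 0;
                while (buffer.hasRemaining()) {
                    if (buffer.get() == '\n') {
                        lines++;
                    }
                }
                // Count a final line that has no trailing newline
                if (buffer.get((int) (size - 1)) != '\n') {
                    lines++;
                }
                return lines;
            }
        }
    }
    ```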

  3. # Answer 3

    Here is a simple solution:

    String content = new String(Files.readAllBytes(Paths.get("sample.txt")));
    

    Or, to read it as a list of lines:
    
    List<String> content = Files.readAllLines(Paths.get("sample.txt"));
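
    The two calls above decode with different defaults: `new String(byte[])` uses the default charset, while `Files.readAllLines(Path)` uses UTF-8. A minimal sketch that passes the charset explicitly in both cases; `Files.readString` requires Java 11 or later, and `TextFiles` is a hypothetical name:

    ```java
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.List;

    // Hypothetical helper showing whole-file reads with an explicit charset
    final class TextFiles {

        // Java 11+: the whole file as one String, line terminators kept
        static String readAll(Path file) throws IOException {
            return Files.readString(file, StandardCharsets.US_ASCII);
        }

        // Java 7+: one String per line, line terminators removed
        static List<String> readLines(Path file) throws IOException {
            return Files.readAllLines(file, StandardCharsets.US_ASCII);
        }
    }
    ```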
    
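  4. # Answer 4

    An ASCII file is a text file, so you can use a Reader to read it. Java also supports reading binary files with an InputStream. If the files being read are large, you may want to wrap the FileReader in a BufferedReader to improve read performance.

    See this article to learn how to use a Reader.

    I'd also recommend downloading and reading the excellent (and free) book Thinking In Java.

    In Java 7: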

    new String(Files.readAllBytes(...))
    

    (docs)

    Files.readAllLines(...)
    

    (docs)

    In Java 8:

    Files.lines(..).forEach(...)
    

    (docs)
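
    A buffered, line-by-line read of a text file, as this answer suggests for large files, might look like the sketch below. It also shows the InputStream/Reader split: the `FileInputStream` supplies raw bytes and the `InputStreamReader` decodes them. A `FileReader` is avoided because, before Java 11, it always decodes with the platform default charset. `BufferedLineCounter` is a hypothetical name:

    ```java
    import java.io.BufferedReader;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    // Hypothetical example: a buffered, line-oriented read with an explicit charset
    final class BufferedLineCounter {

        // Returns the number of lines in the file
        static int countLines(String fileName) throws IOException {
            int count = 0;
            // InputStream reads bytes, InputStreamReader decodes them as ASCII,
            // BufferedReader batches the underlying reads and adds readLine()
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(new FileInputStream(fileName), StandardCharsets.US_ASCII))) {
                while (reader.readLine() != null) {
                    count++;
                }
            }
            return count;
        }
    }
    ```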

  5. # Answer 5

    The simplest way is to use the Scanner class with a FileReader object. A simple example:

    Scanner in = new Scanner(new FileReader("filename.txt"));
    

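    Scanner has several methods for reading strings, numbers, and so on. See the Java documentation page for more details.

    For example, to read the whole content into a String:

    StringBuilder sb = new StringBuilder();
    // nextLine() keeps the whitespace inside each line; next() would drop it
    while(in.hasNextLine()) {
        sb.append(in.nextLine());
        sb.append(System.lineSeparator());
    }
    in.close();
    String outString = sb.toString();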
    

    Also, if you need a specific encoding, you can use this instead of FileReader:

    new InputStreamReader(new FileInputStream(fileUtf8), StandardCharsets.UTF_8)
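
    Because Scanner can also parse numbers, here is a minimal sketch that sums the whitespace-separated integers in a file and skips every other token. It uses the `Scanner(Path, String charsetName)` constructor available since Java 7; `IntSummer` and `sumInts` are hypothetical names:

    ```java
    import java.io.IOException;
    import java.nio.file.Path;
    import java.util.Scanner;

    // Hypothetical example: sums all integer tokens in a text file
    final class IntSummer {

        static long sumInts(Path file) throws IOException {
            long sum = 0;
            // Scanner is Closeable, so try-with-resources closes the underlying file
            try (Scanner in = new Scanner(file, "US-ASCII")) {
                while (in.hasNext()) {
                    if (in.hasNextInt()) {
                        sum += in.nextInt();
                    } else {
                        in.next(); // skip a token that is not an integer
                    }
                }
            }
            return sum;
        }
    }
    ```

    For a file containing `1 2 abc 3` on one line and `-4` on the next, `sumInts` returns 2.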
    
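  6. # Answer 6

    My favorite way to read a small file is to use a BufferedReader and a StringBuilder. It is very simple and to the point (not particularly efficient, but good enough for most cases):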

    BufferedReader br = new BufferedReader(new FileReader("file.txt"));
    try {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
    
        while (line != null) {
            sb.append(line);
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
        String everything = sb.toString();
    } finally {
        br.close();
    }
    

    As has been pointed out, from Java 7 on you should use the try-with-resources (i.e. automatic close) feature:

    try(BufferedReader br = new BufferedReader(new FileReader("file.txt"))) {
        StringBuilder sb = new StringBuilder();
        String line = br.readLine();
    
        while (line != null) {
            sb.append(line);
            sb.append(System.lineSeparator());
            line = br.readLine();
        }
        String everything = sb.toString();
    }
    

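    When I read strings like this, I usually want to do some string processing on each line anyway, so I go for this implementation.

    However, if I really just want to read a file into a String, I always use Apache Commons IO and its IOUtils.toString() method. You can see the source here: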

    http://www.docjar.com/html/api/org/apache/commons/io/IOUtils.java.html

    FileInputStream inputStream = new FileInputStream("foo.txt");
    try {
        String everything = IOUtils.toString(inputStream);
    } finally {
        inputStream.close();
    }
    

    With Java 7 it is even simpler:

    try(FileInputStream inputStream = new FileInputStream("foo.txt")) {
        String everything = IOUtils.toString(inputStream);
        // do something with everything string
    }
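
    Without Commons IO, Java 9 and later can do the same with `InputStream.readAllBytes()`. A minimal sketch that passes the charset explicitly rather than relying on a default; `StreamText` is a hypothetical name:

    ```java
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    // Hypothetical helper: whole-file read using only the JDK (Java 9+)
    final class StreamText {

        static String read(String fileName) throws IOException {
            try (InputStream inputStream = new FileInputStream(fileName)) {
                // readAllBytes() reads until the end of the stream
                return new String(inputStream.readAllBytes(), StandardCharsets.UTF_8);
            }
        }
    }
    ```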