Speeding up file reading in Java

I have a 1.7 GB file in the following format:

String Long String Long String Long String Long ... etc

Essentially, each String is a key and each Long is a value in a HashMap that I want to initialize before anything else in my application runs.

My current code is:

    RandomAccessFile raf = new RandomAccessFile("/home/map.dat", "r");
    raf.seek(0);
    while (raf.getFilePointer() != raf.length()) {
        String name = raf.readUTF();
        long offset = raf.readLong();
        map.put(name, offset);
    }

This takes about 12 minutes to complete. I am sure there is a better way to do this, so I would appreciate any help or pointers.

Thanks


Update, following EJP's suggestion

Thanks for the suggestion. I hope this is what you meant. Please correct me if this is wrong.

    DataInputStream dis = null;
    try {
        dis = new DataInputStream(new BufferedInputStream(new FileInputStream("/home/map.dat")));
        while (true) {
            String name = dis.readUTF();
            long offset = dis.readLong();
            map.put(name, offset);
        }
    } catch (EOFException eofe) {
        try {
            dis.close();
        } catch (IOException ioe) {
            ioe.printStackTrace();
        }
    }

2 answers

  1. Answer #1

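    I would structure the file so that it can be used in place, i.e. without loading it this way. Since you have variable-length records, you can build an array of the position of each record and store the keys in sorted order, so that you can perform a binary search on the data. (Or you can use a custom hash table.) You can then wrap this file with methods that hide the fact that the data is actually stored in a file rather than converted into data objects.

    If you do all this, the "load" phase becomes redundant and you won't need to create so many objects.


    This is a long example, but hopefully it shows what is possible.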

    import vanilla.java.chronicle.Chronicle;
    import vanilla.java.chronicle.Excerpt;
    import vanilla.java.chronicle.impl.IndexedChronicle;
    import vanilla.java.chronicle.tools.ChronicleTest;
    
    import java.io.IOException;
    import java.util.*;
    
    public class Main {
        static final String TMP = System.getProperty("java.io.tmpdir");
    
        public static void main(String... args) throws IOException {
            String baseName = TMP + "/test";
            String[] keys = generateAndSave(baseName, 100 * 1000 * 1000);
    
            long start = System.nanoTime();
            SavedSortedMap map = new SavedSortedMap(baseName);
            for (int i = 0; i < keys.length / 100; i++) {
                long l = map.lookup(keys[i]);
    //            System.out.println(keys[i] + ": " + l);
            }
            map.close();
            long time = System.nanoTime() - start;
    
            System.out.printf("Load of %,d records and lookup of %,d keys took %.3f seconds%n",
                    keys.length, keys.length / 100, time / 1e9);
        }
    
        static SortedMap<String, Long> generateMap(int keys) {
            SortedMap<String, Long> ret = new TreeMap<>();
            while (ret.size() < keys) {
                long n = ret.size();
                String key = Long.toString(n);
                while (key.length() < 9)
                    key = '0' + key;
                ret.put(key, n);
            }
            return ret;
        }
    
        static void saveData(SortedMap<String, Long> map, String baseName) throws IOException {
            Chronicle chronicle = new IndexedChronicle(baseName);
            Excerpt excerpt = chronicle.createExcerpt();
            for (Map.Entry<String, Long> entry : map.entrySet()) {
                excerpt.startExcerpt(2 + entry.getKey().length() + 8);
                excerpt.writeUTF(entry.getKey());
                excerpt.writeLong(entry.getValue());
                excerpt.finish();
            }
            chronicle.close();
        }
    
        static class SavedSortedMap {
            final Chronicle chronicle;
            final Excerpt excerpt;
            final String midKey;
            final long size;
    
            SavedSortedMap(String baseName) throws IOException {
                chronicle = new IndexedChronicle(baseName);
                excerpt = chronicle.createExcerpt();
                size = chronicle.size();
                excerpt.index(size / 2);
                midKey = excerpt.readUTF();
            }
    
            // find exact match or take the value after.
            public long lookup(CharSequence key) {
                if (compareTo(key, midKey) < 0)
                    return lookup0(0, size / 2, key);
                return lookup0(size / 2, size, key);
            }
    
            private final StringBuilder tmp = new StringBuilder();
    
            private long lookup0(long from, long to, CharSequence key) {
                long mid = (from + to) >>> 1;
                excerpt.index(mid);
                tmp.setLength(0);
                excerpt.readUTF(tmp);
                if (to - from <= 1)
                    return excerpt.readLong();
                int cmp = compareTo(key, tmp);
                if (cmp < 0)
                    return lookup0(from, mid, key);
                if (cmp > 0)
                    return lookup0(mid, to, key);
                return excerpt.readLong();
            }
    
            public static int compareTo(CharSequence a, CharSequence b) {
                int lim = Math.min(a.length(), b.length());
                for (int k = 0; k < lim; k++) {
                    char c1 = a.charAt(k);
                    char c2 = b.charAt(k);
                    if (c1 != c2)
                        return c1 - c2;
                }
                return a.length() - b.length();
            }
    
            public void close() {
                chronicle.close();
            }
        }
    
        private static String[] generateAndSave(String baseName, int keyCount) throws IOException {
            SortedMap<String, Long> map = generateMap(keyCount);
            saveData(map, baseName);
            ChronicleTest.deleteOnExit(baseName);
    
            String[] keys = map.keySet().toArray(new String[map.size()]);
            Collections.shuffle(Arrays.asList(keys));
            return keys;
        }
    }
    

    This generates 2 GB of raw data and performs one million lookups. It is written so that loading and lookup use very little heap (<< 1 MB).

    ls -l /tmp/test*
    -rw-rw   1 peter peter 2013265920 Dec 11 13:23 /tmp/test.data
    -rw-rw   1 peter peter  805306368 Dec 11 13:23 /tmp/test.index
    
    /tmp/test created.
    /tmp/test, size=100000000
    Load of 100,000,000 records and lookup of 1,000,000 keys took 10.945 seconds
    

    Using a hash table would be faster per lookup, because it is O(1) rather than O(log n), but it is more complicated to implement.
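
    The idea at the top of this answer (sorted records, an array of record positions, and a binary search over them) can also be sketched with only the JDK. This is a minimal illustration, not the Chronicle-based implementation above: the class name `OffsetIndexLookup` and the methods `save` and `lookup` are hypothetical, and the offsets are kept in an on-heap `long[]`, whereas Chronicle keeps its index in a separate file.

    ```java
    import java.io.BufferedOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.SortedMap;
    import java.util.TreeMap;

    // Hypothetical sketch: names here are illustrative, not from the answer above.
    public class OffsetIndexLookup {

        // Writes the records in key order and returns the byte offset of each record.
        public static long[] save(SortedMap<String, Long> map, Path file) throws IOException {
            long[] offsets = new long[map.size()];
            int i = 0;
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(Files.newOutputStream(file)))) {
                for (Map.Entry<String, Long> e : map.entrySet()) {
                    // size() is an int byte counter, so this sketch only suits files under 2 GB.
                    offsets[i++] = out.size();
                    out.writeUTF(e.getKey());
                    out.writeLong(e.getValue());
                }
            }
            return offsets;
        }

        // Binary search over the sorted records; returns null when the key is absent.
        public static Long lookup(RandomAccessFile raf, long[] offsets, String key) throws IOException {
            int lo = 0;
            int hi = offsets.length - 1;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                raf.seek(offsets[mid]);
                int cmp = key.compareTo(raf.readUTF());
                if (cmp == 0) {
                    return raf.readLong(); // the value follows the key in the record
                }
                if (cmp < 0) {
                    hi = mid - 1;
                } else {
                    lo = mid + 1;
                }
            }
            return null;
        }

        public static void main(String[] args) throws IOException {
            SortedMap<String, Long> map = new TreeMap<>();
            for (long n = 0; n < 1000; n++) {
                map.put(String.format("%09d", n), n);
            }
            Path file = Files.createTempFile("sorted", ".dat");
            long[] offsets = save(map, file);
            try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
                System.out.println(lookup(raf, offsets, "000000042")); // prints 42
                System.out.println(lookup(raf, offsets, "missing"));   // prints null
            }
            Files.delete(file);
        }
    }
    ```

    Each probe here costs a seek plus unbuffered reads, so a memory-mapped file would likely be faster for many lookups. The point is only that nothing has to be loaded into a map before the first lookup.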

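  2. Answer #2

    1. Use a DataInputStream wrapped around a BufferedInputStream wrapped around a FileInputStream.

    2. Instead of making at least four system calls per iteration, checking the length and the current file position and performing who knows how many reads to get the String and the long, just call readUTF() and readLong() until you get an EOFException.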
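
    The two points above can be sketched as follows. This is a minimal illustration under assumptions: the class name `BufferedMapLoader`, the method `load`, and the 64 KB buffer size are hypothetical choices, not part of the answer. It uses try-with-resources so the stream is closed on every exit path, and it treats EOFException as a clean end only when it occurs at a record boundary.

    ```java
    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch: names and buffer size are illustrative assumptions.
    public class BufferedMapLoader {

        // Reads (UTF string, long) pairs until the stream is exhausted.
        public static Map<String, Long> load(Path file) throws IOException {
            Map<String, Long> map = new HashMap<>();
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(Files.newInputStream(file), 1 << 16))) {
                while (true) {
                    String name;
                    try {
                        name = in.readUTF();
                    } catch (EOFException endOfFile) {
                        break; // no more records
                    }
                    // An EOFException here means a truncated record, so it propagates.
                    map.put(name, in.readLong());
                }
            }
            return map;
        }

        public static void main(String[] args) throws IOException {
            Path file = Files.createTempFile("map", ".dat");
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
                out.writeUTF("alpha");
                out.writeLong(1L);
                out.writeUTF("beta");
                out.writeLong(2L);
            }
            System.out.println(load(file));
            Files.delete(file);
        }
    }
    ```

    If the number of entries is known in advance, passing an initial capacity to the HashMap constructor avoids repeated rehashing while the map grows.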