有 Java 编程相关的问题?

你可以在下面搜索框中键入要查询的问题!

java是使用GSON JsonReader处理大型字段的最佳方法

我要买一个java。lang.OutOfMemoryError:即使使用GSON流,也会占用Java堆空间

{"result":"OK","base64":"JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC...."}

base64最长可达200Mb。GSON占用的内存远不止此,(3GB)当我尝试将base64存储在变量中时,我得到:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2367)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
    at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
    at java.lang.StringBuilder.append(StringBuilder.java:204)
    at com.google.gson.stream.JsonReader.nextQuotedValue(JsonReader.java:1014)
    at com.google.gson.stream.JsonReader.nextString(JsonReader.java:815)

处理此类字段的最佳方法是什么


共 (1) 个答案

  1. # 1 楼答案

    得到OutOfMemoryError的原因是GSON nextString()返回一个字符串,该字符串在使用StringBuilder构建非常大的字符串期间聚合。当您面临这样的问题时,您必须处理中间数据,因为没有其他选择。不幸的是,GSON不允许您以任何方式处理大量文本

    不确定是否可以更改响应负载,但如果不能,则可能需要实现自己的JSON读取器,或者“黑客”现有的JsonReader,使其以流式方式工作。下面的示例基于GSON 2.5,大量使用反射,因为JsonReader非常小心地隐藏其状态

    EnhancedGson25JsonReader。爪哇

    final class EnhancedGson25JsonReader
            extends JsonReader {
    
        // A listener to accept the internal character buffers.
        // Accepting a single string built on such buffers is total memory waste as well.
        interface ISlicedStringListener {
    
            void accept(char[] buffer, int start, int length)
                    throws IOException;
    
        }
    
        // The constants can be just copied
    
        /** @see JsonReader#PEEKED_NONE */
        private static final int PEEKED_NONE = 0;
    
        /** @see JsonReader#PEEKED_SINGLE_QUOTED */
        private static final int PEEKED_SINGLE_QUOTED = 8;
    
        /** @see JsonReader#PEEKED_DOUBLE_QUOTED */
        private static final int PEEKED_DOUBLE_QUOTED = 9;
    
        // Here is a bunch of spies made to "spy" for the parent's class state
    
        private final FieldSpy<Integer> peeked;
        private final MethodSpy<Integer> doPeek;
        private final MethodSpy<Integer> getLineNumber;
        private final MethodSpy<Integer> getColumnNumber;
        private final FieldSpy<char[]> buffer;
        private final FieldSpy<Integer> pos;
        private final FieldSpy<Integer> limit;
        private final MethodSpy<Character> readEscapeCharacter;
        private final FieldSpy<Integer> lineNumber;
        private final FieldSpy<Integer> lineStart;
        private final MethodSpy<Boolean> fillBuffer;
        private final MethodSpy<IOException> syntaxError;
        private final FieldSpy<Integer> stackSize;
        private final FieldSpy<int[]> pathIndices;
    
        private EnhancedJsonReader(final Reader reader)
                throws NoSuchFieldException, NoSuchMethodException {
            super(reader);
            peeked = spyField(JsonReader.class, this, "peeked");
            doPeek = spyMethod(JsonReader.class, this, "doPeek");
            getLineNumber = spyMethod(JsonReader.class, this, "getLineNumber");
            getColumnNumber = spyMethod(JsonReader.class, this, "getColumnNumber");
            buffer = spyField(JsonReader.class, this, "buffer");
            pos = spyField(JsonReader.class, this, "pos");
            limit = spyField(JsonReader.class, this, "limit");
            readEscapeCharacter = spyMethod(JsonReader.class, this, "readEscapeCharacter");
            lineNumber = spyField(JsonReader.class, this, "lineNumber");
            lineStart = spyField(JsonReader.class, this, "lineStart");
            fillBuffer = spyMethod(JsonReader.class, this, "fillBuffer", int.class);
            syntaxError = spyMethod(JsonReader.class, this, "syntaxError", String.class);
            stackSize = spyField(JsonReader.class, this, "stackSize");
            pathIndices = spyField(JsonReader.class, this, "pathIndices");
        }
    
        static EnhancedJsonReader getEnhancedGson25JsonReader(final Reader reader) {
            try {
                return new EnhancedJsonReader(reader);
            } catch ( final NoSuchFieldException | NoSuchMethodException ex ) {
                throw new RuntimeException(ex);
            }
        }
    
        // This method has been copied and reworked from the nextString() implementation
    
        void nextSlicedString(final ISlicedStringListener listener)
                throws IOException {
            int p = peeked.get();
            if ( p == PEEKED_NONE ) {
                p = doPeek.get();
            }
            switch ( p ) {
            case PEEKED_SINGLE_QUOTED:
                nextQuotedSlicedValue('\'', listener);
                break;
            case PEEKED_DOUBLE_QUOTED:
                nextQuotedSlicedValue('"', listener);
                break;
            default:
                throw new IllegalStateException("Expected a string but was " + peek()
                        + " at line " + getLineNumber.get()
                        + " column " + getColumnNumber.get()
                        + " path " + getPath()
                );
            }
            peeked.accept(PEEKED_NONE);
            pathIndices.get()[stackSize.get() - 1]++;
        }
    
        // The following method is also a copy-paste that was patched for the "spies".
        // It's, in principle, the same as the source one, but it has one more buffer singleCharBuffer
        // in order not to add another method to the ISlicedStringListener interface (enjoy lamdbas as much as possible).
        // Note that the main difference between these two methods is that this one
        // does not aggregate a single string value, but just delegates the internal
        // buffers to call-sites, so the latter ones might do anything with the buffers.
    
        /**
         * @see JsonReader#nextQuotedValue(char)
         */
        private void nextQuotedSlicedValue(final char quote, final ISlicedStringListener listener)
                throws IOException {
            final char[] buffer = this.buffer.get();
            final char[] singleCharBuffer = new char[1];
            while ( true ) {
                int p = pos.get();
                int l = limit.get();
                int start = p;
                while ( p < l ) {
                    final int c = buffer[p++];
                    if ( c == quote ) {
                        pos.accept(p);
                        listener.accept(buffer, start, p - start - 1);
                        return;
                    } else if ( c == '\\' ) {
                        pos.accept(p);
                        listener.accept(buffer, start, p - start - 1);
                        singleCharBuffer[0] = readEscapeCharacter.get();
                        listener.accept(singleCharBuffer, 0, 1);
                        p = pos.get();
                        l = limit.get();
                        start = p;
                    } else if ( c == '\n' ) {
                        lineNumber.accept(lineNumber.get() + 1);
                        lineStart.accept(p);
                    }
                }
                listener.accept(buffer, start, p - start);
                pos.accept(p);
                if ( !fillBuffer.apply(just1) ) {
                    throw syntaxError.apply(justUnterminatedString);
                }
            }
        }
    
        // Save some memory
    
        private static final Object[] just1 = { 1 };
        private static final Object[] justUnterminatedString = { "Unterminated string" };
    
    }
    

    战地间谍。爪哇

    final class FieldSpy<T>
            implements Supplier<T>, Consumer<T> {
    
        private final Object instance;
        private final Field field;
    
        private FieldSpy(final Object instance, final Field field) {
            this.instance = instance;
            this.field = field;
        }
    
        static <T> FieldSpy<T> spyField(final Class<?> declaringClass, final Object instance, final String fieldName)
                throws NoSuchFieldException {
            final Field field = declaringClass.getDeclaredField(fieldName);
            field.setAccessible(true);
            return new FieldSpy<>(instance, field);
        }
    
        @Override
        public T get() {
            try {
                @SuppressWarnings("unchecked")
                final T value = (T) field.get(instance);
                return value;
            } catch ( final IllegalAccessException ex ) {
                throw new RuntimeException(ex);
            }
        }
    
        @Override
        public void accept(final T value) {
            try {
                field.set(instance, value);
            } catch ( final IllegalAccessException ex ) {
                throw new RuntimeException(ex);
            }
        }
    
    }
    

    MethodSpy。爪哇

    final class MethodSpy<T>
            implements Function<Object[], T>, Supplier<T> {
    
        private static final Object[] emptyObjectArray = {};
    
        private final Object instance;
        private final Method method;
    
        private MethodSpy(final Object instance, final Method method) {
            this.instance = instance;
            this.method = method;
        }
    
        static <T> MethodSpy<T> spyMethod(final Class<?> declaringClass, final Object instance, final String methodName, final Class<?>... parameterTypes)
                throws NoSuchMethodException {
            final Method method = declaringClass.getDeclaredMethod(methodName, parameterTypes);
            method.setAccessible(true);
            return new MethodSpy<>(instance, method);
        }
    
        @Override
        public T get() {
        // my javac generates useless new Object[0] if no args passed
            return apply(emptyObjectArray);
        }
    
        @Override
        public T apply(final Object[] arguments) {
            try {
                @SuppressWarnings("unchecked")
                final T value = (T) method.invoke(instance, arguments);
                return value;
            } catch ( final IllegalAccessException | InvocationTargetException ex ) {
                throw new RuntimeException(ex);
            }
        }
    
    }
    

    HugeJsonReaderDemo。爪哇

    下面是一个演示,它使用该方法读取一个巨大的JSON并将其字符串值重定向到另一个文件

    public static void main(final String... args)
            throws IOException {
        try ( final EnhancedGson25JsonReader input = getEnhancedGson25JsonReader(new InputStreamReader(new FileInputStream("./huge.json")));
              final Writer output = new OutputStreamWriter(new BufferedOutputStream(new FileOutputStream("./huge.json.STRINGS"))) ) {
            while ( input.hasNext() ) {
                final JsonToken token = input.peek();
                switch ( token ) {
                case BEGIN_OBJECT:
                    input.beginObject();
                    break;
                case NAME:
                    input.nextName();
                    break;
                case STRING:
                    input.nextSlicedString(output::write);
                    break;
                default:
                    throw new AssertionError(token);
                }
            }
        }
    }
    

    我成功地将上面的字段提取到一个文件中。输入文件的长度为544 MB(570 425 371字节),由以下JSON块生成:

    • {"result":"OK","base64":"
    • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC×16777216(2^24)
    • "}

    结果是(因为我只是将所有字符串重定向到该文件):

    • OK
    • JVBERi0xLjQKJeLjz9MKMSAwIG9iago8PC×16777216(2^24)

    我认为你面临着一个非常有趣的问题。如果GSON团队能够就可能的API增强提供一些反馈,那就太好了