Java Camel: splitting a large XML file with a header using a field condition

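I'm trying to set up an Apache Camel route that takes a large XML file as input and splits the payload into two different files based on a field condition. That is, if an ID field starts with 1, the order goes to one output file; otherwise it goes to the other. Using Camel is not a must, and I've also looked into XSLT and plain Java options, but I feel this should be doable.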

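I've got splitting the actual payload covered, but I'm having trouble making sure the parent nodes, including the header, are also included in each file. Since the file can be large, I want to make sure the payload is streamed. I feel like I've read hundreds of questions, blog posts and so on about this, and pretty much every case involves loading the entire file into memory, splitting the file evenly into parts, or using only the payload nodes on their own.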

My prototype XML file looks like this:

<root>
    <header>
        <title>Testing</title>
    </header>
    <orders>
        <order>
            <id>11</id>
            <stuff>One</stuff>
        </order>
        <order>
            <id>20</id>
            <stuff>Two</stuff>
        </order>
        <order>
            <id>12</id>
            <stuff>Three</stuff>
        </order>
    </orders> 
</root>

The result should be two files. Condition true (id starts with 1):

<root>
    <header>
        <title>Testing</title>
    </header>
    <orders>
        <order>
            <id>11</id>
            <stuff>One</stuff>
        </order>
        <order>
            <id>12</id>
            <stuff>Three</stuff>
        </order>
    </orders> 
</root>

Condition false:

<root>
    <header>
        <title>Testing</title>
    </header>
    <orders>
        <order>
            <id>20</id>
            <stuff>Two</stuff>
        </order>
    </orders> 
</root>

My prototype route:

from("file:" + inputFolder)
.log("Processing file ${headers.CamelFileName}")
.split()
    .tokenizeXML("order", "*") // Includes parent in every node
    .streaming()
    .choice()
        .when(body().contains("id>1"))
            .to("direct:ones")
            .stop()
        .otherwise()
            .to("direct:others")
            .stop()
    .end()
.end();

from("direct:ones")
//.aggregate(header("ones"), new StringAggregator()) // missing end condition
.to("file:" + outputFolder + "?fileName=ones-${in.header.CamelFileName}&fileExist=Append");

from("direct:others")
//.aggregate(header("others"), new StringAggregator()) // missing end condition
.to("file:" + outputFolder + "?fileName=others-${in.header.CamelFileName}&fileExist=Append");

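This works as intended, except for adding the parent tags (header and footer, if you will) to every node. Using only the node name in tokenizeXML returns just the node itself, but I can't figure out how to add the header and footer. Preferably I would stream the parent tags into header and footer properties and add them before and after the split.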

How would I go about this? Would I need to tokenize the parent tags first, and would that mean streaming the file twice?

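Finally, you may notice the commented-out aggregation at the end. I don't want to aggregate every node before writing to file, since that defeats the purpose of streaming and of keeping the whole file out of memory, but I figured that aggregating a number of nodes before each write might lessen the performance hit of writing every node to disk individually. I'm not sure if this makes sense.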
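The batching idea in the last paragraph can be illustrated outside Camel with a plain `BufferedWriter`, which collects small writes in memory and passes them on in larger chunks. This is only a sketch of the general technique, not a Camel aggregator; the class name, the `writeNodes` helper and the buffer size are assumptions made for illustration.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class BufferedNodeWriterDemo {
    // Hypothetical helper: writes each node through an in-memory buffer of bufferChars
    // characters, so the file receives a few large writes rather than one per node.
    static void writeNodes(Path target, List<String> nodes, int bufferChars) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(Files.newOutputStream(target), StandardCharsets.UTF_8),
                bufferChars)) {
            for (String node : nodes) {
                writer.write(node);
                writer.newLine();
            }
        } // closing the writer flushes whatever is left in the buffer
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("ones-", ".xml");
        writeNodes(target, List.of("<order><id>11</id></order>", "<order><id>12</id></order>"), 8192);
        System.out.println(Files.readAllLines(target).size() + " nodes written to " + target);
    }
}
```

Memory use stays bounded by the buffer size rather than by the number of nodes, which is the trade-off the question is after.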


1 answer

  1. # Answer 1

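    I couldn't get it to work with Camel. Or rather, once I was using plain Java to extract the header, I already had everything I needed to continue, and doing the split there and then switching back to Camel seemed cumbersome. There are most likely a number of ways to improve on this, but this is my solution for splitting the XML payload.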

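    Switching between the two types of output stream is not that pretty, but it simplifies the use of everything else. Also of note is that I chose equalsIgnoreCase to check the tag names, even though XML is normally case sensitive. For me, it reduces the risk of errors. Finally, as with any regex passed to String.matches, make sure the regex matches the entire string by using wildcards.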
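The point about matching the entire string is easy to trip over, so here is a minimal sketch of how `String.matches` behaves on a serialized payload. The class and the `startsWithOne` helper are hypothetical. The `(?s)` flag is an addition that is not in the original answer: it is assumed to be needed whenever the payload string contains line breaks, because `.` does not match line terminators by default.

```java
class SplitRegexDemo {
    // Hypothetical helper: true when the serialized payload contains an <id> starting with 1.
    static boolean startsWithOne(String payload) {
        // (?s) lets '.' match line breaks; matches() must cover the whole string
        return payload.matches("(?s).*<id>1.*");
    }

    public static void main(String[] args) {
        String order = "<order>\n    <id>11</id>\n    <stuff>One</stuff>\n</order>";
        System.out.println(order.matches("<id>1"));      // pattern covers only a fragment
        System.out.println(order.matches(".*<id>1.*"));  // '.' stops at the line breaks
        System.out.println(startsWithOne(order));        // wildcards plus (?s)
    }
}
```

Only the third call reports a match for this input. A value such as `(?s).*<id>1.*` would then be a candidate for the `splitRegex` parameter below.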

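    import java.io.FileInputStream;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.StringWriter;
    import javax.xml.stream.XMLEventReader;
    import javax.xml.stream.XMLEventWriter;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLOutputFactory;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.events.EndElement;
    import javax.xml.stream.events.StartElement;
    import javax.xml.stream.events.XMLEvent;

    // Both methods are assumed to live in one class that also declares a "logger" field (any logging framework).

    /**
     * Splits an XML file's payload into two new files based on a regex condition. The payload is a specific XML tag in the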
     * input file that is repeated a number of times. All tags before and after the payload are added to both files in order
     * to keep the same structure.
     * 
     * The content of each payload tag is compared to the regex condition and if true, it is added to the primary output file.
     * Otherwise it is added to the secondary output file. The payload can be empty and an empty payload tag will be added to
     * the secondary output file. Note that the output will not be an unaltered copy of the input as self-closing XML tags are
     * altered to corresponding opening and closing tags.
     * 
     * Data is streamed from the input file to the output files, keeping memory usage small even with large files.
     * 
     * @param inputFilename Path and filename for the input XML file
     * @param outputFilenamePrimary Path and filename for the primary output file
     * @param outputFilenameSecondary Path and filename for the secondary output file
     * @param payloadTag XML tag name of the payload
     * @param payloadParentTag XML tag name of the payload's direct parent
     * @param splitRegex The regex split condition used on the payload content
     * @throws Exception On invalid filenames, missing input, incorrect XML structure, etc.
     */
    public static void splitXMLPayload(String inputFilename, String outputFilenamePrimary, String outputFilenameSecondary, String payloadTag, String payloadParentTag, String splitRegex) throws Exception {
    
        XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
        XMLOutputFactory xmlOutputFactory = XMLOutputFactory.newInstance();
        XMLEventReader xmlEventReader = null;
        FileInputStream fileInputStream = null;
        FileWriter fileWriterPrimary = null;
        FileWriter fileWriterSecondary = null;
        XMLEventWriter xmlEventWriterSplitPrimary = null;
        XMLEventWriter xmlEventWriterSplitSecondary = null;
    
        try {
            fileInputStream = new FileInputStream(inputFilename);
            xmlEventReader = xmlInputFactory.createXMLEventReader(fileInputStream);
    
            fileWriterPrimary = new FileWriter(outputFilenamePrimary);
            fileWriterSecondary = new FileWriter(outputFilenameSecondary);
            xmlEventWriterSplitPrimary = xmlOutputFactory.createXMLEventWriter(fileWriterPrimary);
            xmlEventWriterSplitSecondary = xmlOutputFactory.createXMLEventWriter(fileWriterSecondary);
    
            boolean isStart = true;
            boolean isEnd = false;
            boolean lastSplitIsPrimary = true;
    
            while (xmlEventReader.hasNext()) {
                XMLEvent xmlEvent = xmlEventReader.nextEvent();
    
                // Check for start of payload element
                if (!isEnd && xmlEvent.isStartElement()) {
                    StartElement startElement = xmlEvent.asStartElement();
                    if (startElement.getName().getLocalPart().equalsIgnoreCase(payloadTag)) {
                        if (isStart) {
                            isStart = false;
                            // Flush the event writers as we'll use the file writers for the payload
                            xmlEventWriterSplitPrimary.flush();
                            xmlEventWriterSplitSecondary.flush();
                        }
    
                        String order = getTagAsString(xmlEventReader, xmlEvent, payloadTag, xmlOutputFactory);
                        if (order.matches(splitRegex)) {
                            lastSplitIsPrimary = true;
                            fileWriterPrimary.write(order);
                        } else {
                            lastSplitIsPrimary = false;
                            fileWriterSecondary.write(order);
                        }
                    }
                }
                // Check for end of parent tag
                else if (!isStart && !isEnd && xmlEvent.isEndElement()) {
                    EndElement endElement = xmlEvent.asEndElement();
                    if (endElement.getName().getLocalPart().equalsIgnoreCase(payloadParentTag)) {
                        isEnd = true;
                    }
                }
                // Is neither start nor end and we're handling payload (most often white space)
                else if (!isStart && !isEnd) {
                    // Add to last split handled
                    if (lastSplitIsPrimary) {
                        xmlEventWriterSplitPrimary.add(xmlEvent);
                        xmlEventWriterSplitPrimary.flush();
                    } else {
                        xmlEventWriterSplitSecondary.add(xmlEvent);
                        xmlEventWriterSplitSecondary.flush();
                    }
                }
    
                // Start and end is added to both files
                if (isStart || isEnd) {
                    xmlEventWriterSplitPrimary.add(xmlEvent);
                    xmlEventWriterSplitSecondary.add(xmlEvent);
                }
            }
    
        } catch (Exception e) {
            logger.error("Error in XML split", e);
            throw e;
        } finally {
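            // Close the streams; the null checks cover failures that occur before a stream is opened
            try {
                if (xmlEventReader != null) {
                    xmlEventReader.close();
                }
            } catch (XMLStreamException e) {
                // ignore
            }
            try {
                if (fileInputStream != null) {
                    fileInputStream.close();
                }
            } catch (IOException e) {
                // ignore
            }
            try {
                if (xmlEventWriterSplitPrimary != null) {
                    xmlEventWriterSplitPrimary.close();
                }
            } catch (XMLStreamException e) {
                // ignore
            }
            try {
                if (xmlEventWriterSplitSecondary != null) {
                    xmlEventWriterSplitSecondary.close();
                }
            } catch (XMLStreamException e) {
                // ignore
            }
            try {
                if (fileWriterPrimary != null) {
                    fileWriterPrimary.close();
                }
            } catch (IOException e) {
                // ignore
            }
            try {
                if (fileWriterSecondary != null) {
                    fileWriterSecondary.close();
                }
            } catch (IOException e) {
                // ignore
            }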
        }
    }
    
    /**
     * Loops through the events in the {@code XMLEventReader} until the specific XML end tag is found and returns everything
     * contained within the XML tag as a String.
     * 
     * Data is streamed from the {@code XMLEventReader}, however the String can be large depending on the number of children
     * in the XML tag.
     * 
     * @param xmlEventReader The already active reader. The starting tag event is assumed to have already been read
     * @param startEvent The starting XML tag event already read from the {@code XMLEventReader}
     * @param tag The XML tag name used to find the closing XML tag
     * @param xmlOutputFactory Convenience include to avoid creating another factory
     * @return String containing everything between the starting and ending XML tag, the tags themselves included
     * @throws Exception On incorrect XML structure
     */
    private static String getTagAsString(XMLEventReader xmlEventReader, XMLEvent startEvent, String tag, XMLOutputFactory xmlOutputFactory) throws Exception {
        StringWriter stringWriter = new StringWriter();
        XMLEventWriter xmlEventWriter = xmlOutputFactory.createXMLEventWriter(stringWriter);
    
        // Add the start tag
        xmlEventWriter.add(startEvent);
    
        // Add until end tag
        while (xmlEventReader.hasNext()) {
            XMLEvent xmlEvent = xmlEventReader.nextEvent();
    
            // End tag found
            if (xmlEvent.isEndElement() && xmlEvent.asEndElement().getName().getLocalPart().equalsIgnoreCase(tag)) {
                xmlEventWriter.add(xmlEvent);
                xmlEventWriter.close();
                stringWriter.close();
    
                return stringWriter.toString();
            } else {
                xmlEventWriter.add(xmlEvent);
            }
        }
    
        xmlEventWriter.close();
        stringWriter.close();
        throw new Exception("Invalid XML, no closing tag for <" + tag + "> found!");
    }
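To try the capture-and-match step without touching the file system, the same StAX event loop can be run against an in-memory string. The following is a reduced sketch under stated assumptions, not the answer's method: it collects the matching and non-matching payload elements into two lists, leaves out the header and footer handling, and uses a hypothetical class name, helper names and single-line sample input.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

class StaxSplitDemo {
    // Returns two lists: payload elements whose text matches the regex, then the rest.
    static List<List<String>> split(String xml, String payloadTag, String regex) throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(new StringReader(xml));
        XMLOutputFactory outputFactory = XMLOutputFactory.newInstance();
        List<String> primary = new ArrayList<>();
        List<String> secondary = new ArrayList<>();
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()
                    && event.asStartElement().getName().getLocalPart().equals(payloadTag)) {
                String element = readElement(reader, event, payloadTag, outputFactory);
                (element.matches(regex) ? primary : secondary).add(element);
            }
        }
        reader.close();
        return List.of(primary, secondary);
    }

    // Serializes everything from the already-read start tag up to its closing tag.
    static String readElement(XMLEventReader reader, XMLEvent start, String tag,
                              XMLOutputFactory factory) throws XMLStreamException {
        StringWriter buffer = new StringWriter();
        XMLEventWriter writer = factory.createXMLEventWriter(buffer);
        writer.add(start);
        while (reader.hasNext()) {
            XMLEvent event = reader.nextEvent();
            writer.add(event);
            if (event.isEndElement() && event.asEndElement().getName().getLocalPart().equals(tag)) {
                writer.close();
                return buffer.toString();
            }
        }
        throw new XMLStreamException("No closing tag for <" + tag + ">");
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<root><header><title>Testing</title></header><orders>"
                + "<order><id>11</id><stuff>One</stuff></order>"
                + "<order><id>20</id><stuff>Two</stuff></order>"
                + "<order><id>12</id><stuff>Three</stuff></order>"
                + "</orders></root>";
        List<List<String>> result = split(xml, "order", "(?s).*<id>1.*");
        System.out.println("primary: " + result.get(0).size() + ", secondary: " + result.get(1).size());
    }
}
```

Like the method above, this sketch assumes payload tags are not nested inside each other; a nested tag with the same name would end the capture early.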