bash: integrating calls to a jar file | cut | awk and a Java program into one unified process

I am currently running a fairly convoluted data preprocessing operation, which is:

cat large_file.txt \
  | ./reverb -q \
  | cut --fields=16,17,18 \
  | awk -F\\t -vq="'" 'function quote(token) { gsub(q, "\\"q, token); return q token q } { print quote($2) "(" quote($3) ", " quote($1) ")." }' >> output.txt

As you can see, this is rather convoluted: first cat, then ./reverb, then cut, and finally awk.

Next, I want to pass the output to a Java program, namely:

public static void main(String[] args) throws IOException 
{
    Ontology ontology = new Ontology();
    BufferedReader br = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/2_January/Prolog/horn_data_test.pl"));
    Pattern p = Pattern.compile("'(.*?)'\\('(.*?)','(.*?)'\\)\\."); 
    String line;
    while ((line = br.readLine()) != null) 
    {
        Matcher m = p.matcher(line);
        if( m.matches() ) 
        {
            String verb    = m.group(1);
            String object  = m.group(2);
            String subject = m.group(3);
            ontology.addSentence( new Sentence( verb, object, subject ) );
        }
    }

    for( String joint: ontology.getJoints() )
    {
        for( Integer subind: ontology.getSubjectIndices( joint ) )
        {
            Sentence xaS = ontology.getSentence( subind );
            for( Integer obind: ontology.getObjectIndices( joint ) )
            {
                Sentence yOb = ontology.getSentence( obind );
                Sentence s = new Sentence( xaS.getVerb(),
                                           xaS.getObject(),
                                           yOb.getSubject() );
                System.out.println( s );
            }
        }
    }
}   

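What is the best way to combine this process into one coherent operation? Ideally, I would like to specify just the input file and the output file and run it once. As it stands, the whole process is rather messy.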

Perhaps I could put all of these calls into a bash script? Is that feasible?
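One way to fold the Java step into the same pipeline is to have it read the facts from standard input rather than from a hard-coded file, so the shell can pipe the awk output straight into it. The following is only a minimal sketch: the class name `FactReader` and the method `parseFact` are hypothetical, and the pattern is the one from the program above, relaxed to accept the space that the awk step prints after the comma.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original program.
class FactReader {
    // Matches 'verb'('object','subject'). with optional whitespace after the comma.
    private static final Pattern FACT =
            Pattern.compile("'(.*?)'\\('(.*?)',\\s*'(.*?)'\\)\\.");

    /** Returns {verb, object, subject}, or null if the line is not a fact. */
    static String[] parseFact(String line) {
        Matcher m = FACT.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) throws IOException {
        // Reading System.in lets the program sit at the end of a shell pipe.
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(System.in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = br.readLine()) != null) {
                String[] fact = parseFact(line);
                if (fact != null) {
                    // The original program would call ontology.addSentence(...) here.
                    System.out.println(String.join("\t", fact));
                }
            }
        }
    }
}
```

Under that assumption, the pipeline could end in something like `... | awk ... | java FactReader > output.txt`, and the whole chain could live in a single script that takes the input and output file names as arguments.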

The input initially consists of English sentences, one per line, for example:

Oranges are delicious and contain vitamin c.
Brilliant scientists learned that we can prevent scurvy by imbibing vitamin c.
Colorless green ideas sleep furiously.
...

The preprocessing makes it look like this:

'contain'('vitamin c','oranges').
'prevent'('scurvy','vitamin c').
'sleep'('furiously','ideas').
...

The Java program is meant to learn "rules" by inference, so if the preprocessed data yields 'contain'('vitamin c','oranges'). & 'prevent'('scurvy','vitamin c'). then the Java code will emit 'prevent'('scurvy','oranges').
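The chaining rule described here can be illustrated in isolation. This is a simplified sketch, not the Ontology class from the program above: the names `InferenceSketch`, `Fact`, and `infer` are made up for the example, and it uses a plain nested loop instead of index lookups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the Ontology/Sentence classes, for illustration only.
class InferenceSketch {
    /** One fact of the form 'verb'('object','subject'). */
    static final class Fact {
        final String verb;
        final String object;
        final String subject;

        Fact(String verb, String object, String subject) {
            this.verb = verb;
            this.object = object;
            this.subject = subject;
        }

        @Override
        public String toString() {
            return "'" + verb + "'('" + object + "','" + subject + "').";
        }
    }

    /** Chains facts a and b whenever a's object is b's subject. */
    static List<Fact> infer(List<Fact> facts) {
        List<Fact> derived = new ArrayList<>();
        for (Fact a : facts) {
            for (Fact b : facts) {
                if (a != b && a.object.equals(b.subject)) {
                    // Keep b's verb and object, but attach a's subject.
                    derived.add(new Fact(b.verb, b.object, a.subject));
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        List<Fact> facts = Arrays.asList(
                new Fact("contain", "vitamin c", "oranges"),
                new Fact("prevent", "scurvy", "vitamin c"));
        for (Fact f : infer(facts)) {
            System.out.println(f); // prints 'prevent'('scurvy','oranges').
        }
    }
}
```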


1 answer

  1. # Answer 1

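    I took a look at the ReVerb source code, and I think it would be easy to adapt it to produce the output you want. If you look at the ReVerb class CommandLineReverb.java, it has the following two methods: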

    private void extractFromSentReader(ChunkedSentenceReader reader)
            throws ExtractorException {
        long start;
    
        ChunkedSentenceIterator sentenceIt = reader.iterator();
    
        while (sentenceIt.hasNext()) {
            // get the next chunked sentence
            ChunkedSentence sent = sentenceIt.next();
            chunkTime += sentenceIt.getLastComputeTime();
    
            numSents++;
    
            // make the extractions
            start = System.nanoTime();
            Iterable<ChunkedBinaryExtraction> extractions = extractor
                    .extract(sent);
            extractTime += System.nanoTime() - start;
    
            for (ChunkedBinaryExtraction extr : extractions) {
                numExtrs++;
    
                // run the confidence function
                start = System.nanoTime();
                double conf = getConf(extr);
                confTime += System.nanoTime() - start;
    
                NormalizedBinaryExtraction extrNorm = normalizer
                        .normalize(extr);
                printExtr(extrNorm, conf);
            }
            if (numSents % messageEvery == 0)
                summary();
        }
    }
    
    private void printExtr(NormalizedBinaryExtraction extr, double conf) {
        String arg1 = extr.getArgument1().toString();
        String rel = extr.getRelation().toString();
        String arg2 = extr.getArgument2().toString();
    
        ChunkedSentence sent = extr.getSentence();
        String toks = sent.getTokensAsString();
        String pos = sent.getPosTagsAsString();
        String chunks = sent.getChunkTagsAsString();
        String arg1Norm = extr.getArgument1Norm().toString();
        String relNorm = extr.getRelationNorm().toString();
        String arg2Norm = extr.getArgument2Norm().toString();
    
        Range arg1Range = extr.getArgument1().getRange();
        Range relRange = extr.getRelation().getRange();
        Range arg2Range = extr.getArgument2().getRange();
        String a1s = String.valueOf(arg1Range.getStart());
        String a1e = String.valueOf(arg1Range.getEnd());
        String rs = String.valueOf(relRange.getStart());
        String re = String.valueOf(relRange.getEnd());
        String a2s = String.valueOf(arg2Range.getStart());
        String a2e = String.valueOf(arg2Range.getEnd());
    
        String row = Joiner.on("\t").join(
                new String[] { currentFile, String.valueOf(numSents), arg1,
                        rel, arg2, a1s, a1e, rs, re, a2s, a2e,
                        String.valueOf(conf), toks, pos, chunks, arg1Norm,
                        relNorm, arg2Norm });
    
        System.out.println(row);
    }
    

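    The first method iterates over the sentences and performs the extraction for each one. It then calls the second method to print tab-separated values to the output stream. I think all you have to do is implement your own version of the second method, printExtr().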
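    As a rough idea of what such a replacement might print, here is a standalone sketch of the formatting that cut and awk currently do. The class and method names (`PrologFactFormatter`, `quote`, `toFact`) are hypothetical and not part of ReVerb; the field order follows the awk script in the question, which prints the relation first, then argument 2, then argument 1, but without the space after the comma so that the output matches the pattern the Java program expects.

    ```java
    // Hypothetical helper; it does not depend on any ReVerb classes.
    class PrologFactFormatter {
        /** Wraps a token in single quotes, backslash-escaping embedded quotes. */
        static String quote(String token) {
            return "'" + token.replace("'", "\\'") + "'";
        }

        /** Builds 'rel'('arg2','arg1'). from the three normalized fields. */
        static String toFact(String arg1Norm, String relNorm, String arg2Norm) {
            return quote(relNorm) + "(" + quote(arg2Norm) + "," + quote(arg1Norm) + ").";
        }

        public static void main(String[] args) {
            // prints 'contain'('vitamin c','oranges').
            System.out.println(toFact("oranges", "contain", "vitamin c"));
        }
    }
    ```

    Inside a modified printExtr(), the row-building code could then presumably be replaced by a single `System.out.println(PrologFactFormatter.toFact(arg1Norm, relNorm, arg2Norm));`, using the variables the method already computes, which would make the cut and awk stages unnecessary.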