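Bash: integrating a call to a jar file | cut | awk and a Java program into one unified process
I am currently running a fairly involved data preprocessing operation, namely: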
cat large_file.txt \
  | ./reverb -q \
  | cut --fields=16,17,18 \
  | awk -F\\t -vq="'" 'function quote(token) { gsub(q, "\\"q, token); return q token q } { print quote($2) "(" quote($3) ", " quote($1) ")." }' >> output.txt
As you can see, this is rather convoluted: first cat, then ./reverb, then cut, and finally awk.
Next, I want to pass the output to a Java program, namely:
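// Imports required at the top of the enclosing source file:
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void main(String[] args) throws IOException
{
    Ontology ontology = new Ontology();
    // Matches facts such as 'verb'('object','subject'). and captures the three terms.
    // \\s* tolerates the space that the awk script prints after the comma.
    Pattern p = Pattern.compile("'(.*?)'\\('(.*?)',\\s*'(.*?)'\\)\\.");
    // try-with-resources closes the reader even if reading fails.
    try (BufferedReader br = new BufferedReader(new FileReader("/home/matthias/Workbench/SUTD/2_January/Prolog/horn_data_test.pl")))
    {
        String line;
        while ((line = br.readLine()) != null)
        {
            Matcher m = p.matcher(line);
            if( m.matches() )
            {
                String verb = m.group(1);
                String object = m.group(2);
                String subject = m.group(3);
                ontology.addSentence( new Sentence( verb, object, subject ) );
            }
        }
    }
    // Chain sentences through each shared ("joint") term and print the derived sentence.
    for( String joint: ontology.getJoints() )
    {
        for( Integer subind: ontology.getSubjectIndices( joint ) )
        {
            Sentence xaS = ontology.getSentence( subind );
            for( Integer obind: ontology.getObjectIndices( joint ) )
            {
                Sentence yOb = ontology.getSentence( obind );
                Sentence s = new Sentence( xaS.getVerb(),
                                           xaS.getObject(),
                                           yOb.getSubject() );
                System.out.println( s );
            }
        }
    }
}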
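What is the best way to combine this process into one coherent operation? Ideally, I would just specify an input file and an output file and run it once. As it stands, the whole process is rather messy.
Perhaps I could put all of these calls into a bash script? Is that feasible?
The input initially consists of English sentences, one per line, i.e.: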
Oranges are delicious and contain vitamin c.
Brilliant scientists learned that we can prevent scurvy by imbibing vitamin c.
Colorless green ideas sleep furiously.
...
The preprocessing makes it look like this:
'contain'('vitamin c','oranges').
'prevent'('scurvy','vitamin c').
'sleep'('furiously','ideas').
...
The Java program is meant to learn "rules" by inference, so if the processed data yields 'contain'('vitamin c','oranges'). and 'prevent'('scurvy','vitamin c'). then the Java code will emit 'prevent'('scurvy','oranges').
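The inference step described above can be sketched in isolation. The Ontology and Sentence classes are not shown in the question, so the sketch below uses a hypothetical Fact class and a plain map instead. It only illustrates the join on a shared term and is not the asker's implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class JoinSketch
{
    // Hypothetical stand-in for the Sentence class, which the question does not show.
    static final class Fact
    {
        final String verb;
        final String object;
        final String subject;

        Fact(String verb, String object, String subject)
        {
            this.verb = verb;
            this.object = object;
            this.subject = subject;
        }

        @Override
        public String toString()
        {
            return "'" + verb + "'('" + object + "','" + subject + "').";
        }
    }

    // For every term that is the subject of one fact (x) and the object of
    // another (y), derive verb_x(object_x, subject_y).
    static List<Fact> infer(List<Fact> facts)
    {
        // Index the facts by their object term.
        Map<String, List<Fact>> byObject = new HashMap<>();
        for (Fact f : facts)
        {
            byObject.computeIfAbsent(f.object, k -> new ArrayList<>()).add(f);
        }
        List<Fact> derived = new ArrayList<>();
        for (Fact x : facts)
        {
            List<Fact> matches = byObject.getOrDefault(x.subject, Collections.<Fact>emptyList());
            for (Fact y : matches)
            {
                derived.add(new Fact(x.verb, x.object, y.subject));
            }
        }
        return derived;
    }

    public static void main(String[] args)
    {
        List<Fact> facts = new ArrayList<>();
        facts.add(new Fact("contain", "vitamin c", "oranges"));
        facts.add(new Fact("prevent", "scurvy", "vitamin c"));
        for (Fact f : infer(facts))
        {
            System.out.println(f); // prints 'prevent'('scurvy','oranges').
        }
    }
}
```

Here "vitamin c" is the subject of the 'prevent' fact and the object of the 'contain' fact, so the two are chained into a single derived fact.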
# Answer 1
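I looked at the ReVerb source code, and I think it would be easy to adapt it to produce the output you want. If you look at the ReVerb command-line class, CommandLineReverb.java, it has two relevant methods.
The first method is called once per sentence and performs the extraction. It then calls the second method, which prints the tab-separated values to the output stream. I think all you have to do is implement your own version of the second method, "printExtr()".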
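A replacement for printExtr() would have to do the work that cut and awk do now. ReVerb's actual method signature is not reproduced here, so the following is only a standalone sketch. The names PrologFactFormatter, quote and toFact are hypothetical, and the three strings are assumed to be the values that currently arrive as cut fields 16, 17 and 18 (subject, verb and object in the question's terms).

```java
public class PrologFactFormatter
{
    // Wrap a token in single quotes and escape embedded single quotes with a
    // backslash, which is what the quote() function in the awk script is meant to do.
    static String quote(String token)
    {
        return "'" + token.replace("'", "\\'") + "'";
    }

    // Build verb(object, subject) in the same field order and with the same
    // ", " separator that the awk script prints.
    static String toFact(String subject, String verb, String object)
    {
        return quote(verb) + "(" + quote(object) + ", " + quote(subject) + ").";
    }

    public static void main(String[] args)
    {
        // prints 'contain'('vitamin c', 'oranges').
        System.out.println(toFact("oranges", "contain", "vitamin c"));
    }
}
```

With something like this called from inside your own printExtr(), the pipeline would shrink to a single ReVerb invocation whose output is already in the fact format.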