DSL for building language rules

Detailed description of the rita-dsl Python project


Rita DSL

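This is a language, loosely based on Apache UIMA RUTA, for writing manual language rules that compile into spaCy-compatible patterns. These patterns can be used for manual NER, and also in other processes such as retokenizing and plain matching.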

Documentation

Quick start

Install it via:

pip install rita-dsl

You can start defining rules by creating a file with the extension *.rita.

Below is a complete example that can serve as a reference point:

cars = LOAD("examples/cars.txt") # Load items from file
colors = {"red", "green", "blue", "white", "black"} # Declare items inline

{IN_LIST(colors), WORD("car")} -> MARK("CAR_COLOR") # If first token is in list `colors` and second one is word `car`, label it

{IN_LIST(cars), WORD+} -> MARK("CAR_MODEL") # If first token is in list `cars` and is followed by 1..N words, label it

{ENTITY("PERSON"), LEMMA("like"), WORD} -> MARK("LIKED_ACTION") # If first token is a PERSON entity, followed by a word with lemma `like` and then any word, label it
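To illustrate what "compiles into spaCy-compatible patterns" means, here is a hypothetical set of hand-written EntityRuler token patterns that roughly mirror the three rules above. This is not rita's actual compiler output. The attribute choices (`LOWER`, `IS_PUNCT`, `ENT_TYPE`, `LEMMA`), the approximation of `WORD` as "any non-punctuation token", and the short `cars` stand-in list are all assumptions made for illustration.

```python
import json

# Hypothetical, hand-written spaCy token patterns that roughly mirror the three
# Rita rules above. This is NOT rita's real compiler output; attribute choices
# are assumptions made for illustration only.
colors = ["red", "green", "blue", "white", "black"]
cars = ["bmw", "audi"]  # hypothetical stand-in for the contents of examples/cars.txt

# Assumption: WORD is approximated as "any non-punctuation token".
any_word = {"IS_PUNCT": False}

patterns = [
    # {IN_LIST(colors), WORD("car")} -> MARK("CAR_COLOR")
    {"label": "CAR_COLOR",
     "pattern": [{"LOWER": {"IN": colors}}, {"LOWER": "car"}]},
    # {IN_LIST(cars), WORD+} -> MARK("CAR_MODEL"); OP "+" means one or more tokens
    {"label": "CAR_MODEL",
     "pattern": [{"LOWER": {"IN": cars}}, dict(any_word, OP="+")]},
    # {ENTITY("PERSON"), LEMMA("like"), WORD} -> MARK("LIKED_ACTION")
    # Assumption: the person is a single token with entity type PERSON.
    {"label": "LIKED_ACTION",
     "pattern": [{"ENT_TYPE": "PERSON"}, {"LEMMA": "like"}, dict(any_word)]},
]

# Each entry is plain JSON, so it can be written to a file or handed to a ruler.
for entry in patterns:
    print(json.dumps(entry))
```

Each rule becomes one `{"label": ..., "pattern": [...]}` object, and each Rita function call becomes one token dict in the pattern list. That one-to-one shape is what makes a line-per-rule output file natural.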

Now you can compile these rules:

rita -f <your-file>.rita output.jsonl

Then load them into spaCy:

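import spacy
from spacy.pipeline import EntityRuler

# spaCy 2.x API: the "en" shortcut and passing a component object to nlp.add_pipe were removed in spaCy 3
nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)  # let rule matches replace overlapping entities found by the model
ruler.from_disk("output.jsonl")  # patterns compiled by rita
nlp.add_pipe(ruler)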
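`ruler.from_disk("output.jsonl")` reads a JSON Lines file. Each line holds one JSON object with a `label` and a `pattern`, where the pattern is either a list of token dicts or, for phrase patterns, a plain string. Assuming rita's compiled output uses that shape, the following standard-library sketch writes and reads such a file, which can help when inspecting or hand-editing compiled rules. The helper names `write_patterns_jsonl` and `read_patterns_jsonl` are hypothetical and are not part of rita or spaCy.

```python
import json
import os
import tempfile
from typing import Any, Dict, List

Pattern = Dict[str, Any]


def write_patterns_jsonl(patterns: List[Pattern], path: str) -> None:
    """Write one JSON object per line (JSON Lines)."""
    with open(path, "w", encoding="utf-8") as f:
        for entry in patterns:
            f.write(json.dumps(entry, ensure_ascii=False) + "\n")


def read_patterns_jsonl(path: str) -> List[Pattern]:
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


if __name__ == "__main__":
    sample = [
        {"label": "CAR_COLOR",
         "pattern": [{"LOWER": {"IN": ["red", "blue"]}}, {"LOWER": "car"}]},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "output.jsonl")
        write_patterns_jsonl(sample, path)
        print(read_patterns_jsonl(path) == sample)  # True
```

The one-object-per-line layout keeps a diff of the compiled file readable: changing one rule changes one line.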

Every time you parse text with spaCy, it runs the usual pipeline and then applies these rules:

text = """Johny Silver was driving a red car. It was BMW X6 Mclass. Johny likes driving it very much."""
doc = nlp(text)

entities = [(e.text, e.label_) for e in doc.ents]
print(entities)

assert entities[0] == ("Johny Silver", "PERSON")  # Normal NER
assert entities[1] == ("red car", "CAR_COLOR")  # Our first rule
assert entities[2] == ("BMW X6 Mclass", "CAR_MODEL")  # Our second rule
assert entities[3] == ("Johny likes driving", "LIKED_ACTION")  # Our third rule
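To make the matching step concrete without depending on spaCy, here is a deliberately simplified pure-Python sketch of how a token pattern can be matched left to right over a token sequence. It reproduces the CAR_MODEL and LIKED_ACTION spans from the example above. This is not spaCy's Matcher algorithm. Tokens are plain dicts with hypothetical keys (`text`, `lemma`, `ent_type`), only "exactly one" and "+" quantifiers are handled, matching is greedy and non-overlapping, and the predicates are rough stand-ins for the Rita functions.

```python
from typing import Callable, Dict, List, Optional, Tuple

Token = Dict[str, str]
Predicate = Callable[[Token], bool]
Element = Tuple[Predicate, str]  # "1" = exactly one token, "+" = one or more


def match_at(pattern: List[Element], tokens: List[Token], start: int) -> Optional[int]:
    """Return the end index of a match starting at `start`, or None (greedy, with backoff)."""
    def rec(pi: int, ti: int) -> Optional[int]:
        if pi == len(pattern):
            return ti  # every pattern element matched
        pred, op = pattern[pi]
        if ti >= len(tokens) or not pred(tokens[ti]):
            return None
        if op == "1":
            return rec(pi + 1, ti + 1)
        # "+": take as many matching tokens as possible, then back off if the rest fails
        end = ti + 1
        while end < len(tokens) and pred(tokens[end]):
            end += 1
        for stop in range(end, ti, -1):
            result = rec(pi + 1, stop)
            if result is not None:
                return result
        return None
    return rec(0, start)


def find_spans(pattern: List[Element], tokens: List[Token], label: str) -> List[Tuple[str, str]]:
    """Scan left to right and collect non-overlapping (text, label) matches."""
    spans, i = [], 0
    while i < len(tokens):
        end = match_at(pattern, tokens, i)
        if end is None:
            i += 1
        else:
            spans.append((" ".join(t["text"] for t in tokens[i:end]), label))
            i = end
    return spans


def tok(text: str, lemma: str = "", ent_type: str = "") -> Token:
    """Build a hypothetical token dict; lemma defaults to the lowercased text."""
    return {"text": text, "lemma": lemma or text.lower(), "ent_type": ent_type}


# Hypothetical predicates approximating WORD, IN_LIST(cars), ENTITY("PERSON"), LEMMA("like").
def is_word(t: Token) -> bool:
    return t["text"].isalnum()

def in_cars(t: Token) -> bool:
    return t["text"].lower() in {"bmw"}

def is_person(t: Token) -> bool:
    return t["ent_type"] == "PERSON"

def lemma_like(t: Token) -> bool:
    return t["lemma"] == "like"


sent2 = [tok("It"), tok("was"), tok("BMW"), tok("X6"), tok("Mclass"), tok(".")]
sent3 = [tok("Johny", ent_type="PERSON"), tok("likes", lemma="like"),
         tok("driving", lemma="drive"), tok("it"), tok("very"), tok("much"), tok(".")]

car_models = find_spans([(in_cars, "1"), (is_word, "+")], sent2, "CAR_MODEL")
liked = find_spans([(is_person, "1"), (lemma_like, "1"), (is_word, "1")], sent3, "LIKED_ACTION")
print(car_models)  # [('BMW X6 Mclass', 'CAR_MODEL')]
print(liked)       # [('Johny likes driving', 'LIKED_ACTION')]
```

The backoff loop on "+" matters when a later element also matches the same kind of token. Without it, a greedy "+" could consume tokens that the next element needs.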

Alternatively, if rita is a dependency of your project and you prefer to compile rules on the fly, you can do this:

import rita
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load("en")
ruler = EntityRuler(nlp, overwrite_ents=True)
patterns = rita.compile("examples/color-car.rita")
ruler.add_patterns(patterns)
nlp.add_pipe(ruler)
