A reimplementation of hfst-optimized-lookup in Python. Also includes a wrapper around the original hfst-optimized-lookup.



hfstol


hfst-optimized-lookup in Python

pip install hfstol

All examples below are based on two .hfstol files: crk-descriptive-analyzer.hfstol and crk-normative-generator.hfstol.

Usage

Example with crk-descriptive-analyzer.hfstol:
    from hfstol import HFSTOL

    hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')

    hfst.feed('niska')
    # returns:
    # (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))

    hfst.feed_in_bulk(['niska', 'kinipânânaw'])
    # returns:
    # {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')}, 'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}

    hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'])
    # returns:
    # {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'}, 'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}

Example with crk-normative-generator.hfstol:

    from hfstol import HFSTOL

    hfst = HFSTOL.from_file('crk-normative-generator.hfstol')

    hfst.feed('niska+N+A+Pl')
    # returns:
    # (('niskak',),)

    hfst.feed_in_bulk(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'])
    # returns:
    # {'niska+N+A+Pl': {('niskak',)}, 'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}

    hfst.feed_in_bulk_fast(["niska+N+A+Pl", 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
    # returns:
    # {'niska+N+A+Pl': {'niskak'}, 'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}

For the full API behaviour, including edge cases (what happens if I feed('absolute garbage')?), see this test file.

API signatures

    import pathlib
    from typing import Dict, Iterable, List, Set, Tuple, Union


    # HFSTOL.from_file
    @classmethod
    def from_file(cls, filename: Union[str, pathlib.Path]):
        """
        :param filename: the `.hfstol` file
        :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms
        """
        pass


    # HFSTOL.feed
    def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
        """
        feed surface form to hfst

        :param surface_form: the surface form
        :param concat: whether to concatenate single characters

            example output for `surface_form` = 'niskak', with `crk-descriptive-analyzer.hfstol`
            - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
            - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'), ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))

            example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol`
            - True: (('niskak',),)
            - False: (('n', 'i', 's', 'k', 'a', 'k'),)

            example output for `surface_form` = 'nipâw+V+AI+Ind+Prs+12Pl' with `crk-normative-generator.hfstol` (an inflection that has two spellings)
            - True: (('kinipânaw',), ('kinipânânaw',))
            - False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'), ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
        """
        pass


    # HFSTOL.feed_in_bulk
    def feed_in_bulk(self, surface_forms: List[str], concat=True) -> Dict[str, Set[Tuple[str, ...]]]:
        """
        feed multiple surface forms to hfst at once

        :param surface_forms: the surface forms to feed
        :return: a dictionary with keys being each surface form fed in, values being their corresponding deep forms
        """
        pass


    # HFSTOL.feed_in_bulk_fast
    def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
        """
        calls `hfst-optimized-lookup`. Evaluation is orders of magnitude faster. Note the generated symbols will all be concatenated.
        e.g. instead of ['n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'] it returns ['niska+N+A+Pl']

        :keyword multi_process: Defaults to 1. Specify how many parallel processes you want to speed up computation.
            A rule of thumb is to use at most as many processes as your machine has cores.
        """
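
The return types above differ only in how each analysis is represented: feed_in_bulk yields tuples of symbols, while feed_in_bulk_fast yields one concatenated string per analysis. The sketch below shows one way to convert the first shape into the second. `join_analyses` is a hypothetical helper written for this illustration and is not part of hfstol; the sample data is copied from the analyzer example in this README.

```python
from typing import Dict, Set, Tuple


def join_analyses(bulk_result: Dict[str, Set[Tuple[str, ...]]]) -> Dict[str, Set[str]]:
    """Join each tuple of symbols into a single string per analysis.

    Hypothetical helper (not part of hfstol): it only reshapes data that
    has the same structure as the documented feed_in_bulk return value.
    """
    return {
        form: {"".join(symbols) for symbols in analyses}
        for form, analyses in bulk_result.items()
    }


# Sample data in the shape documented for feed_in_bulk.
sample = {
    "niska": {("niska", "+N", "+A", "+Obv"), ("niska", "+N", "+A", "+Sg")},
    "kinipânânaw": {("nipâw", "+V", "+AI", "+Ind", "+Prs", "+12Pl")},
}

joined = join_analyses(sample)
print(joined)
```

Because the values are sets, the order in which analyses are printed may vary between runs.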

Using feed_in_bulk_fast

feed_in_bulk_fast calls compiled C code and is likely around 100 times faster than feed_in_bulk.

It requires hfst-optimized-lookup to be installed. Version 1.2 has been tested and works. On Linux, installation can be as simple as sudo apt install hfst. For other systems, see the installation guide.
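
Before relying on the fast path, it can be handy to check whether the executable is installed at all. The following is a minimal sketch that assumes the executable is discovered through PATH; this README does not say how hfstol locates it, so treat the check as a hint rather than a guarantee. `lookup_binary_available` is a hypothetical helper name.

```python
import shutil


def lookup_binary_available(name: str = "hfst-optimized-lookup") -> bool:
    """Return True if an executable called `name` is found on PATH.

    Assumption: the tool is expected on PATH. hfstol itself may locate
    the executable differently.
    """
    return shutil.which(name) is not None


if lookup_binary_available():
    print("hfst-optimized-lookup found on PATH")
else:
    print("hfst-optimized-lookup not found on PATH; consider installing hfst")
```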

If hfst-optimized-lookup cannot be found, calling feed_in_bulk_fast raises ImportError.
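
Since the failure surfaces as an ImportError at call time, one possible pattern is to try the fast path and fall back to the pure-Python feed_in_bulk when the executable is missing. The sketch below is only an illustration: `lookup_all` is a hypothetical wrapper, and `hfst` is assumed to be any object exposing the two methods with the signatures listed in this README.

```python
from typing import Dict, Iterable, Set


def lookup_all(hfst, forms: Iterable[str], processes: int = 1) -> Dict[str, Set[str]]:
    """Try feed_in_bulk_fast first; fall back to feed_in_bulk on ImportError.

    Hypothetical wrapper, not part of hfstol. The fallback joins each
    tuple of symbols so both paths return concatenated strings.
    """
    forms = list(forms)  # materialise once so both paths can consume it
    try:
        return hfst.feed_in_bulk_fast(forms, multi_process=processes)
    except ImportError:
        slow = hfst.feed_in_bulk(forms)
        return {
            form: {"".join(symbols) for symbols in analyses}
            for form, analyses in slow.items()
        }
```

Typical use would be `lookup_all(HFSTOL.from_file('crk-normative-generator.hfstol'), ['niska+N+A+Pl'])`, which should give the same dictionary shape whichever path is taken.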
