A reimplementation of HFST optimized lookup in Python. Includes a wrapper for the original hfst-optimized-lookup.



hfstol


HFST optimized lookup in Python

pip install hfstol

All examples below are based on two .hfstol files: crk-descriptive-analyzer.hfstol and crk-normative-generator.hfstol

Usage

Example using crk-descriptive-analyzer.hfstol
    from hfstol import HFSTOL

    hfst = HFSTOL.from_file('crk-descriptive-analyzer.hfstol')

    hfst.feed('niska')
    # returns:
    # (('niska', '+N', '+A', '+Sg'), ('niska', '+N', '+A', '+Obv'))

    hfst.feed_in_bulk(['niska', 'kinipânânaw'])
    # returns:
    # {'niska': {('niska', '+N', '+A', '+Obv'), ('niska', '+N', '+A', '+Sg')}, 'kinipânânaw': {('nipâw', '+V', '+AI', '+Ind', '+Prs', '+12Pl')}}

    hfst.feed_in_bulk_fast(['niska', 'kinipânânaw'])
    # returns:
    # {'niska': {'niska+N+A+Obv', 'niska+N+A+Sg'}, 'kinipânânaw': {'nipâw+V+AI+Ind+Prs+12Pl'}}

Example using crk-normative-generator.hfstol

    from hfstol import HFSTOL

    hfst = HFSTOL.from_file('crk-normative-generator.hfstol')

    hfst.feed('niska+N+A+Pl')
    # returns:
    # (('niskak',),)

    hfst.feed_in_bulk(['niska+N+A+Pl', 'nipâw+V+AI+Ind+Prs+12Pl'])
    # returns:
    # {'niska+N+A+Pl': {('niskak',)}, 'nipâw+V+AI+Ind+Prs+12Pl': {('kinipânânaw',), ('kinipânaw',)}}

    hfst.feed_in_bulk_fast(['niska+N+A+Pl', 'nipâw+V+AI+Ind+Prs+12Pl'], multi_process=4)
    # returns:
    # {'niska+N+A+Pl': {'niskak'}, 'nipâw+V+AI+Ind+Prs+12Pl': {'kinipânânaw', 'kinipânaw'}}

For comprehensive API behavior including edge cases (for example, what happens if I feed('absolute garbage')), see this test file.

API signatures

    # HFSTOL.from_file
    @classmethod
    def from_file(cls, filename: Union[str, pathlib.Path]):
        """
        :param filename: the `.hfstol` file
        :return: an `HFSTOL` instance, which you can use to convert surface forms to deep forms
        """
        pass

    # HFSTOL.feed
    def feed(self, surface_form: str, concat: bool = True) -> Tuple[Tuple[str, ...], ...]:
        """
        feed surface form to hfst
        :param surface_form: the surface form
        :param concat: whether to concatenate single characters

            example output for `surface_form` = 'niskak', with `crk-descriptive-analyzer.hfstol`
            - True: (('niska', '+N', '+A', '+Pl'), ('nîskâw', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))
            - False: (('n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'), ('n', 'î', 's', 'k', 'â', 'w', '+V', '+II', '+II', '+Cnj', '+Prs', '+3Sg'))

            example output for `surface_form` = 'niska+N+A+Pl' with `crk-normative-generator.hfstol`
            - True: (('niskak',),)
            - False: (('n', 'i', 's', 'k', 'a', 'k'),)

            example output for `surface_form` = 'nipâw+V+AI+Ind+Prs+12Pl' with `crk-normative-generator.hfstol` (an inflection that has two spellings)
            - True: (('kinipânaw',), ('kinipânânaw',))
            - False: (('k', 'i', 'n', 'i', 'p', 'â', 'n', 'a', 'w'), ('k', 'i', 'n', 'i', 'p', 'â', 'n', 'â', 'n', 'a', 'w'))
        """
        pass

    # HFSTOL.feed_in_bulk
    def feed_in_bulk(self, surface_forms: List[str], concat=True) -> Dict[str, Set[Tuple[str, ...]]]:
        """
        feed multiple surface forms to hfst at once
        :param surface_forms:
        :return: a dictionary with keys being each surface form fed in, values being their corresponding deep forms
        """
        pass

    # HFSTOL.feed_in_bulk_fast
    def feed_in_bulk_fast(self, strings: Iterable[str], multi_process: int = 1) -> Dict[str, Set[str]]:
        """
        calls `hfst-optimized-lookup`. Evaluation is magnitudes faster. Note the generated symbols will all be concatenated.
        e.g. instead of ['n', 'i', 's', 'k', 'a', '+N', '+A', '+Pl'] it returns ['niska+N+A+Pl']
        :keyword multi_process: Defaults to 1. Specify how many parallel processes you want to speed up computation. A rule of thumb is to use at most as many processes as your machine has cores.
        """
        pass
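Because feed_in_bulk returns tuples of symbols while feed_in_bulk_fast returns concatenated strings, code that mixes the two may want a common shape. The following is a minimal sketch of one way to do that; `concat_analyses` is a hypothetical helper written for illustration and is not part of hfstol. The sample value is the feed_in_bulk result shown in the analyzer example.

```python
from typing import Dict, Set, Tuple


def concat_analyses(
    bulk_result: Dict[str, Set[Tuple[str, ...]]]
) -> Dict[str, Set[str]]:
    """Join each tuple of symbols into one string.

    Hypothetical helper (not provided by hfstol): it turns the shape
    returned by feed_in_bulk into the shape returned by feed_in_bulk_fast.
    """
    return {
        form: {"".join(symbols) for symbols in analyses}
        for form, analyses in bulk_result.items()
    }


# Sample value taken from the feed_in_bulk example for the analyzer file.
sample = {
    "niska": {("niska", "+N", "+A", "+Obv"), ("niska", "+N", "+A", "+Sg")},
    "kinipânânaw": {("nipâw", "+V", "+AI", "+Ind", "+Prs", "+12Pl")},
}
normalized = concat_analyses(sample)
```

With this input, `normalized` has the same keys as `sample`, and each value is a set of strings such as 'niska+N+A+Sg'.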

Using feed_in_bulk_fast

feed_in_bulk_fast calls compiled C code and can be up to 100 times faster than feed_in_bulk.

It requires hfst-optimized-lookup to be installed. Version 1.2 is tested to work. On Linux, installation can be as simple as sudo apt install hfst. For other systems, see the installation guide.

If hfst-optimized-lookup cannot be found, calling feed_in_bulk_fast raises ImportError.
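Since the fast path depends on an external binary, a caller may want to fall back to the pure-Python path when it is missing. The sketch below is one possible approach, assuming only the behavior described here: feed_in_bulk_fast raises ImportError when the binary is absent, and feed_in_bulk returns tuples of symbols. `lookup_bulk` and `wanted_processes` are hypothetical names introduced for illustration; `hfst` is expected to be an HFSTOL instance.

```python
import os
from typing import Dict, Iterable, Set


def lookup_bulk(hfst, forms: Iterable[str], wanted_processes: int = 4) -> Dict[str, Set[str]]:
    """Try feed_in_bulk_fast first, then fall back to feed_in_bulk.

    Hypothetical helper, not part of hfstol. `hfst` is assumed to expose
    feed_in_bulk_fast and feed_in_bulk with the signatures documented above.
    """
    forms = list(forms)  # materialize, so the input can be reused on fallback
    # Cap the process count at the machine's core count (at least 1).
    processes = max(1, min(wanted_processes, os.cpu_count() or 1))
    try:
        return hfst.feed_in_bulk_fast(forms, multi_process=processes)
    except ImportError:
        # hfst-optimized-lookup was not found: use the pure-Python lookup
        # and join the symbol tuples so both branches return the same shape.
        slow = hfst.feed_in_bulk(forms)
        return {
            form: {"".join(symbols) for symbols in analyses}
            for form, analyses in slow.items()
        }
```

A call would look like `lookup_bulk(HFSTOL.from_file('crk-normative-generator.hfstol'), ['niska+N+A+Pl'])`. The fallback is slower, so for large inputs installing the binary is likely the better option.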
