symspellpy (Python)

Detailed description of the symspellpy Python project


symspellpy

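symspellpy is a Python port of SymSpell v6.3, which provides much higher speed and lower memory consumption. Unit tests from the original project are implemented to ensure the accuracy of the port.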

Please note that the port has not been optimized for speed.

Usage

Installing the symspellpy module

pip install -U symspellpy

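Copying the frequency dictionary to your project

Copy frequency_dictionary_en_82_765.txt (found in the inner symspellpy directory) to your project directory so that you end up with the following layout: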

project_dir
  +-frequency_dictionary_en_82_765.txt
  \-project.py

Adding new terms

  • Use load_dictionary(corpus=<path/to/dictionary.txt>, <term_index>, <count_index>). dictionary.txt should contain:
<term> <count>
<term> <count>
...
<term> <count>

where term_index is the zero-based column number of the term and count_index is the zero-based column number of the count/frequency.

  • Append <term> <count> to the provided frequency_dictionary_en_82_765.txt
  • Use the method create_dictionary_entry(key=<term>, count=<count>)
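
To make the term_index and count_index arguments concrete, here is a minimal standard-library sketch of reading `<term> <count>` lines by column. The helper name parse_frequency_lines is hypothetical and is not part of symspellpy; it only illustrates what the two column indices select, and it assumes whitespace-separated columns with integer counts.

```python
def parse_frequency_lines(lines, term_index=0, count_index=1):
    """Hypothetical helper: read `<term> <count>` lines into a dict."""
    frequencies = {}
    for line in lines:
        columns = line.split()
        # Skip blank or short lines that lack one of the requested columns.
        if len(columns) <= max(term_index, count_index):
            continue
        term = columns[term_index]
        count = int(columns[count_index])
        # Summing repeated terms is a choice made for this sketch only.
        frequencies[term] = frequencies.get(term, 0) + count
    return frequencies


sample_lines = ["the 100", "quick 20", "", "fox 7"]
frequencies = parse_frequency_lines(sample_lines, term_index=0, count_index=1)
print(frequencies)
```

If a dictionary file stored the count first and the term second, the same data would be read with term_index=1 and count_index=0.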

Sample usage (create_dictionary)

import os

from symspellpy.symspellpy import SymSpell  # import the module

def main():
    # maximum edit distance per dictionary precalculation
    max_edit_distance_dictionary = 2
    prefix_length = 7
    # create object
    sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)

    # create dictionary using corpus.txt
    if not sym_spell.create_dictionary(<path/to/corpus.txt>):
        print("Corpus file not found")
        return

    for key, count in sym_spell.words.items():
        print("{} {}".format(key, count))

if __name__ == "__main__":
    main()

corpus.txt should contain:

abc abc-def abc_def abc'def abc qwe qwe1 1qwe q1we 1234 1234

Expected output:

abc 4
def 2
abc'def 1
qwe 1
qwe1 1
1qwe 1
q1we 1
1234 2
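
The counts above follow from how the corpus line is split into words: "-" and "_" act as separators, while apostrophes and digits stay inside a word. The sketch below reproduces these counts with the standard library only. The regular expression is an assumption chosen to match the expected output shown here; it is not presented as the library's own tokenizer.

```python
import re
from collections import Counter

# Assumed pattern: runs of letters, digits and apostrophes form a word;
# anything else (spaces, "-", "_") separates words.
WORD_PATTERN = re.compile(r"(?:[^\W_]|')+")


def count_words(text):
    """Count lowercased words found by WORD_PATTERN, in first-seen order."""
    return Counter(WORD_PATTERN.findall(text.lower()))


corpus = "abc abc-def abc_def abc'def abc qwe qwe1 1qwe q1we 1234 1234"
word_counts = count_words(corpus)
for word, count in word_counts.items():
    print("{} {}".format(word, count))
```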

Sample usage (lookup and lookup_compound)

Using project.py (the code is more verbose than required, to allow explanation of the method arguments)

import os

from symspellpy.symspellpy import SymSpell, Verbosity  # import the module

def main():
    # maximum edit distance per dictionary precalculation
    max_edit_distance_dictionary = 2
    prefix_length = 7
    # create object
    sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)
    # load dictionary
    dictionary_path = os.path.join(os.path.dirname(__file__),
                                   "frequency_dictionary_en_82_765.txt")
    term_index = 0  # column of the term in the dictionary text file
    count_index = 1  # column of the term frequency in the dictionary text file
    if not sym_spell.load_dictionary(dictionary_path, term_index,
                                     count_index):
        print("Dictionary file not found")
        return

    # lookup suggestions for single-word input strings
    input_term = "memebers"  # misspelling of "members"
    # max edit distance per lookup
    # (max_edit_distance_lookup <= max_edit_distance_dictionary)
    max_edit_distance_lookup = 2
    suggestion_verbosity = Verbosity.CLOSEST  # TOP, CLOSEST, ALL
    suggestions = sym_spell.lookup(input_term, suggestion_verbosity,
                                   max_edit_distance_lookup)
    # display suggestion term, edit distance, and term frequency
    for suggestion in suggestions:
        print("{}, {}, {}".format(suggestion.term, suggestion.distance,
                                  suggestion.count))

    # lookup suggestions for multi-word input strings (supports compound
    # splitting & merging)
    input_term = ("whereis th elove hehad dated forImuch of thepast who "
                  "couqdn'tread in sixtgrade and ins pired him")
    # max edit distance per lookup (per single word, not per whole input string)
    max_edit_distance_lookup = 2
    suggestions = sym_spell.lookup_compound(input_term,
                                            max_edit_distance_lookup)
    # display suggestion term, edit distance, and term frequency
    for suggestion in suggestions:
        print("{}, {}, {}".format(suggestion.term, suggestion.distance,
                                  suggestion.count))

if __name__ == "__main__":
    main()

Expected output:

members, 1, 226656153

where is the love he had dated for much of the past who couldn't read in six grade and inspired him, 9, 300000
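
The second column of each output line is an edit distance. As an illustration of why "memebers" is reported at distance 1 from "members", here is a self-contained sketch of a Damerau-Levenshtein style distance (the optimal string alignment variant). The function name osa_distance is hypothetical, and the sketch assumes this kind of metric; it is not the library's implementation.

```python
def osa_distance(source, target):
    """Edit distance counting insertions, deletions, substitutions and
    transpositions of adjacent characters (optimal string alignment)."""
    rows, cols = len(source) + 1, len(target) + 1
    table = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        table[i][0] = i  # delete all characters of source[:i]
    for j in range(cols):
        table[0][j] = j  # insert all characters of target[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            table[i][j] = min(
                table[i - 1][j] + 1,         # deletion
                table[i][j - 1] + 1,         # insertion
                table[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "teh" -> "the".
            if (i > 1 and j > 1
                    and source[i - 1] == target[j - 2]
                    and source[i - 2] == target[j - 1]):
                table[i][j] = min(table[i][j], table[i - 2][j - 2] + 1)
    return table[-1][-1]


print(osa_distance("memebers", "members"))
```

Deleting the extra "e" turns "memebers" into "members", which is a single edit and therefore within the max_edit_distance_lookup of 2 used in the sample.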

Sample usage (word_segmentation)

Using project.py (the code is more verbose than required, to allow explanation of the method arguments)

import os

from symspellpy.symspellpy import SymSpell  # import the module

def main():
    # maximum edit distance per dictionary precalculation
    max_edit_distance_dictionary = 0
    prefix_length = 7
    # create object
    sym_spell = SymSpell(max_edit_distance_dictionary, prefix_length)
    # load dictionary
    dictionary_path = os.path.join(os.path.dirname(__file__),
                                   "frequency_dictionary_en_82_765.txt")
    term_index = 0  # column of the term in the dictionary text file
    count_index = 1  # column of the term frequency in the dictionary text file
    if not sym_spell.load_dictionary(dictionary_path, term_index,
                                     count_index):
        print("Dictionary file not found")
        return

    # a sentence without any spaces
    input_term = "thequickbrownfoxjumpsoverthelazydog"

    result = sym_spell.word_segmentation(input_term)
    # display corrected string, sum of edit distances, and sum of log probabilities
    print("{}, {}, {}".format(result.corrected_string, result.distance_sum,
                              result.log_prob_sum))

if __name__ == "__main__":
    main()

Expected output:

the quick brown fox jumps over the lazy dog, 8, -34.491167981910635
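
To illustrate the idea of splitting a string without spaces by word frequency, here is a small dynamic-programming sketch using only the standard library. It is a simplified stand-in, not the algorithm used by word_segmentation: it only accepts exact dictionary words and does no spelling correction. The function name segment and the toy counts in toy_counts are made up for this illustration.

```python
import math


def segment(text, counts, max_word_length=20):
    """Split text into dictionary words, maximizing the summed log probability.

    Returns the words joined by spaces, or None if no full split exists.
    """
    total = sum(counts.values())
    # best[i] holds (score, words) for the best split of text[:i], or None.
    best = [None] * (len(text) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_word_length), end):
            word = text[start:end]
            if best[start] is None or word not in counts:
                continue
            score = best[start][0] + math.log10(counts[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [word])
    if best[-1] is None:
        return None
    return " ".join(best[-1][1])


# Toy frequencies, invented for this example only.
toy_counts = {"the": 500, "quick": 20, "brown": 15, "fox": 10,
              "jumps": 5, "over": 60, "lazy": 8, "dog": 25}
segmented = segment("thequickbrownfoxjumpsoverthelazydog", toy_counts)
print(segmented)
```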

Transfer casing

To transfer the casing of the original phrase to the corrected suggestion, use the transfer_casing boolean flag of the lookup() and lookup_compound() methods:

lookup_compound()

suggestions = sym_spell.lookup_compound(input_term,
                                        max_edit_distance_lookup,
                                        transfer_casing=True)

lookup()

suggestions = sym_spell.lookup(input_term,
                               suggestion_verbosity,
                               max_edit_distance_lookup,
                               transfer_casing=True)
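
As a rough illustration of what transferring casing means, the sketch below copies the casing style of an original word onto its corrected form. The helper match_case is hypothetical and handles only whole-word styles (all uppercase, capitalized, or unchanged); it is a simplified assumption, not how the transfer_casing flag is implemented in symspellpy.

```python
def match_case(original, corrected):
    """Hypothetical helper: apply the casing style of original to corrected."""
    if original.isupper():
        return corrected.upper()
    if original.istitle():
        return corrected.capitalize()
    # Lowercase or mixed-case input: leave the corrected word untouched.
    return corrected


print(match_case("Memebers", "members"))
print(match_case("MEMEBERS", "members"))
print(match_case("memebers", "members"))
```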

Changelog

6.3.9 (2019-08-06)


  • Added transfer_casing to lookup and lookup_compound
  • Fixed prefix length check in _edits_prefix

6.3.8 (2019-03-21)


  • Implemented delete_dictionary_entry
  • Improved performance by using Python's built-in hashing
  • Added versioning of the pickle

6.3.7 (2019-02-18)


  • Fixed include_unknown in lookup
  • Removed the unused initial_capacity argument
  • Improved _get_str_hash performance
  • Implemented save_pickle and load_pickle to avoid having to create the dictionary every time

6.3.6 (2019-02-11)


  • Added the create_dictionary() feature

6.3.5 (2019-01-14)


  • Fixed lookup_compound() to return the correct distance

6.3.4 (2019-01-04)


  • Added <self._replaced_words = dict()> to track the number of misspelled words
  • Added ignore_token to word_segmentation() to ignore words matching a regular expression

6.3.3 (2018-12-05)


  • Added the word_segmentation() feature

6.3.2 (2018-10-23)


  • Added the encoding option to load_dictionary()

6.3.1 (2018-08-30)


  • Created a package for symspellpy

6.3.0 (2018-08-13)

