Fixing incorrect separators in a string

1 answer

User

#1 · Posted 2024-05-13 18:41:17

You can try checking the spelling of each word, for example with pyspellchecker

(pip install pyspellchecker)

    from spellchecker import SpellChecker

    spell = SpellChecker()

    s = "rate implies depreciation. Th  e straight lines show eff ective linear time trends in the nominal (dashed "
    splitted_s = s.split(' ')
    # Remove the empty strings produced by two consecutive spaces
    splitted_s = list(filter(None, splitted_s))
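As a side note, str.split() called with no argument splits on any run of whitespace and discards empty strings, so the separate filter step is optional. A minimal sketch on the same sample string (the names with_filter and without_filter are only for illustration):

```python
s = "rate implies depreciation. Th  e straight lines show eff ective linear time trends in the nominal (dashed "

# Splitting on a single space leaves empty strings where two spaces meet,
# so they have to be filtered out afterwards.
with_filter = list(filter(None, s.split(' ')))

# With no argument, split() collapses whitespace runs and drops empty strings.
without_filter = s.split()

print(with_filter == without_filter)
# prints: True
```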

Then check whether a word is not in the dictionary while the previous word + the word is:

    valid_s = [splitted_s[0]]
    for word in splitted_s[1:]:
        # Compare against the last kept token, which may itself be a merged word
        previous_word = valid_s[-1]
        # The word is unknown on its own, but previous word + word is known
        if spell.unknown([word]) and not spell.unknown([(previous_word + word).lower()]):
            valid_s[-1] = previous_word + word
        else:
            valid_s.append(word)

    print(' '.join(valid_s))

Output:

    rate implies depreciation. Th e straight lines show effective linear time trends in the nominal (dashed

Here, however, "e" exists in the dictionary as a word in its own right, so "Th" and "e" are not joined.
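This behaviour can be reproduced without the spell checker. The sketch below uses a small hand-made word set (toy_known, hypothetical and not pyspellchecker's real dictionary) and a helper named merge_unknown, also invented for illustration, to show that a fragment is only joined when it is unknown by itself:

```python
def merge_unknown(tokens, known):
    """Join a token onto the previous one when the token is not in `known`
    but the concatenation is. `known` is a set of lowercase words."""
    if not tokens:
        return []
    merged = [tokens[0]]
    for word in tokens[1:]:
        joined = merged[-1] + word
        if word.lower() not in known and joined.lower() in known:
            merged[-1] = joined      # replace the previous token with the joined word
        else:
            merged.append(word)
    return merged

# Hypothetical toy dictionary standing in for the spell checker's word list.
toy_known = {"the", "e", "straight", "lines", "show", "effective"}

tokens = "Th e straight lines show eff ective".split()
print(' '.join(merge_unknown(tokens, toy_known)))
# prints: Th e straight lines show effective
```

Because "e" is in the word set, the first condition fails and "Th e" is left alone, while "eff ective" is joined.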

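So you can also compare word frequencies, and join the previous word with the current word when previous word + word is used (much) more frequently in the dictionary than the word alone: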

    valid_s = [splitted_s[0]]
    for word in splitted_s[1:]:
        # Compare against the last kept token, which may itself be a merged word
        previous_word = valid_s[-1]
        # Merge when previous word + word is more frequent than the word alone.
        # Note: recent pyspellchecker releases, as far as I know, name this
        # method word_usage_frequency.
        if spell.word_probability(word.lower()) < spell.word_probability((previous_word + word).lower()):
            valid_s[-1] = previous_word + word
        else:
            valid_s.append(word)

    print(' '.join(valid_s))

Output:

    The straight lines show effective linear time trends in the nominal (dashed
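The frequency comparison can also be sketched as a standalone function that takes any frequency lookup. Everything here is an assumption made for illustration: the helper name merge_by_frequency, the min_ratio parameter (a way to require the joined word to be "much" more frequent), and the numbers in toy_freq, which are invented and not taken from pyspellchecker.

```python
def merge_by_frequency(tokens, frequency, min_ratio=1.0):
    """Join a token onto the previous one when the concatenation is more
    frequent than the token alone, by at least a factor of `min_ratio`.
    `frequency` maps a lowercase word to a number (0 for unknown words)."""
    if not tokens:
        return []
    merged = [tokens[0]]
    for word in tokens[1:]:
        joined = merged[-1] + word
        if frequency(word.lower()) * min_ratio < frequency(joined.lower()):
            merged[-1] = joined      # replace the previous token with the joined word
        else:
            merged.append(word)
    return merged

# Hypothetical frequencies, invented for illustration only.
toy_freq = {"the": 0.05, "e": 0.0001, "straight": 0.0002,
            "lines": 0.0003, "show": 0.0004, "effective": 0.0002}

tokens = "Th e straight lines show eff ective".split()
result = merge_by_frequency(tokens, lambda w: toy_freq.get(w, 0.0))
print(' '.join(result))
# prints: The straight lines show effective
```

With pyspellchecker, the lookup passed as frequency would presumably be the checker's own frequency method. A larger min_ratio makes the merge more conservative, at the cost of leaving some split words such as "Th e" unjoined.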
