Is Python-shmython really that hard?

Posted 2022-05-21 08:30:53


I have written a program that implements shm-reduplication.

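The rule is basically this: if a word starts with a consonant (or a cluster of consonants), you strip it and add "shm"; if it starts with a vowel, you just add "shm". You then append the result to the end of the existing word.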

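The problem is the letter Y, which is sometimes a consonant and sometimes a vowel. I want "you" to become "you-shmou", but I want "Python" to become "Python-Shmython". How can I do that?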

Here is my code so far:

import re

def word_shmord(word):
    orig = word
    if word.isupper():
        prefix = "SHM"
    elif word.istitle():
        word = word.lower()
        prefix = "Shm"
    else:
        prefix = "shm"
    match = re.search("[aeiou]", word, re.IGNORECASE)
    if match is None:
        # No a/e/i/o/u in the word (e.g. "my"): leave it unchanged
        return orig
    new = prefix + word[match.start():]
    return "{}-{}".format(orig, new)


text = """
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
"""
text_shmext = re.sub(r"\w+", lambda m: word_shmord(m.group(0)), text)
print(text_shmext)

1 answer

I found this problem (or should I say shmoblem) interesting, so I coded up some linguistic rules for it.

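# Needs the NLTK data packages "stopwords" and "punkt", fetched with nltk.download(...)
# (newer NLTK releases may ask for "punkt_tab" instead of "punkt")
import re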
import string
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.tokenize.sonority_sequencing import SyllableTokenizer

stop = stopwords.words('english')
tk = SyllableTokenizer()


def word_shmord(word):
    if (len(word) < 4 and word.lower() in stop) or not word.isalnum() or word.lower().startswith('shm'):
        return word
    if 'y' in word:
        y = word.find('y')
        # Y is treated as a vowel if the word has no other vowel
        if len(re.findall("[aeiou]", word, re.IGNORECASE)) == 0 and word.count('y') == 1:
            word = word[:y] + '#' + word[y + 1:]
        # or if the letter is at the end of a word
        if word[-1] == 'y':
            word = word[:-1]+ '#'
        # or middle/end of syllable
        if word.find('y') != -1:
            syll = tk.tokenize(word)
            for i, s in enumerate(syll):
                snew = s[:-1] + '#' if s[-1] == 'y' else s
                y = snew.find('y')
                if len(snew) // 2 == y:
                    snew = snew[:y] + '#' + snew[y + 1:]
                syll[i] = snew
            word = ''.join(syll)

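    # Check casing with '#' swapped back to 'y': str.istitle() is False for
    # strings like "P#thon", which would lose the capital in "Shmython".
    cased = word.replace('#', 'y')
    if cased.isupper():
        prefix = "SHM"
    elif cased.istitle():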
        word = word.lower()
        prefix = "Shm"
    else:
        prefix = "shm"
    vowels = re.search("[aeiou#]", word, re.IGNORECASE)
    if not vowels:
        return word
    position = vowels.start()
    new = prefix + word[position:].replace('#', 'y')
    return new


text = "The quick brown fox jumps over the lazy dog"
text_shmext = ([word_shmord(x) for x in word_tokenize(text)])
# join strings
text_shmext = "".join([" " + i if i not in string.punctuation else i for i in text_shmext]).strip()
print(text_shmext)

Input: The quick brown fox jumps over the lazy dog

Output: The shmuick shmown shmox shmumps shmover the shmazy shmog
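
If pulling in NLTK feels heavy, the two cases from the question can also be handled with a much cruder heuristic. The following is only a minimal sketch using the standard library, not the approach of the answer above. It assumes (as a simplification, not a linguistic rule) that "y" is a consonant only when it starts the word and is followed by a vowel, and a vowel everywhere else. The names `first_vowel_index` and `shm_reduplicate` are made up for this illustration.

```python
VOWELS = "aeiou"


def first_vowel_index(word):
    """Return the index of the first vowel-like letter, or None if there is none.

    Assumption: 'y' is a consonant only at the start of a word when a vowel
    follows it ("you"); anywhere else it is treated as a vowel ("Python").
    """
    lower = word.lower()
    for i, ch in enumerate(lower):
        if ch in VOWELS:
            return i
        if ch == "y":
            starts_word = i == 0
            next_is_vowel = i + 1 < len(lower) and lower[i + 1] in VOWELS
            if not (starts_word and next_is_vowel):
                return i
    return None


def shm_reduplicate(word):
    """Return 'word-shmord', following the casing of the original word."""
    pos = first_vowel_index(word)
    if pos is None:
        # Nothing vowel-like to attach "shm" to: leave the word alone
        return word
    if word.isupper():
        prefix, tail = "SHM", word[pos:]
    elif word.istitle():
        prefix, tail = "Shm", word[pos:].lower()
    else:
        prefix, tail = "shm", word[pos:]
    return "{}-{}".format(word, prefix + tail)


for w in ["you", "Python", "apple", "street", "lazy"]:
    print(shm_reduplicate(w))
```

This prints you-shmou, Python-Shmython, apple-shmapple, street-shmeet and lazy-shmazy. To run it over a whole text in the style of the question, `re.sub(r"\w+", lambda m: shm_reduplicate(m.group(0)), text)` should work. The heuristic will misjudge words where a non-initial "y" really is a consonant (for example "beyond" is fine only because "e" comes first), so the syllable-based rules in the answer remain the more careful option.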