Python: Using SSML with SAPI (comtypes)

Long-form description

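I am using Python 3.7.2 on Windows 10 and am trying to create an XML (SSML: https://www.w3.org/TR/speech-synthesis/) file to use with Microsoft's speech API. Looking at the SSML format, I saw that it supports a phoneme tag, which lets you specify the pronunciation of a given word. Microsoft implements part of the standard (https://docs.microsoft.com/en-us/cortana/skills/speech-synthesis-markup-language#phoneme-element), so I found a UTF-8 encoded library containing IPA pronunciations. When I try to call SAPI with parts of the text replaced by phoneme tags, I get the following error: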

Traceback (most recent call last):
  File "pdf_to_speech.py", line 132, in <module>
    audioConverter(text = "Hello world extended test",outputFile = output_file)
  File "pdf_to_speech.py", line 88, in __call__
    self.engine.speak(text)
_ctypes.COMError: (-2147200902, None, ("'ph' attribute in 'phoneme' element is not valid.", None, None, 0, None))

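I have been trying to debug this, but when I print the pronunciations of the words, the characters show up as boxes. However, if I copy and paste them from the console, they look fine (see below).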

həˈloʊ, ˈwɝːld ɪkˈstɛndəd, ˈtɛst

Best guess

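I am not sure whether the problem is caused by one of the following:
1) I changed Python versions in order to be able to print Unicode.
2) I fixed a problem with reading the file.
3) I am manipulating the string incorrectly.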

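I am fairly sure the problem is that I am not passing the text to the comtypes object as Unicode. The ideas I am looking into are:
1) Am I missing a flag?
2) Is the text being converted to ASCII when it is passed to comtypes (a C types error)?
3) Is the XML being passed incorrectly, or am I missing a step?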
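One way to separate a display problem from a data problem is to look at the code points rather than the glyphs. The sketch below is only an illustration; the helper name `describe_codepoints` is hypothetical and not part of the original script. If the names printed are the expected IPA letters, the string itself is intact and the boxes are most likely a console font or encoding issue.

```python
import unicodedata


def describe_codepoints(text):
    """Return (code point, Unicode name) pairs for every character in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


sample = "həˈloʊ"  # pronunciation taken from the question
for codepoint, name in describe_codepoints(sample):
    # Only ASCII is printed here, so any console can display it.
    print(codepoint, name)

# ascii() gives an escape-only view of the same string.
print(ascii(sample))
```

Running the same check on the value just before it is handed to `self.engine.speak` would show whether anything was lost along the way.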

Code peek

This class reads in the IPA dictionary and then generates the XML file. Look at _load_phonemes and _pronounce.

import io
import sys
import xml.etree.ElementTree as ET


class SSML_Generator:
    def __init__(self,pause,phonemeFile):
        self.pause = pause
        if isinstance(phonemeFile,str):
            print("Loading dictionary")
            self.phonemeDict = self._load_phonemes(phonemeFile)
            print(len(self.phonemeDict))
        else:
            self.phonemeDict = {}

    def _load_phonemes(self, phonemeFile):
        # Each line is expected to hold a word followed by its pronunciation.
        phonemeDict = {}
        with io.open(phonemeFile, 'r',encoding='utf-8') as f:
            for line in f:
                tok = line.split()
                #print(len(tok))
                phonemeDict[tok[0].lower()] = tok[1].lower()
        return phonemeDict

    def __call__(self,text):
        SSML_document = self._header()
        for utterance in text:
            parent_tag = self._pronounce(utterance,SSML_document)
            #parent_tag.tail = self._pause(parent_tag)
            SSML_document.append(parent_tag)
        ET.dump(SSML_document)
        return SSML_document

    def _pause(self,parent_tag):
        return ET.fromstring("<break time=\"150ms\" />") # ET.SubElement(parent_tag,"break",{"time":str(self.pause)+"ms"})

    def _header(self):
        return ET.Element("speak",{"version":"1.0", "xmlns":"http://www.w3.org/2001/10/synthesis", "xml:lang":"en-US"})

    # TODO: Add rate https://docs.microsoft.com/en-us/cortana/skills/speech-synthesis-markup-language#prosody-element
    def _rate(self):
        pass

    # TODO: Add pitch
    def _pitch(self):
        pass

    def _pronounce(self,word,parent_tag):
        if word in self.phonemeDict:
            # Write the raw UTF-8 bytes so the console encoding is bypassed.
            sys.stdout.buffer.write(self.phonemeDict[word].encode("utf-8"))
            return ET.fromstring("<phoneme alphabet=\"ipa\" ph=\"" + self.phonemeDict[word] + "\"> </phoneme>")#ET.SubElement(parent_tag,"phoneme",{"alphabet":"ipa","ph":self.phonemeDict[word]})#<phoneme alphabet="string" ph="string"></phoneme>
        else:
            # Unknown word: the parent element itself is returned unchanged.
            return parent_tag

    # Nice to have: Transform acronyms into their pronunciation (See say as tag)
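The commented-out `ET.SubElement` call in `_pronounce` hints at an alternative to assembling the tag by string concatenation. The following is a minimal sketch of that idea, not the original class: the function name `build_ssml` and its arguments are hypothetical, and it assumes the spoken word should appear as the text of the `phoneme` element, with unknown words kept as plain text.

```python
import xml.etree.ElementTree as ET


def build_ssml(words, phoneme_dict):
    """Build a <speak> element, wrapping known words in <phoneme> tags."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": "en-US",
    })
    last = None  # most recently added <phoneme> child, if any
    for word in words:
        ipa = phoneme_dict.get(word.lower())
        if ipa is not None:
            # ElementTree escapes the attribute value when serializing.
            last = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": ipa})
            last.text = word
            last.tail = " "
        elif last is None:
            # Plain text before the first child lives in .text
            speak.text = (speak.text or "") + word + " "
        else:
            # Plain text after a child lives in that child's .tail
            last.tail = (last.tail or "") + word + " "
    return speak


document = build_ssml(["Hello", "big", "world"],
                      {"hello": "həˈloʊ", "world": "ˈwɝːld"})
ssml_text = ET.tostring(document, encoding="unicode")
```

Building elements this way avoids re-parsing hand-written markup for every word. Whether the SAPI parser accepts the resulting `ph` values still depends on the engine and on the values themselves.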

I have also included the code that writes to the comtypes object (SAPI), in case the error is there.

def __call__(self,text,outputFile):
    # https://docs.microsoft.com/en-us/previous-versions/windows/desktop/ms723606(v%3Dvs.85)
    self.stream.Open(outputFile + ".wav", self.SpeechLib.SSFMCreateForWrite)
    self.engine.AudioOutputStream = self.stream
    text = self._text_processing(text)
    text = self.SSML_generator(text)
    text = ET.tostring(text,encoding='utf8', method='xml').decode('utf-8')
    self.engine.speak(text)
    self.stream.Close()
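A detail of the `ET.tostring(..., encoding='utf8')` call that is easy to miss: with that encoding name ElementTree prepends an XML declaration, whereas `encoding="unicode"` returns a `str` directly with no declaration. The sketch below only demonstrates that difference on a small hand-built element; whether the declaration matters to the SAPI parser is not established by the question.

```python
import xml.etree.ElementTree as ET

speak = ET.Element("speak", {"version": "1.0"})
phoneme = ET.SubElement(speak, "phoneme", {"alphabet": "ipa", "ph": "həˈloʊ"})
phoneme.text = "hello"

# Bytes with an XML declaration, decoded back to str (as in the question).
with_declaration = ET.tostring(speak, encoding="utf8", method="xml").decode("utf-8")

# A str straight away, without a declaration and without an encode/decode round trip.
as_text = ET.tostring(speak, encoding="unicode")

print(with_declaration.startswith("<?xml"))  # declaration present
print(as_text.startswith("<speak"))          # starts at the root element
```

In both cases the IPA characters survive as real Unicode characters in the resulting Python string.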

Thanks in advance for your help!

1 answer

User

Answer 1 · posted 2024-06-16 12:19:28

Try using single quotes around the ph attribute value, like this:

my_text = '<speak><phoneme alphabet="x-sampa" ph=\'v"e.de.ni.e\'>ведение</phoneme></speak>'

Also remember to use \ to escape the single quotes inside the Python string literal.
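If the markup is generated with `xml.etree.ElementTree` instead of a hand-written string, the quoting is handled by the serializer: a double quote inside an attribute value is written as `&quot;`. The sketch below reuses the X-SAMPA value from the example above. It shows only what ElementTree produces and that the value survives a round trip; it has not been checked against SAPI.

```python
import xml.etree.ElementTree as ET

speak = ET.Element("speak")
phoneme = ET.SubElement(speak, "phoneme",
                        {"alphabet": "x-sampa", "ph": 'v"e.de.ni.e'})
phoneme.text = "ведение"

# The serializer escapes the embedded double quote for us.
my_text = ET.tostring(speak, encoding="unicode")

# Parsing the result back recovers the original attribute value.
round_trip = ET.fromstring(my_text).find("phoneme").get("ph")
```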

Update: this error can also mean that your ph value could not be parsed. You can check the documentation here: https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/speech-synthesis-markup

This example works:

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice  name="en-US-Jessa24kRUS">
    <s>His name is Mike <phoneme alphabet="ups" ph="JH AU"> Zhou </phoneme></s>
  </voice>
</speak>

But this one does not:

<speak version="1.0" xmlns="https://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <voice  name="en-US-Jessa24kRUS">
    <s>His name is Mike <phoneme alphabet="ups" ph="JHU AUA"> Zhou </phoneme></s>
  </voice>
</speak>
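The output printed in the question (həˈloʊ, ˈwɝːld ɪkˈstɛndəd, ˈtɛst) shows commas attached to some of the values. That suggests, though nothing in the thread confirms it, that punctuation from the dictionary file may end up inside `ph` and make the value unparseable. A minimal sketch of stripping such delimiters before building the attribute follows; the helper name `clean_ipa` and the set of characters removed (whitespace, `/` and `,`) are assumptions about the dictionary format.

```python
def clean_ipa(raw):
    """Strip surrounding whitespace, slashes and commas from a dictionary value.

    Assumption: these characters are delimiters in the dictionary file and
    never part of the pronunciation itself.
    """
    return raw.strip().strip("/,")


cleaned = clean_ipa("həˈloʊ,")  # trailing comma as seen in the question's output
```

Applying such a step where the dictionary is loaded would keep the rest of the pipeline unchanged.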
