Question: Converting text from first person to second person while ignoring text inside quotation marks “”

Posted 2024-06-12 11:41:15

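I am trying to convert stories, sentences, words, etc. from first-person grammar to second-person grammar, while leaving any text inside quotation marks ("" or “”) unconverted.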

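This runs in a Google Colab Python 3 notebook. The code reads a .txt file from my Google Drive and converts the words in it from first person to second person using the forms dictionary. A second problem is that, after the conversion, spaces are inserted before and after the quotation marks (both “ and ” are affected).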


import nltk
from google.colab import drive

drive.mount('/content/drive')

sent = open('/content/drive/My Drive/storyuno.txt', 'r') 


forms = {"am" : "are", "are" : "am", 'i' : 'you', 'my' : 'yours', 'me' : 'you', 'mine' : 'yours', 'you' : 'I', 'your' : 'my', 'yours' : 'mine'} # More?
def translate(word):
  if word.lower() in forms: return forms[word.lower()]
  return word

translated = []
quote_mode = False
for word in nltk.wordpunct_tokenize(sent.read()):
   if quote_mode:
       translated.append(word)
       if word == '"': quote_mode = False;

   if not quote_mode:
       translated.append(translate(word))
   if word == '"': quote_mode = True;

result = ' '.join(translated)

print(result) 
sent.close()

The story I used as input:

The bottom line is that if I was going to tell anyone about the frog, it would be Soy. I decided that our walk home would be the most opportune time. “Did you see anything outside today during math?” I asked Soy as we started walking. “What do you mean? Like in the sky?” he asked, jumping over cracks in the sidewalk. “I mean right outside the window. Like right up against it,” I answered. “Like a person?” he asked, still hopping. Soy sat in the row farthest from the window, so it was possible, but unlikely, for someone to walk by without him noticing.

It is converted to:

The bottom line is that if you was going to tell anyone about the frog , it would be Soy . you decided that our walk home would be the most opportune time . “ Did I see anything outside today during math ?” you asked Soy as we started walking . “ What do I mean ? Like in the sky ?” he asked , jumping over cracks in the sidewalk . “ you mean right outside the window . Like right up against it ,” you answered . “ Like a person ?” he asked , still hopping . Soy sat in the row farthest from the window , so it was possible , but unlikely , for someone to walk by without him noticing .

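The problem is that text inside quotation marks should not be converted. For example: I told her, "You are boring." --> You told her, "You are boring."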

Please ignore any grammar errors other than the quotation issue; I will fix those later.


1 answer
User
Answer 1 · Posted 2024-06-12 11:41:15

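You have two problems with the quotation marks. The first is that “ is not equal to ", so comparing a token against '"' never matches the curly quotes in your story. The second is that a quotation mark can be bundled with adjacent punctuation, so you get tokens such as ?”. The solution is to use a regular expression to check whether a token contains any quotation mark: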

import re
quote_re = re.compile(r'["“”]')

Then change

if word == '"':

to

if quote_re.search(word):
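Tracing the question's loop by hand suggests one more thing to watch for: its three separate if statements append a closing-quote token twice and then switch quote mode straight back on. Below is a minimal sketch of the whole loop with a single if/else and one toggle per token. The names TOKEN_RE and convert_tokens are hypothetical, and TOKEN_RE is only a stand-in that, as far as I know, uses the same pattern as nltk.wordpunct_tokenize, so the example runs without NLTK.

```python
import re

# Stand-in tokenizer (assumption: same pattern as nltk.wordpunct_tokenize).
TOKEN_RE = re.compile(r"\w+|[^\w\s]+")
# Straight and curly double quotes, as in the answer above.
quote_re = re.compile(r'["“”]')

# Copied from the question.
forms = {"am": "are", "are": "am", "i": "you", "my": "yours", "me": "you",
         "mine": "yours", "you": "I", "your": "my", "yours": "mine"}

def translate(word):
    return forms.get(word.lower(), word)

def convert_tokens(tokens):
    """Translate tokens outside quotes; copy quoted tokens unchanged."""
    translated = []
    quote_mode = False
    for word in tokens:
        n_quotes = len(quote_re.findall(word))
        if quote_mode or n_quotes:
            # Inside a quotation, or a token that carries a quote mark.
            translated.append(word)
        else:
            translated.append(translate(word))
        # Toggle once per token that holds an odd number of quote marks.
        if n_quotes % 2 == 1:
            quote_mode = not quote_mode
    return translated

tokens = TOKEN_RE.findall("I said “I like my dog,” to you.")
print(convert_tokens(tokens))
```

In a notebook with NLTK available, nltk.wordpunct_tokenize(sent.read()) could be passed to convert_tokens instead of the stand-in tokenizer.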

The spacing problem can be solved with:

from nltk.tokenize.treebank import TreebankWordDetokenizer
detokenizer = TreebankWordDetokenizer()
result = detokenizer.detokenize(translated)
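If the detokenized spacing is still not what you want, one alternative is to avoid tokenizing and re-joining altogether. The following is only a sketch of a different technique, not part of the approach above: it splits the raw text into quoted and unquoted spans with re.split and substitutes words only in the unquoted spans, so the original whitespace is never touched. The names QUOTED_RE, WORD_RE and convert_text are hypothetical, and the sketch assumes quotes are balanced and never nested.

```python
import re

# Copied from the question.
forms = {"am": "are", "are": "am", "i": "you", "my": "yours", "me": "you",
         "mine": "yours", "you": "I", "your": "my", "yours": "mine"}

# A straight-quoted or curly-quoted span. The capture group makes
# re.split keep the quoted spans in its result.
QUOTED_RE = re.compile(r'("[^"]*"|“[^”]*”)')
WORD_RE = re.compile(r"\b\w+\b")

def swap(match):
    word = match.group()
    return forms.get(word.lower(), word)

def convert_text(text):
    parts = QUOTED_RE.split(text)
    # With one capture group, even indices are unquoted text and
    # odd indices are the quoted spans, which are left as they are.
    for i in range(0, len(parts), 2):
        parts[i] = WORD_RE.sub(swap, parts[i])
    return "".join(parts)

print(convert_text("“Did you see it?” I asked Soy."))
```

An unclosed quotation mark is not matched by QUOTED_RE, so the text after it would be converted like ordinary narration.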
