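I am trying to modify a JSON file of transcript data so that each run of conversation segments is combined into one sentence.
Here is a link to my input data: https://jsoneditoronline.org/#left=cloud.34cfd15f2c1f461f9e7a7ab57431de79
Output data: https://jsoneditoronline.org/#left=cloud.99a89b483ae84c7f8913da5ecfd3f4a3
My goal is to go through each item in the input array and combine all of its conversation items, checking whether the next item comes from a new speaker. If it does, I reset the string; otherwise I keep appending items until the speaker changes or the string grows to 500 characters. For some reason, as you can see in the output, my data keeps repeating.
Here is my code: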
import json

with open('input-data.json', 'r') as f:
    text = json.load(f)

segment_string = ''
current_speaker = ''
sentimentData_full = {}
sentimentData_final = []

for item in text:
    conversation_segment_list = item['conversation_items']
    speaker = item['speaker_label']
    for segment in conversation_segment_list:
        if len(segment_string) >= 500 or speaker != current_speaker:
            segment_string = ''
            segment_string += f"{segment['content']} "
            current_speaker = speaker
        else:
            segment_string += f"{segment['content']} "
            continue
    sentimentData = {}
    sentimentData_full['speaker_label'] = speaker
    sentimentData_full['segment_string'] = segment_string
    sentimentData_final.append(sentimentData_full.copy())
    sentimentData_full = {}

app_json = json.dumps(sentimentData_final)
with open('output-data.json', 'w') as f:
    f.write(app_json)
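I have been working on this for hours, and any help would be greatly appreciated. Here is an example that illustrates the current problem (in case my explanation was not clear enough):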
Current, incorrect output:
[
    {
        "speaker_label": "spk_0",
        "segment_string": "mhm . "
    },
    {
        "speaker_label": "spk_0",
        "segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
    },
    {
        "speaker_label": "spk_1",
        "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , "
    },
    {
        "speaker_label": "spk_1",
        "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . "
    },
    {
        "speaker_label": "spk_1",
        "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
    }
]
Expected output:
[
    {
        "speaker_label": "spk_0",
        "segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
    },
    {
        "speaker_label": "spk_1",
        "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
    }
]
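I couldn't debug your error, but I wrote a version from scratch. The keys can be adjusted to your requirements, but for now I am using the speaker as the key.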
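A minimal sketch of the grouping rule described in the question (this is not the answerer's original code). It assumes the input is a list of objects with `speaker_label` and `conversation_items`, where each conversation item has a `content` field, as the question's code suggests. The function name `merge_segments`, the `max_chars` parameter, and the sample data are hypothetical. The repetition in the question appears to come from appending the running string once per input item; here an entry is created only when the speaker changes or the limit is reached, and is then extended in place.

```python
import json


def merge_segments(items, max_chars=500):
    """Merge consecutive segments from the same speaker into one entry.

    A new entry starts when the speaker changes or when the current
    string has already reached max_chars (hypothetical parameter).
    """
    merged = []
    for item in items:
        speaker = item['speaker_label']
        for seg in item['conversation_items']:
            last = merged[-1] if merged else None
            if (last is None
                    or last['speaker_label'] != speaker
                    or len(last['segment_string']) >= max_chars):
                # Start a fresh entry instead of re-appending the old one.
                last = {'speaker_label': speaker, 'segment_string': ''}
                merged.append(last)
            # Extend the current entry in place.
            last['segment_string'] += f"{seg['content']} "
    return merged


# Hypothetical sample shaped like the question's input (assumed schema).
sample = [
    {'speaker_label': 'spk_0',
     'conversation_items': [{'content': 'mhm'}, {'content': '.'}]},
    {'speaker_label': 'spk_0',
     'conversation_items': [{'content': w} for w in
                            ['How', 'may', 'I', 'help', 'you', '?']]},
    {'speaker_label': 'spk_1',
     'conversation_items': [{'content': 'Um'}, {'content': ','},
                            {'content': 'well'}]},
]

result = merge_segments(sample)
print(json.dumps(result, indent=2))
```

To run it on the real file, load `input-data.json` with `json.load`, pass the list to `merge_segments`, and write the result with `json.dump`.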