Python dictionary iteration not working as expected

Posted 2024-04-29 22:48:08


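I am trying to transform a JSON file of transcript data so that each speaker's conversation fragments are combined into a single string.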

Here is a link to my input data: https://jsoneditoronline.org/#left=cloud.34cfd15f2c1f461f9e7a7ab57431de79

Output data: https://jsoneditoronline.org/#left=cloud.99a89b483ae84c7f8913da5ecfd3f4a3

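My goal is to combine all conversation items for each entry in the input array and check whether the next entry comes from a new speaker. If it does, I reset the string; otherwise I keep appending items until the speaker changes or the string grows to 500 characters. For some reason, as you can see in the output, my data keeps repeating.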

Here is my code:

import json



with open('input-data.json', 'r') as f:
    text = json.load(f)
    
segment_string = ''
current_speaker = ''
sentimentData_full = {}
sentimentData_final = []
for item in text: 
    conversation_segment_list = item['conversation_items']
    speaker = item['speaker_label']
    for segment in conversation_segment_list:
        if len(segment_string) >= 500 or speaker != current_speaker:     
            segment_string = ''
            segment_string += f"{segment['content']} "
            current_speaker = speaker
        else:
            segment_string += f"{segment['content']} "
            continue
    sentimentData = {}
    sentimentData_full['speaker_label'] = speaker
    sentimentData_full['segment_string'] = segment_string                
    sentimentData_final.append(sentimentData_full.copy())
    sentimentData_full = {}       

app_json = json.dumps(sentimentData_final)
with open('output-data.json', 'w') as f:
    f.write(app_json)

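I have been working on this for hours, and any help would be greatly appreciated. Here is an example of the current problem, in case my explanation was not clear enough: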

Current (incorrect) output:

[
  {
    "speaker_label": "spk_0",
    "segment_string": "mhm . "
  },
  {
    "speaker_label": "spk_0",
    "segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
  } 
]

Expected output:

[
  {
    "speaker_label": "spk_0",
    "segment_string": "mhm . You have reached a as in so far . This is Donna . I'll be assisting you with your inquiries today . Please be informed that this call is being recorded and monitored for quality assurance purposes . How may I help you ? "
  },
  {
    "speaker_label": "spk_1",
    "segment_string": "Um , well , I bought . All right , I got this , um , essence of argon oil , um , for shipping , handling and handling costs . 599 a sample of it . And , um , if I want to cancel the order , I had to do it within , "
  } 
]


1 answer
User
#1 · Posted 2024-04-29 22:48:08

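I couldn't debug your error, but I wrote a version from scratch. The keys can be adjusted to suit your needs; for now I use the speaker label as the key.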

import json
import re
from collections import defaultdict

speakers = defaultdict(list)

with open("try.json", "r") as f:
    text = json.load(f)

for item in text:
    conversation_segment_list = item["conversation_items"]
    speaker = item["speaker_label"]
    for segment in conversation_segment_list:
        speakers[speaker].append(segment["content"])

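# Join each speaker's words, remove the whitespace before ".", "," and "?",
# and keep only the first 500 characters.
speakers = {
    speaker: re.sub(r"[\s]([.,?])", r"\1", " ".join(words))[:500]
    for speaker, words in speakers.items()
}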

print(speakers)

Output:

{
    "spk_0": "mhm. You have reached a as in so far. This is Donna. I'll be assisting you with your inquiries today. Please be informed that this call is being recorded and monitored for quality assurance purposes. How may I help you? Okay, I didn't have I'm happy to assist you. Um, for me to be able to pull up your, uh, subscription here, could you kind of provide me your first and your last name? Oh, l y and then Yes, l a k e. Yes. Okay. Just, uh, go ahead to pull up here. A subscription here or your account",
    "spk_1": "Um, well, I bought. All right, I got this, um, essence of argon oil, um, for shipping, handling and handling costs. 599 a sample of it. And, um, if I want to cancel the order, I had to do it within, uh, 15 days. And so that is, um when I want I wanted to do, I didn't want to. I didn't want to get, you know, like a monthly for what is it at $3 a month? Okay, I can't afford that. I'm Carolyn. C a R O L Y N lake l a k e It's, uh, Lake 3921 at hotmail dot com. Uh, 3 10 Warren Avenue number 2 July Wy",
}
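
A note on the original code: judging from the code and the incorrect output in the question, the repetition most likely comes from appending one result entry per input item while segment_string keeps accumulating across consecutive items from the same speaker. The code above avoids that by grouping everything under one key per speaker, which is not quite the turn-by-turn output the question asks for. Below is a minimal sketch of the turn-by-turn variant, which emits an entry only when the speaker changes, when the string has reached the 500-character limit, or when the input ends. The function name merge_segments and the sample data are hypothetical; the input shape (speaker_label, conversation_items, content) is assumed from the question's code.

```python
import json

MAX_CHARS = 500  # length limit mentioned in the question


def merge_segments(items, max_chars=MAX_CHARS):
    """Merge consecutive items from the same speaker into one entry.

    The running string is flushed to the result only when the speaker
    changes or when it has already reached max_chars characters.
    """
    merged = []
    current_speaker = None
    current_text = ""

    for item in items:
        speaker = item["speaker_label"]
        for segment in item["conversation_items"]:
            limit_reached = len(current_text) >= max_chars
            if current_text and (speaker != current_speaker or limit_reached):
                # Flush the finished string before starting a new one.
                merged.append(
                    {"speaker_label": current_speaker, "segment_string": current_text}
                )
                current_text = ""
            current_speaker = speaker
            current_text += f"{segment['content']} "

    # Flush whatever is left after the last item.
    if current_text:
        merged.append(
            {"speaker_label": current_speaker, "segment_string": current_text}
        )
    return merged


# Hypothetical sample in the assumed input shape (not the real input file).
sample = [
    {"speaker_label": "spk_0",
     "conversation_items": [{"content": "mhm"}, {"content": "."}]},
    {"speaker_label": "spk_0",
     "conversation_items": [{"content": "How"}, {"content": "may"},
                            {"content": "I"}, {"content": "help"},
                            {"content": "you"}, {"content": "?"}]},
    {"speaker_label": "spk_1",
     "conversation_items": [{"content": "Um"}, {"content": ","},
                            {"content": "well"}]},
]

result = merge_segments(sample)
print(json.dumps(result, indent=2))
```

With the real file, the list returned by json.load could be passed to merge_segments in place of sample and the result written out with json.dump. Unlike the original loop, a string that reaches the limit is saved before a new one is started, so no text is dropped.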
