I have the following list:
data = [
    [[0.025],
     ['-DOCSTART-'],
     ['O']],
    [[0.166, 0.001, 4.354, 4.366, 7.668],
     ['Summary', 'of', 'Consolidated', 'Financial', 'Data'],
     ['O', 'O', 'B-ORG', 'I-ORG', 'E-ORG']],
    [[0.195, 0.1, 0.0, 3.561, 3.793, 6.741, 4.0, 0.05],
     ['Port', 'conditions', 'from', 'Lloyds', 'Shipping', 'Intelligence', 'Service', '--'],
     ['S-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'E-ORG', 'O']]
]
Note: the three lists inside each data[i] (losses, tokens, tags) all have the same length, for i in [0, 1, 2].
I want to create a JSON file like this:
[{
    "sentence": "-DOCSTART- Summary of Consolidated Financial Data Port conditions from Lloyds Shipping Intelligence Service --",
    "annotations": [
        {
            "decision": "Consolidated Financial Data",
            "category": "ORG",
            "token_loss": [4.354, 4.366, 7.668],
            "totalloss": 4.354+4.366+7.668 # Here, I consider the sum of "token_loss"
        },
        {
            "decision": "Port",
            "category": "PER",
            "token_loss": 18.44,
            "totalloss": 18.44
        },
        {
            "decision": "Lloyds Shipping Intelligence Service",
            "category": "ORG",
            "token_loss": [3.561, 3.793, 6.741, 4.0],
            "totalloss": 3.561+3.793+6.741+4.0
        }]
}]
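In the tag lists, a multi-token entity always follows the order "B-" (begin), "I-" (inside), "E-" (end). A single-token entity is always tagged "S-" (single). Tokens tagged "O" (outside) are ignored.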
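To make the tagging scheme concrete, here is a minimal sketch of splitting a tag into its position prefix and its category. The helper name `split_tag` is hypothetical and only used for illustration.

```python
def split_tag(tag):
    """Split a tag such as 'B-ORG' into ('B', 'ORG'); 'O' becomes ('O', None)."""
    if tag == 'O':
        return 'O', None
    # partition splits on the first '-' only, so 'B-ORG' -> ('B', '-', 'ORG')
    prefix, _, category = tag.partition('-')
    return prefix, category


tags = ['S-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'E-ORG', 'O']
print([split_tag(t) for t in tags])
```

An entity then runs from a "B" prefix to the next "E" prefix, or consists of a single "S" token.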
This is how I started trying to solve the problem:
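# Work on one sentence at a time, here the third entry of data
tokens = data[2][1]
tags = data[2][2]
decisions = []
# enumerate gives the real position; tags.index(tag) would always
# return the first occurrence of a repeated tag
for start_idx, tag in enumerate(tags):
    if tag.startswith('B'):
        # Walk forward from the B- tag to the matching E- tag
        end_idx = start_idx
        while not tags[end_idx].startswith('E'):
            end_idx += 1
        decisions.append(tokens[start_idx:end_idx + 1])
    elif tag.startswith('S'):
        # A single-token entity is a span of length one
        decisions.append(tokens[start_idx:start_idx + 1])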
You can use a generator function to produce the groups:
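A minimal sketch of that generator idea follows. The names `iter_entities` and `build_document` are hypothetical, the tags are assumed to be well formed (every "B-" is closed by an "E-"), and two choices are assumptions rather than requirements from the question: "totalloss" is rounded to three decimals, and a single-token entity keeps a one-element "token_loss" list.

```python
import json

data = [
    [[0.025], ['-DOCSTART-'], ['O']],
    [[0.166, 0.001, 4.354, 4.366, 7.668],
     ['Summary', 'of', 'Consolidated', 'Financial', 'Data'],
     ['O', 'O', 'B-ORG', 'I-ORG', 'E-ORG']],
    [[0.195, 0.1, 0.0, 3.561, 3.793, 6.741, 4.0, 0.05],
     ['Port', 'conditions', 'from', 'Lloyds', 'Shipping', 'Intelligence', 'Service', '--'],
     ['S-PER', 'O', 'O', 'B-ORG', 'I-ORG', 'I-ORG', 'E-ORG', 'O']],
]


def iter_entities(losses, tokens, tags):
    """Yield one annotation dict per entity found in a single sentence."""
    start = None
    for i, tag in enumerate(tags):
        prefix, _, category = tag.partition('-')
        if prefix in ('B', 'S'):
            start = i  # an entity opens here
        if prefix in ('E', 'S') and start is not None:
            span = losses[start:i + 1]  # the entity closes here
            yield {
                'decision': ' '.join(tokens[start:i + 1]),
                'category': category,
                'token_loss': span,
                # Assumption: round to hide floating-point noise in the sum
                'totalloss': round(sum(span), 3),
            }
            start = None


def build_document(data):
    """Join all tokens into one sentence and collect every entity."""
    sentence = ' '.join(tok for _, tokens, _ in data for tok in tokens)
    annotations = [ann
                   for losses, tokens, tags in data
                   for ann in iter_entities(losses, tokens, tags)]
    return [{'sentence': sentence, 'annotations': annotations}]


print(json.dumps(build_document(data), indent=2))
```

To write a file instead of printing, the same structure can be passed to `json.dump` together with an open file handle.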
Output:
When run on the new sample:
Output: