I have a function that returns values in the following format:
["Stage 1 : Package Description: Blisters are made in a cold-forming process from an aluminium base web. Each tablet isfilled into a separate blister and a lidding foil of aluminium is welded on. The blistersare opened by pressing the tablets through the lidding foil. Values: ['Blister', 'Foil', 'Aluminium']", "Stage 2 : Package Description: The tablets are filled into box cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'] Colour: White", "Stage 3 : Package Description: The tablets are filled into box cylindrically shaped bottles made of white colouredpolyethylene. The volumes of the bottles depend on the tablet strength and amount oftablets, ranging from 20 to 175 ml. The screw type cap is made of white colouredpolypropylene and is equipped with a tamper proof ring. Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'] Colour: White"]
So for some stages "Colour" will be present, and for others it will not. I want to extract these values into a CSV whose columns look like this:
Expected output in CSV:
StageNumber | PackageDescription | Values1 | Values2 | Values3 | Colour
1 | Blisters are made in a ... | Blister | Foil | Aluminium |
2 | The tablets are filled ... | Bottle | Cylindrically shaped Bottles | Polyethylene | White
Code so far:
paragraphs = ['The tablets are filled into cylindrically shaped bottles made of white coloured\npolyethylene. The volumes of the bottles depend on the tablet strength and amount of\ntablets, ranging from 20 to 175 ml. The screw type cap is made of white coloured\npolypropylene and is equipped with a tamper proof ring.', 'PVC/PVDC blister pack', 'Blisters are made in a cold-forming process from an aluminium base web. Each tablet is\nfilled into a separate blister and a lidding foil of aluminium is welded on. The blisters\nare opened by pressing the tablets through the lidding foil.', '\n']
final_ref = [['Blister', 'Foil', 'Aluminium'], ['Blister', 'Base Web', 'PVC/PVDC'], ['Bottle', 'Cylindrically shaped Bottles', 'Polyethylene'], ['Bottle', 'Screw Type Cap', 'Polypropylene'], ['Blister', 'Base Web', 'PVC'], ['Blister', 'Base Web', 'PVD/PVDC'], ['Bottle', 'Square Shaped Bottle', 'Polyethylene']]
colours = ['White', 'Yellow', 'Blue', 'Red', 'Green', 'Black', 'Brown', 'Silver', 'Purple', 'Navy blue', 'Gray', 'Orange', 'Maroon', 'pink', 'colourless', 'blue']
TEXT_WITHOUT_COLOUR = 'Stage {counter} : Package Description: {sen} Values: {values}'
TEXT_WITH_COLOUR = TEXT_WITHOUT_COLOUR + ' Colour: {colour}'
counter = 1
result = []

def is_missing(words, sen):
    # True if at least one reference word does not occur in the sentence
    for w in words:
        if w.lower() not in sen.lower():
            return True
    return False

for words in final_ref:
    for sen in paragraphs:
        if is_missing(words, sen):
            continue
        kwargs = {
            'counter': counter,
            'sen': sen,
            'values': str(words)
        }
        text_const = TEXT_WITHOUT_COLOUR
        if words[0] == 'Bottle':
            # Only switch to the colour template once a colour is actually found,
            # otherwise format() would raise KeyError on the missing 'colour' key
            for wd in colours:
                if wd.lower() in sen.lower():
                    kwargs['colour'] = wd
                    text_const = TEXT_WITH_COLOUR
                    break
        result.append(text_const.format(**kwargs).replace('\n', '').replace('\t', ''))
        counter += 1

print(result)
The solution I propose is not the most correct one, but it works. You can try to improve it and adapt it to your needs.
I hope this helps.
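
For the CSV step described in the question, one possible approach is to parse each formatted string back into its parts with a regular expression and hand the rows to the standard `csv` module. The following is only a minimal sketch: the names `STAGE_RE`, `HEADER`, `parse_stage` and `to_csv` are hypothetical and not part of the original post, and it assumes every string follows exactly the "Stage N : Package Description: ... Values: [...] Colour: ..." layout shown above, with "Colour" being optional.

```python
import ast
import csv
import io
import re

# Assumed layout of one formatted string; the trailing "Colour" part is optional.
STAGE_RE = re.compile(
    r"Stage (?P<stage>\d+) : Package Description: (?P<desc>.*?) "
    r"Values: (?P<values>\[.*?\])(?: Colour: (?P<colour>.*))?$",
    re.DOTALL,
)

HEADER = ["StageNumber", "PackageDescription",
          "Values1", "Values2", "Values3", "Colour"]


def parse_stage(text):
    """Turn one formatted string into a row that matches HEADER."""
    match = STAGE_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected format: {text!r}")
    # "['Bottle', 'Screw Type Cap', ...]" -> real Python list
    values = ast.literal_eval(match.group("values"))
    # Pad or trim so there are always exactly three value columns
    values = (list(values) + ["", "", ""])[:3]
    return [
        int(match.group("stage")),
        match.group("desc"),
        *values,
        match.group("colour") or "",  # empty cell when no colour is given
    ]


def to_csv(result, fileobj):
    """Write the header and one parsed row per string to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(HEADER)
    writer.writerows(parse_stage(text) for text in result)


# Small demonstration with shortened descriptions
sample = [
    "Stage 1 : Package Description: Blisters are made. "
    "Values: ['Blister', 'Foil', 'Aluminium']",
    "Stage 2 : Package Description: Bottles. "
    "Values: ['Bottle', 'Screw Type Cap', 'Polypropylene'] Colour: White",
]
buffer = io.StringIO()
to_csv(sample, buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with `open("stages.csv", "w", newline="")` (the file name is just an example). If you control the code that builds `result`, collecting the rows as lists directly rather than formatting and re-parsing strings would avoid the regular expression altogether.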