How do I handle values in a DataFrame that differ only by letter case?

Posted 2024-04-27 07:45:12


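I merged two similar datasets, and the result now contains many values that are identical except for their letter case. I'm trying to find the most Pythonic way to fix these values.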

For example, I have an `armed` column whose unique values are:

array([nan, 'Gun', 'Knife', 'Unarmed', 'Toy weapon', 'gun', 'unarmed',
       'toy weapon', 'nail gun', 'knife', 'shovel', 'hammer', 'hatchet',
       'undetermined', 'sword', 'machete', 'box cutter', 'metal object',
       'screwdriver', 'lawn mower blade', 'flagpole',
       'guns and explosives', 'cordless drill', 'crossbow', 'metal pole',
       'Taser', 'metal pipe', 'metal hand tool', 'blunt object',
       'metal stick', 'sharp object', 'meat cleaver', 'carjack',
       "contractor's level", 'chain', 'unknown weapon', 'stapler',
       'beer bottle', 'bean-bag gun', 'baseball bat and fireplace poker',
       'straight edge razor', 'gun and knife', 'ax', 'brick',
       'baseball bat', 'hand torch', 'chain saw', 'garden tool',
       'scissors', 'pole', 'pick-axe', 'flashlight', 'baton', 'spear',
       'chair', 'pitchfork', 'hatchet and gun', 'rock', 'piece of wood',
       'bayonet', 'pipe', 'glass shard', 'motorcycle', 'pepper spray',
       'metal rake', 'crowbar', 'oar', 'machete and gun', 'tire iron',
       'air conditioner', 'pole and knife', 'baseball bat and bottle',
       'fireworks', 'pen', 'chainsaw', 'gun and sword', 'gun and car',
       'vehicle', 'pellet gun', 'claimed to be armed', 'BB gun',
       'incendiary device', 'samurai sword', 'bow and arrow',
       'gun and vehicle', 'vehicle and gun', 'wrench', 'walking stick',
       'barstool', 'BB gun and vehicle', 'wasp spray', 'air pistol',
       'Airsoft pistol', 'baseball bat and knife', 'vehicle and machete'],
      dtype=object) 

As you can see, many of them are duplicates of one another, where one copy is capitalized and the other is all lowercase.


1 answer
User
#1 · Posted 2024-04-27 07:45:12

If by "fix" you mean removing the duplicates, I suggest converting the column to lowercase in both datasets before merging them.

Here are some examples:

>>> [x.lower() for x in ["A","B","C"]]
['a', 'b', 'c']

>>> df['x_lowercase'] = df['x'].str.lower()
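Put together, the approach might look like the sketch below. The frames `df_a` and `df_b` are hypothetical stand-ins for the two datasets, the column name `armed` is taken from the question, and the use of `pd.concat` is an assumption about what "merging" means here. Note that `.str.lower()` leaves missing values as NaN, so the `nan` entry in the column does not need special handling.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the two datasets being combined
df_a = pd.DataFrame({"armed": ["Gun", "Knife", "Unarmed", np.nan]})
df_b = pd.DataFrame({"armed": ["gun", "knife", "unarmed", "toy weapon"]})

# Normalize case in each frame first; .str.lower() leaves NaN as NaN
for frame in (df_a, df_b):
    frame["armed"] = frame["armed"].str.lower()

# Assumption: "merging" here means stacking the rows with pd.concat
combined = pd.concat([df_a, df_b], ignore_index=True)

# Each weapon type now appears under a single spelling
unique_values = combined["armed"].dropna().unique().tolist()
print(unique_values)
```

If the data has already been merged, the same `.str.lower()` call can be applied to the combined column instead. Lowercasing everything also turns values such as 'BB gun' and 'Taser' into 'bb gun' and 'taser', which may or may not be what you want for display.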
