I want to plot the clusters of my dataset, and the only way I know to do that is to map the strings to integer values, like this:
data_mapped=data.copy()
data_mapped['Language']=data_mapped['Language'].map({'English':0,'French':1,'German':2})
data_mapped
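But in that example there were only 3 unique language values, so I could map them by hand.
Now I don't know how to convert many unique string values into integers and plot the clusters. I want to cluster on a few columns (color, fabric, dress type), and I want to cluster the whole dataset.
This is how I load my dataset: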
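import glob
import os
import pandas as pd
# '[!Merg_all]' in a glob is a character class (one character that is not M, e, r, g, _, a or l),
# not "skip Merg_all", so filter the merged file out by name instead
file_list = [f for f in glob.glob('json_file/*json')
             if not os.path.basename(f).startswith('Merg_all')]
merg_all_list = []
for file in file_list:
    print(file)
    raw_data = pd.read_json(file)
    # each row of the 'product' column holds one product dict
    for i in raw_data['product']:
        merg_all_list.append(i)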
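The DataFrame printed further down appears to have one row per product dict, so it was presumably built from `merg_all_list`. A minimal sketch of that step, assuming each list element is a product dict like the one in the JSON file; the two records here are hypothetical and trimmed to a few keys.

```python
import pandas as pd

# Hypothetical, trimmed product dicts standing in for merg_all_list
merg_all_list = [
    {"brand_name": "So Kamal", "dress_type": "shirt", "color": "mustard",
     "fabric": "lawn", "price": 1120,
     "size": {"xs": False, "s": False}},
    {"brand_name": "MANGO Man", "dress_type": "jacket", "color": "blue",
     "fabric": "denim", "price": 4995,
     "size": {"xs": False, "s": True}},
]

# One row per product; nested dicts such as 'size' stay as dict-valued cells
data = pd.DataFrame(merg_all_list)
print(data[["dress_type", "color", "fabric"]])
```

Keeping `size` as a dict column matches the printout below; `pd.json_normalize` would instead flatten it into separate `size.xs`, `size.s`, ... columns.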
JSON file:
[{"product": {"brand_name": "So Kamal", "designer": "So Kamal", "title": "So Kamal Women Summer Collection Mustard Lawn 1PC -Unstitched Shirt DPL19 49 LA00964-Std-MST", "description": "description specifications of so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst brand so kamal sku 105972128_pk 1253666066 features 1pc unstitched main material lawn season summer material family lawn what's in the box 1x 1pc unstitched suit", "dress_type": "shirt", "where_to_wear": "", "color": "mustard", "stitched": false, "season": "summer", "price": 1120, "currency": "Rs", "product_id": "So Kamal Women Summer Collection Mustard Lawn 1PC -Unstitched Shirt DPL19 49 LA00964-Std-MST", "collection_url": "https://lawncollection.pk/brands/", "source": "https://lawncollection.pk/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst.html", "fabric": "lawn", "gender": "women", "frontpic": "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image1.jpeg", "backpic": "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image2.jpeg", "otherpics": ["https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image1.jpeg", "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image2.jpeg"], "sku": "SKU: 105972128_PK-1253666066", "details": "https://lawncollection.pk/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst.html https: lawncollection.pk so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst.html so kamal so kamal women summer collection mustard lawn 1pc -unstitched shirt dpl19 49 la00964-std-mst description specifications of so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst brand so kamal sku 105972128_pk 1253666066 features 1pc unstitched main material lawn season summer material family lawn what's in the box 1x 1pc unstitched suit", "Category1_list": "unstitched", "size": {"xs": false, "s": false, "m": false, "xl": false, "xxl": false}}}]
DataFrame:
brand_name designer title description dress_type where_to_wear color stitched season price ... source fabric gender frontpic backpic otherpics details Category1_list size sku
0 Polo Ralph Lauren Polo Ralph Lauren Long Sleeve Knit Magic Fleece Sweatshirt - Casual graphic print sweatshirt- Crew neckli... sweatshirt black True 8544 ... https://www.zalora.com.ph/polo-ralph-lauren-lo... cotton man static.ph.zalora.net/p/polo-ralph-lauren-3175-... static.ph.zalora.net/p/polo-ralph-lauren-3175-... [static.ph.zalora.net/p/polo-ralph-lauren-3175... https://www.zalora.com.ph/polo-ralph-lauren-lo... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
1 Polo Ralph Lauren Polo Ralph Lauren Basic Mesh Polo Shirt - Colour block polo shirt with brand print- Un... shirt red True 9265 ... https://www.zalora.com.ph/polo-ralph-lauren-ba... cotton man static.ph.zalora.net/p/polo-ralph-lauren-7554-... static.ph.zalora.net/p/polo-ralph-lauren-7555-... [static.ph.zalora.net/p/polo-ralph-lauren-7554... https://www.zalora.com.ph/polo-ralph-lauren-ba... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
2 MANGO Man MANGO Man Faux Shearling Denim Jacket - Denim jacket with wash detail- Collar neckli... jacket blue True 4995 ... https://www.zalora.com.ph/mango-man-faux-shear... denim man static.ph.zalora.net/p/mango-man-9782-7201341-... static.ph.zalora.net/p/mango-man-9783-7201341-... [static.ph.zalora.net/p/mango-man-9782-7201341... https://www.zalora.com.ph/mango-man-faux-shear... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
3 Polo Ralph Lauren Polo Ralph Lauren Knit Magic Fleece Hoodie - Embroidered front hoodie- Unlined- Hooded ne... True 10598 ... https://www.zalora.com.ph/polo-ralph-lauren-kn... cotton man static.ph.zalora.net/p/polo-ralph-lauren-2320-... static.ph.zalora.net/p/polo-ralph-lauren-2320-... [static.ph.zalora.net/p/polo-ralph-lauren-2320... https://www.zalora.com.ph/polo-ralph-lauren-kn... {'xs': False, 's': True, 'm': True, 'xl': True... NaN
4 MANGO Man MANGO Man Turtleneck Flecked Sweater - Solid hue speckle-knit sweatshirt- High neck... sweatshirt brown True 2995 ... https://www.zalora.com.ph/mango-man-turtleneck... cotton man static.ph.zalora.net/p/mango-man-1900-5990341-... static.ph.zalora.net/p/mango-man-1900-5990341-... [static.ph.zalora.net/p/mango-man-1900-5990341... https://www.zalora.com.ph/mango-man-turtleneck... {'xs': False, 's': False, 'm': False, 'xl': Fa... NaN
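Choose a visualization technique that suits your data.
For categorical data, a bar chart is more appropriate than a scatter plot, because you don't need the x-axis to be numeric.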
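A minimal sketch of the bar-chart approach, assuming a DataFrame with a categorical `color` column (the sample values are hypothetical): count each category and draw the counts as bars, so the x-axis carries category labels instead of numbers. The `Agg` backend is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample data
data = pd.DataFrame({"color": ["black", "red", "blue", "black", "brown", "black"]})

# Number of products per color, most frequent first
counts = data["color"].value_counts()

ax = counts.plot.bar()  # one bar per category; tick labels are the color names
ax.set_xlabel("color")
ax.set_ylabel("number of products")
plt.tight_layout()
plt.savefig("color_counts.png")
```

The same pattern works per cluster: group by a cluster label column and call `value_counts()` on `color`, `fabric` or `dress_type` within each group.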
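Also choose an appropriate algorithm. k-means only makes sense for continuous variables, so encoding categories as integers for k-means is wrong: in your example, k-means would assume that the average of English and German is exactly French.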
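The arithmetic behind the English/French/German example, as a small sketch: a k-means centroid is the mean of its members, so with the integer codes from the question the centroid of one English and one German row lands exactly on French's code. One-hot encoding (here via `pd.get_dummies`) removes that false ordering by putting every pair of distinct languages at the same distance. It is one common workaround, not the only one; k-modes, for example, replaces means with modes and is designed for categorical data.

```python
import itertools

import numpy as np
import pandas as pd

codes = {"English": 0, "French": 1, "German": 2}

# A k-means centroid is the arithmetic mean of the points assigned to it
centroid = np.mean([codes["English"], codes["German"]])
print(centroid)  # 1.0, i.e. the code for "French"

# One-hot encoding: one 0/1 column per language
onehot = pd.get_dummies(pd.Series(list(codes)), dtype=float).to_numpy()

# Euclidean distance between every pair of distinct languages
dists = [np.linalg.norm(a - b) for a, b in itertools.combinations(onehot, 2)]
print(dists)  # every distance equals sqrt(2)
```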
I got the answer from my professor, Qasim, and I think it will help others.
Here is how to convert each unique value to an integer.
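A sketch of one standard way to give every unique value of a column its own integer, however many values there are (not necessarily the professor's method): pandas' categorical dtype sorts the unique values and exposes their positions as `cat.codes`. The column names follow the question; the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical rows using the columns from the question
data = pd.DataFrame({
    "color": ["black", "red", "blue", "black", "brown"],
    "fabric": ["cotton", "cotton", "denim", "cotton", "cotton"],
    "dress_type": ["sweatshirt", "shirt", "jacket", "hoodie", "sweatshirt"],
})

data_mapped = data.copy()
mappings = {}
for col in ["color", "fabric", "dress_type"]:
    cat = data_mapped[col].astype("category")
    # Categories are the sorted unique values; codes run from 0 to n-1
    mappings[col] = dict(enumerate(cat.cat.categories))
    data_mapped[col] = cat.cat.codes

print(data_mapped)
print(mappings["color"])  # code -> original string, for decoding cluster results
```

Missing values get code -1. `pd.factorize` does the same job but numbers values in order of first appearance rather than sorted. Given the previous answer, these codes are best used as labels for grouping and plotting, not as numeric k-means features.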