How do I map string values in a DataFrame to numbers in order to plot clusters?

Posted 2024-05-29 05:07:24


The only way I know to plot the clusters of a dataset is to map the strings to integer values, like this:

data_mapped=data.copy()
data_mapped['Language']=data_mapped['Language'].map({'English':0,'French':1,'German':2})
data_mapped

But in this example there are only 3 unique language values, so they are easy to map by hand this way.

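What I don't know is how to convert many unique string values to integers and then plot the clusters. I want to cluster by a few columns (color, fabric, dress type), and I also want to cluster the data as a whole.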

This is how I currently build my dataset:

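import glob
import os

import pandas as pd

# '[!Merg_all]' is a glob character class: it rejects single leading characters,
# not the name "Merg_all". Assuming the goal is to skip the merged file, filter by name.
file_list = [f for f in glob.glob('json_file/*json')
             if not os.path.basename(f).startswith('Merg_all')]
merg_all_list = []
for file in file_list:
    print(file)
    raw_data = pd.read_json(file)
    # Collect every product record from this file
    for i in raw_data['product']:
        merg_all_list.append(i)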

JSON file:

[{"product": {"brand_name": "So Kamal", "designer": "So Kamal", "title": "So Kamal Women Summer Collection Mustard Lawn 1PC -Unstitched Shirt DPL19 49 LA00964-Std-MST", "description": "description   specifications of so kamal women summer collection mustard lawn 1pc  unstitched shirt dpl19 49 la00964 std mst brand so kamal sku 105972128_pk 1253666066 features 1pc  unstitched main material lawn season summer material family lawn what's in the box 1x 1pc unstitched suit", "dress_type": "shirt", "where_to_wear": "", "color": "mustard", "stitched": false, "season": "summer", "price": 1120, "currency": "Rs", "product_id": "So Kamal Women Summer Collection Mustard Lawn 1PC -Unstitched Shirt DPL19 49 LA00964-Std-MST", "collection_url": "https://lawncollection.pk/brands/", "source": "https://lawncollection.pk/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst.html", "fabric": "lawn", "gender": "women", "frontpic": "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image1.jpeg", "backpic": "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image2.jpeg", "otherpics": ["https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image1.jpeg", "https://lawncollection.pk/public/images/products//2019/04/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst-image2.jpeg"], "sku": "SKU: 105972128_PK-1253666066", "details": "https://lawncollection.pk/so-kamal-women-summer-collection-mustard-lawn-1pc-unstitched-shirt-dpl19-49-la00964-std-mst.html https:  lawncollection.pk so kamal women summer collection mustard lawn 1pc unstitched shirt dpl19 49 la00964 std mst.html so kamal so kamal women summer collection mustard lawn 
1pc -unstitched shirt dpl19 49 la00964-std-mst description   specifications of so kamal women summer collection mustard lawn 1pc  unstitched shirt dpl19 49 la00964 std mst brand so kamal sku 105972128_pk 1253666066 features 1pc  unstitched main material lawn season summer material family lawn what's in the box 1x 1pc unstitched suit", "Category1_list": "unstitched", "size": {"xs": false, "s": false, "m": false, "xl": false, "xxl": false}}}]

DataFrame:

    brand_name  designer    title   description dress_type  where_to_wear   color   stitched    season  price   ... source  fabric  gender  frontpic    backpic otherpics   details Category1_list  size    sku
0   Polo Ralph Lauren   Polo Ralph Lauren   Long Sleeve Knit Magic Fleece Sweatshirt    - Casual graphic print sweatshirt- Crew neckli...   sweatshirt      black   True        8544    ... https://www.zalora.com.ph/polo-ralph-lauren-lo...   cotton  man static.ph.zalora.net/p/polo-ralph-lauren-3175-...   static.ph.zalora.net/p/polo-ralph-lauren-3175-...   [static.ph.zalora.net/p/polo-ralph-lauren-3175...   https://www.zalora.com.ph/polo-ralph-lauren-lo...       {'xs': False, 's': False, 'm': False, 'xl': Fa...   NaN
1   Polo Ralph Lauren   Polo Ralph Lauren   Basic Mesh Polo Shirt   - Colour block polo shirt with brand print- Un...   shirt       red True        9265    ... https://www.zalora.com.ph/polo-ralph-lauren-ba...   cotton  man static.ph.zalora.net/p/polo-ralph-lauren-7554-...   static.ph.zalora.net/p/polo-ralph-lauren-7555-...   [static.ph.zalora.net/p/polo-ralph-lauren-7554...   https://www.zalora.com.ph/polo-ralph-lauren-ba...       {'xs': False, 's': False, 'm': False, 'xl': Fa...   NaN
2   MANGO Man   MANGO Man   Faux Shearling Denim Jacket - Denim jacket with wash detail- Collar neckli...   jacket      blue    True        4995    ... https://www.zalora.com.ph/mango-man-faux-shear...   denim   man static.ph.zalora.net/p/mango-man-9782-7201341-...   static.ph.zalora.net/p/mango-man-9783-7201341-...   [static.ph.zalora.net/p/mango-man-9782-7201341...   https://www.zalora.com.ph/mango-man-faux-shear...       {'xs': False, 's': False, 'm': False, 'xl': Fa...   NaN
3   Polo Ralph Lauren   Polo Ralph Lauren   Knit Magic Fleece Hoodie    - Embroidered front hoodie- Unlined- Hooded ne...               True        10598   ... https://www.zalora.com.ph/polo-ralph-lauren-kn...   cotton  man static.ph.zalora.net/p/polo-ralph-lauren-2320-...   static.ph.zalora.net/p/polo-ralph-lauren-2320-...   [static.ph.zalora.net/p/polo-ralph-lauren-2320...   https://www.zalora.com.ph/polo-ralph-lauren-kn...       {'xs': False, 's': True, 'm': True, 'xl': True...   NaN
4   MANGO Man   MANGO Man   Turtleneck Flecked Sweater  - Solid hue speckle-knit sweatshirt- High neck...   sweatshirt      brown   True        2995    ... https://www.zalora.com.ph/mango-man-turtleneck...   cotton  man static.ph.zalora.net/p/mango-man-1900-5990341-...   static.ph.zalora.net/p/mango-man-1900-5990341-...   [static.ph.zalora.net/p/mango-man-1900-5990341...   https://www.zalora.com.ph/mango-man-turtleneck...       {'xs': False, 's': False, 'm': False, 'xl': Fa...   NaN

2 answers

Choose a visualization technique that suits your data.

For categorical data, a bar chart is more appropriate than a scatter plot, because the x-axis does not need to be numeric.
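As a minimal sketch of the bar-chart suggestion, assume a small made-up DataFrame with a `color` column (the column name is borrowed from the question; the values are illustrative only). The category counts come from `value_counts`, and the plotting call is left as a comment because it needs matplotlib.

```python
import pandas as pd

# Hypothetical sample data; the values are made up for illustration.
clothes = pd.DataFrame({
    "color": ["black", "red", "blue", "black", "brown", "black", "red"],
})

# Count how often each category occurs; no numeric encoding is needed.
color_counts = clothes["color"].value_counts()
print(color_counts)

# With matplotlib installed, the counts can be drawn as a bar chart:
# color_counts.plot.bar()
```

The category labels go straight onto the x-axis, so nothing has to be mapped to integers first.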

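Also choose an appropriate algorithm. K-means only makes sense for continuous variables, so encoding categories as integers for k-means is wrong: in your example, k-means would assume that the mean of English and German is exactly French.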
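The point about means can be checked with a few lines. The sketch below uses the integer mapping from the question, then contrasts it with one-hot encoding via `pd.get_dummies`. One-hot encoding is shown here only as one possible alternative, not something this answer prescribes; it removes the artificial ordering, but whether k-means is then a good fit is a separate question.

```python
import numpy as np
import pandas as pd

# Integer codes as in the question's mapping.
codes = {"English": 0, "French": 1, "German": 2}

# The average of the English and German codes lands exactly on the French code.
midpoint = (codes["English"] + codes["German"]) / 2
print(midpoint == codes["French"])

# One-hot encoding: each language gets its own 0/1 column.
langs = pd.DataFrame({"Language": ["English", "French", "German"]})
vectors = pd.get_dummies(langs["Language"]).astype(float).to_numpy()

# Every pair of languages is now equally far apart (Euclidean distance).
d_en_fr = float(np.linalg.norm(vectors[0] - vectors[1]))
d_en_de = float(np.linalg.norm(vectors[0] - vectors[2]))
print(d_en_fr, d_en_de)
```

With integer codes, English is twice as far from German as it is from French, which is an artifact of the encoding rather than a property of the data.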

I got the answer from my professor, Qasim, and I think it will help other people:

brand1=pd.factorize(clothes_fac['brand_name'])
clothes_fac.brand_name=brand1[0]
clothes_fac.head(5)

This converts each unique value in the column to an integer.
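The same idea extends to several columns at once. The following is a sketch, assuming a small made-up DataFrame that reuses the column names from the question (`color`, `fabric`, `dress_type`); the `uniques` dictionary is a hypothetical helper for mapping codes back to labels.

```python
import pandas as pd

# Hypothetical sample; column names come from the question, values are made up.
clothes = pd.DataFrame({
    "color": ["black", "red", "blue", "black"],
    "fabric": ["cotton", "cotton", "denim", "cotton"],
    "dress_type": ["sweatshirt", "shirt", "jacket", "sweatshirt"],
})

clothes_fac = clothes.copy()
uniques = {}  # column name -> original labels, so codes can be mapped back
for col in ["color", "fabric", "dress_type"]:
    # factorize returns (integer codes, unique labels in order of first appearance)
    col_codes, labels = pd.factorize(clothes_fac[col])
    clothes_fac[col] = col_codes
    uniques[col] = labels

print(clothes_fac)
print(uniques["color"][2])  # label behind code 2 in the color column
```

By default `pd.factorize` codes missing values (NaN) as -1. The codes follow order of first appearance, so they carry no meaningful order or distance.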
