How do I convert multiple nested values into categorical variables with Pandas?

Posted 2024-05-13 01:57:36

I am working with the Yelp dataset. For businesses, this is the first line of JSON from yelp_academic_dataset_business.json; subsequent lines follow the same schema:

{
  "business_id":"0DI8Dt2PJp07XkVvIElIcQ",
  "name":"Innovative Vapors",
  "neighborhood":"",
  "address":"227 E Baseline Rd, Ste J2",
  "city":"Tempe",
  "state":"AZ",
  "postal_code":"85283",
  "latitude":33.3782141,
  "longitude":-111.936102,
  "stars":4.5,
  "review_count":17,
  "is_open":0,
  "attributes":[
    "BikeParking: True",
    "BusinessAcceptsBitcoin: False",
    "BusinessAcceptsCreditCards: True",
    "BusinessParking: {
      'garage': False,
      'street': False,
      'validated': False,
      'lot': True,
      'valet': False
    }",
    "DogsAllowed: False",
    "RestaurantsPriceRange2: 2",
    "WheelchairAccessible: True"
  ],
  "categories": [
    "Tobacco Shops",
    "Nightlife",
    "Vape Shops",
    "Shopping"
  ],
  "hours":[
    "Monday 11:0-21:0",
    "Tuesday 11:0-21:0",
    "Wednesday 11:0-21:0",
    "Thursday 11:0-21:0",
    "Friday 11:0-22:0",
    "Saturday 10:0-22:0",
    "Sunday 11:0-18:0"
  ],
  "type":"business"
}

I tried converting the JSON to CSV and importing the CSV with pd.read_csv, which gave me the following DataFrame:

+---+-----------------------------------------------------------------+
|idx|                     attributes                                  |
+---+-----------------------------------------------------------------+
| 0 | BikeParking: True, BusinessAcceptsBitcoin: False,               |
|   | BusinessAcceptsCreditCards: True, ,DogsAllowed: False,          |
|   | RestaurantsPriceRange2: 2, WheelchairAccessible: True,          |
|   | BusinessParking: {'garage': False,                              |
|   |                   'street': False,                              |
|   |                   'validated': False,                           |
|   |                   'lot': True,                                  |
|   |                   'valet': False}                               |
+---+-----------------------------------------------------------------+

But what I actually want is:

+----+-----------------------------------+-----------------------------------+
| id | attributes_BusinessParking_garage | attributes_BusinessParking_lot    |
+----+-----------------------------------+-----------------------------------+
|  0 |                  0                |                1                  |
+----+-----------------------------------+-----------------------------------+

I know about pd.get_dummies, but because each cell is treated as a string, I don't get clean, flat categorical columns.

Note: for simplicity, I have not shown the other columns in the example.
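
To illustrate the pd.get_dummies problem described above, here is a minimal sketch. The two-row DataFrame is a hypothetical miniature of the CSV round-trip, not real Yelp data; it only shows that each distinct string becomes its own category instead of one column per attribute.

```python
import pandas as pd

# Hypothetical miniature of the CSV round-trip: each cell is one long string
df = pd.DataFrame({
    "attributes": [
        "BikeParking: True, BusinessParking: {'garage': False, 'lot': True}",
        "BikeParking: False, BusinessParking: {'garage': True, 'lot': False}",
    ]
})

# Every distinct string is treated as a single category
dummies = pd.get_dummies(df["attributes"])
print(dummies.shape)  # (2, 2): one column per distinct string
print(list(dummies.columns))
```

The column names are the full strings, so nothing like attributes_BusinessParking_garage appears until the strings are parsed into key/value pairs.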


Tags: csv, id, json, false, true, street, business, attributes
1 Answer
User
#1 · Posted 2024-05-13 01:57:36

Have you tried using a mapping function to split out the attributes?

You may need to initialize the new columns first (to empty strings or whatever data type fits), and then do something like this:

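def split_attributes(row):
    # Assumes row["attributes"] already holds a dict, not a raw string
    for k, v in row["attributes"].items():
        row[k] = v
    return row  # apply() needs the modified row back

df = df.apply(split_attributes, axis=1)  # axis=1 calls the function once per row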
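
The function above needs a dict, while the question's data holds strings of the form "Key: value". A rough sketch of one way to bridge that gap follows. It assumes each cell in the attributes column is a list of such strings (as in the JSON shown in the question); the helper name flatten_attributes and the sample rows are made up for illustration.

```python
import ast
import pandas as pd

def flatten_attributes(items, prefix="attributes"):
    """Turn ["Key: value", ...] strings into one flat {column: value} dict."""
    flat = {}
    for item in items or []:
        key, _, raw = item.partition(": ")
        try:
            # Parses True, 2, or a dict literal such as {'garage': False}
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # keep anything unparseable as plain text
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{prefix}_{key}_{sub_key}"] = sub_value
        else:
            flat[f"{prefix}_{key}"] = value
    return flat

# Hypothetical one-row sample shaped like the question's data
df = pd.DataFrame({
    "attributes": [[
        "BikeParking: True",
        "RestaurantsPriceRange2: 2",
        "BusinessParking: {'garage': False, 'lot': True}",
    ]]
})

# One dict per row, expanded into one column per key
flat = pd.DataFrame(df["attributes"].map(flatten_attributes).tolist(), index=df.index)

# Convert boolean columns to 0/1, as in the desired output
bool_cols = flat.select_dtypes("bool").columns
flat = flat.astype({col: int for col in bool_cols})
print(flat)
```

On the full dataset some businesses will lack some attributes, so the resulting columns will contain NaN and will not be plain bool; those would need a fillna or a nullable dtype before converting to 0/1.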

Edit

Based on your updated question: have you tried using pd.read_json?
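
A minimal sketch of that suggestion, assuming the file holds one JSON object per line as the question describes. The in-memory two-record sample is hypothetical and stands in for yelp_academic_dataset_business.json; with the real file you would pass its path instead.

```python
import io
import pandas as pd

# Hypothetical two-record sample, one JSON object per line
sample = io.StringIO(
    '{"business_id": "a1", "attributes": ["BikeParking: True", '
    '"BusinessParking: {\'garage\': False, \'lot\': True}"]}\n'
    '{"business_id": "b2", "attributes": ["BikeParking: False"]}\n'
)

# lines=True tells pandas to parse each line as a separate JSON object
df = pd.read_json(sample, lines=True)

# The attributes column now holds Python lists rather than one long string
print(type(df.loc[0, "attributes"]))
print(df)
```

Reading the JSON directly skips the CSV round-trip, so the nested lists survive and can be parsed further instead of arriving as a single string per cell.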
