I'm working with the Yelp dataset. For businesses, this is the first line of JSON from yelp_academic_dataset_business.json. Subsequent lines match this schema:
{
"business_id":"0DI8Dt2PJp07XkVvIElIcQ",
"name":"Innovative Vapors",
"neighborhood":"",
"address":"227 E Baseline Rd, Ste J2",
"city":"Tempe",
"state":"AZ",
"postal_code":"85283",
"latitude":33.3782141,
"longitude":-111.936102,
"stars":4.5,
"review_count":17,
"is_open":0,
"attributes":[
"BikeParking: True",
"BusinessAcceptsBitcoin: False",
"BusinessAcceptsCreditCards: True",
"BusinessParking: {
'garage': False,
'street': False,
'validated': False,
'lot': True,
'valet': False
}",
"DogsAllowed: False",
"RestaurantsPriceRange2: 2",
"WheelchairAccessible: True"
],
"categories": [
"Tobacco Shops",
"Nightlife",
"Vape Shops",
"Shopping"
],
"hours":[
"Monday 11:0-21:0",
"Tuesday 11:0-21:0",
"Wednesday 11:0-21:0",
"Thursday 11:0-21:0",
"Friday 11:0-22:0",
"Saturday 10:0-22:0",
"Sunday 11:0-18:0"
],
"type":"business"
}
I tried converting the JSON to CSV and importing the CSV with pd.read_csv, which gives the following DataFrame:
+---+-----------------------------------------------------------------+
|idx| attributes |
+---+-----------------------------------------------------------------+
| 0 | BikeParking: True, BusinessAcceptsBitcoin: False, |
| | BusinessAcceptsCreditCards: True, ,DogsAllowed: False, |
| | RestaurantsPriceRange2: 2, WheelchairAccessible: True, |
| | BusinessParking: {'garage': False, |
| | 'street': False, |
| | 'validated': False, |
| | 'lot': True, |
| | 'valet': False} |
+---+-----------------------------------------------------------------+
But what I really want is:
+----+-----------------------------------+-----------------------------------+
| id | attributes_BusinessParking_garage | attributes_BusinessParking_lot |
+----+-----------------------------------+-----------------------------------+
| 0 | 1 | 0 |
+----+-----------------------------------+-----------------------------------+
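I know about pd.get_dummies, but because the cell is treated as a single string, I don't get nicely flattened categorical columns.
Note: for simplicity, I have not shown the other columns in the example.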
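Have you tried using a map function to separate the attributes?
You may need to initialize the target columns to empty strings (or whatever data type fits), and then do the following: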
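A minimal sketch of that mapping idea, not necessarily what the answerer had in mind. It assumes each row's attributes value is already a Python list of "Key: value" strings, as in the JSON above, and that the value part is a Python-style literal that ast.literal_eval can read. The helper name flatten_attributes is made up for illustration; the attributes_ column prefix follows the desired output in the question.

```python
import ast

import pandas as pd


def flatten_attributes(attrs):
    """Turn ["Key: value", ...] into a flat dict, expanding nested dict values.

    Hypothetical helper: the name and the column naming are assumptions.
    """
    flat = {}
    for item in attrs or []:
        # Split only on the first ": " so nested dict text stays intact.
        key, _, raw = item.partition(": ")
        try:
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw  # keep values that are not Python literals as strings
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"attributes_{key}_{sub_key}"] = sub_value
        else:
            flat[f"attributes_{key}"] = value
    return flat


# One row built from the record shown in the question.
df = pd.DataFrame({
    "attributes": [[
        "BikeParking: True",
        "BusinessParking: {'garage': False, 'street': False, "
        "'validated': False, 'lot': True, 'valet': False}",
        "RestaurantsPriceRange2: 2",
    ]]
})

# Map every row to a flat dict, then expand the dicts into columns.
flat_df = pd.DataFrame(df["attributes"].map(flatten_attributes).tolist(),
                       index=df.index)

# Convert boolean columns to 0/1 indicators.
bool_cols = flat_df.select_dtypes(include="bool").columns
flat_df = flat_df.astype({col: int for col in bool_cols})

print(flat_df.T)
```

With real data, rows that lack an attribute would end up with NaN in that column, so the boolean-to-integer step would likely need a fill value or a nullable dtype first.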
Edit
Based on your updated question: have you tried using pd.read_json?
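As a rough sketch of what that could look like: since the file holds one JSON object per line, pd.read_json with lines=True should read it directly, without the CSV round trip. The example below uses an in-memory buffer with a trimmed copy of the record from the question instead of the real file, so the buffer contents are an assumption made for illustration.

```python
import io
import json

import pandas as pd

# A trimmed copy of the record from the question, serialized as one JSON line.
record = {
    "business_id": "0DI8Dt2PJp07XkVvIElIcQ",
    "name": "Innovative Vapors",
    "attributes": [
        "BikeParking: True",
        "BusinessAcceptsBitcoin: False",
    ],
    "categories": ["Tobacco Shops", "Nightlife", "Vape Shops", "Shopping"],
}
buffer = io.StringIO(json.dumps(record) + "\n")

# lines=True tells pandas to treat each line as a separate JSON object.
df = pd.read_json(buffer, lines=True)

# The attributes cell is now a Python list rather than one long string.
print(type(df.loc[0, "attributes"]))
print(df.loc[0, "attributes"])
```

For the actual dataset, the buffer would be replaced by the path to yelp_academic_dataset_business.json. Because the list structure survives, each attribute string can then be split and expanded into its own column.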