'Convert JSON format column to new columns
I have a sub-Yelp Dataset in csv, and attributes column is in json format. I'm trying to convert that column to new columns, but none of the relevant code on different question works for me.
Texts in the attributes column are in this format, in every row:
bus = pd.read_csv('public3-business.csv')
bus.iloc[10:12,8:12]
stars review_count is_open attributes
4.0 603 True {"WiFi": "u'no'", "HasTV": "True", "Caters": "False", "Alcohol": "u'beer_and_wine'", "Ambience": "{'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True}", "NoiseLevel": "u'average'", "BikeParking": "True", "GoodForKids": "True", "GoodForMeal": "{'dessert': False, 'latenight': False, 'lunch': True, 'dinner': True, 'brunch': False, 'breakfast': False}", "OutdoorSeating": "False", "BusinessParking": "{'garage': False, 'street': False, 'validated': False, 'lot': True, 'valet': False}", "RestaurantsAttire": "'casual'", "RestaurantsTakeOut": "True", "RestaurantsDelivery": "False", "RestaurantsPriceRange2": "2", "RestaurantsReservations": "False", "RestaurantsTableService": "True", "RestaurantsGoodForGroups": "True", "BusinessAcceptsCreditCards": "True"}
4.5 84 True {"WiFi": "u'no'", "HasTV": "False", "Music": "{'dj': False, 'background_music': True, 'jukebox': False, 'live': False, 'video': False, 'karaoke': False}", "Caters": "False", "Alcohol": "u'beer_and_wine'", "Smoking": "u'no'", "Ambience": "{'romantic': False, 'intimate': False, 'classy': False, 'hipster': False, 'divey': False, 'touristy': False, 'trendy': False, 'upscale': False, 'casual': True}", "CoatCheck": "False", "HappyHour": "True", "BestNights": "{'monday': False, 'tuesday': False, 'friday': True, 'wednesday': False, 'thursday': True, 'sunday': False, 'saturday': True}", "NoiseLevel": "u'average'", "BikeParking": "True", "DogsAllowed": "False", "GoodForKids": "True", "GoodForMeal": "{'dessert': False, 'latenight': True, 'lunch': False, 'dinner': False, 'brunch': False, 'breakfast': False}", "GoodForDancing": "False", "OutdoorSeating": "False", "BusinessParking": "{'garage': False, 'street': True, 'validated': False, 'lot': False, 'valet': False}", "RestaurantsAttire": "u'casual'", "RestaurantsTakeOut": "True", "RestaurantsDelivery": "False", "WheelchairAccessible": "True", "BusinessAcceptsBitcoin": "False", "RestaurantsPriceRange2": "1", "RestaurantsReservations": "False", "RestaurantsTableService": "True", "RestaurantsGoodForGroups": "True", "BusinessAcceptsCreditCards": "True"}
or, as plain text:
'{"WiFi": "u'no'", "HasTV": "False", "Caters": "False", "Alcohol": "u'full_bar'", "Ambience": "{'touristy': False, 'hipster': False, 'romantic': True, 'divey': False, 'intimate': False, 'trendy': False, 'upscale': False, 'classy': False, 'casual': False}", "NoiseLevel": "u'average'", "BikeParking": "False", "GoodForKids": "False", "GoodForMeal": "{'dessert': False, 'latenight': False, 'lunch': False, 'dinner': True, 'brunch': False, 'breakfast': False}", "OutdoorSeating": "True", "BusinessParking": "{'garage': True, 'street': False, 'validated': False, 'lot': False, 'valet': True}", "RestaurantsAttire": "u'dressy'", "RestaurantsTakeOut": "False", "RestaurantsDelivery": "False", "RestaurantsPriceRange2": "4", "RestaurantsReservations": "True", "RestaurantsTableService": "True", "RestaurantsGoodForGroups": "True", "BusinessAcceptsCreditCards": "True"}'
As seen, f.e, Ambiance attribute also has subattributes like romantic, intimate etc. I want to get columns for each attributes, like Wifi, HasTv, Ambianc.romantic etc.
stars review_count is_open HasTV Alcohol Ambiance.romantic Ambiance.intimate WiFi
2.5 13 False True False False True False
4.0 25 False False False False False u'free
Does anyone know how to make this?
Solution 1:[1]
This gets you a step closer to what you want:
d = {"WiFi": "u'no'", "HasTV": "False", "Caters": "False", "Alcohol": "u'full_bar'", "Ambience": "{'touristy': False, 'hipster': False, 'romantic': True, 'divey': False, 'intimate': False, 'trendy': False, 'upscale': False, 'classy': False, 'casual': False}", "NoiseLevel": "u'average'", "BikeParking": "False", "GoodForKids": "False", "GoodForMeal": "{'dessert': False, 'latenight': False, 'lunch': False, 'dinner': True, 'brunch': False, 'breakfast': False}", "OutdoorSeating": "True", "BusinessParking": "{'garage': True, 'street': False, 'validated': False, 'lot': False, 'valet': True}", "RestaurantsAttire": "u'dressy'", "RestaurantsTakeOut": "False", "RestaurantsDelivery": "False", "RestaurantsPriceRange2": "4", "RestaurantsReservations": "True", "RestaurantsTableService": "True", "RestaurantsGoodForGroups": "True", "BusinessAcceptsCreditCards": "True"}
pd.json_normalize(d)
Output:
WiFi HasTV Caters Alcohol Ambience ... RestaurantsPriceRange2 RestaurantsReservations RestaurantsTableService RestaurantsGoodForGroups BusinessAcceptsCreditCards
0 u'no' False False u'full_bar' {'touristy': False, 'hipster': False, 'romanti... ... 4 True True True True
[1 rows x 19 columns]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |