How to convert JSON to a pandas DataFrame when the value is entirely in string format
I am trying to convert the data from a JSON response to a dataframe. My JSON:
{"data":"key=IAfpK, age=58, key=WNVdi, age=64, key=jp9zt, age=47, key=0Sr4C, age=68, key=CGEqo, age=76,
key=IxKVQ, age=79, key=eD221, age=29, key=XZbHV, age=32, key=k1SN5, age=88, key=4SCsU, age=65, key=q3kG6,
age=33, key=MGQpf, age=13, key=Kj6xW, age=14, key=tg2VM, age=30, key=WSnCU, age=24, key=f1Vvz, age=46, }
I want to create a dataframe with key and age as columns. I have parsed the string, extracted each key and value, created a dict, and then converted it to a dataframe. I know that pandas has several built-in functions that make life easier. Is there such a method, or an easier way, to create the dataframe?
import pandas as pd
import requests

r = requests.get('https://coderbyte.com/api/challenges/json/age-counting')
input_str = (r.json()['data'])
input_str_split = input_str.split(',')
age_dict = {}
i = 0
while i < len(input_str_split) - 2:
    key = input_str_split[i].split('=')[1]
    value = input_str_split[i+1].split('=')[1]
    age_dict[key] = value
    i += 2
data = pd.DataFrame(age_dict.items(),columns = ['Item','Age'])
Solution 1:[1]
You can try a list comprehension and then select every second element using data[::2]:
data = [x.split("=")[1] for x in input_str.split(", ")]
df = pd.DataFrame({"age": data[1::2], "key": data[::2]})
print(df)
# age key
# 0 58 IAfpK
# 1 64 WNVdi
# 2 47 jp9zt
# 3 68 0Sr4C
# 4 76 CGEqo
# .. .. ...
# 295 13 lRf1j
# 296 50 0iJGV
# 297 5 cFCfU
# 298 48 J8an1
# 299 5 dkSlj
Explanations:
- Split the data to identify each element using split: input_str.split(", ")
- Split each element to select the value after =: [x.split("=")[1] for x in input_str.split(", ")]
- Create the dataframe by selecting every second element: df = pd.DataFrame({"age": data[1::2], "key": data[::2]})
Full illustration:
import pandas as pd
import requests

r = requests.get('https://coderbyte.com/api/challenges/json/age-counting')
input_str = r.json().get('data')
print(input_str.split(", "))
# ['key=IAfpK', 'age=58', 'key=WNVdi', 'age=64', ... 'key=dkSlj', 'age=5']
print([x.split("=") for x in input_str.split(", ")])
# [['key', 'IAfpK'], ['age', '58'], ['key', 'WNVdi'], ['age', '64'], ... , ['key', 'dkSlj'], ['age', '5']]
print([x.split("=")[1] for x in input_str.split(", ")])
# ['IAfpK', '58', 'WNVdi', '64', ..., 'dkSlj', '5']
data = [x.split("=")[1] for x in input_str.split(", ")]
print(data[1::2])
# ['58', '64', ... , '5']
df = pd.DataFrame({"age": data[1::2], "key": data[::2]})
print(df)
# age key
# 0 58 IAfpK
# 1 64 WNVdi
# 2 47 jp9zt
# 3 68 0Sr4C
# 4 76 CGEqo
# .. .. ...
# 295 13 lRf1j
# 296 50 0iJGV
# 297 5 cFCfU
# 298 48 J8an1
# 299 5 dkSlj
# [300 rows x 2 columns]
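Note that split only ever produces strings, so the age column above holds text rather than numbers. If numeric ages are needed (for example, to filter or count by age), the column can be converted explicitly. The following is a minimal sketch, assuming a short hand-written sample string in place of the API response:

```python
import pandas as pd

# Hypothetical sample in the same "key=..., age=..." format as the API response.
input_str = "key=IAfpK, age=58, key=WNVdi, age=64, key=jp9zt, age=47"

data = [x.split("=")[1] for x in input_str.split(", ")]
df = pd.DataFrame({"key": data[::2], "age": data[1::2]})

# split() yields strings, so convert the age column to integers explicitly.
df["age"] = df["age"].astype(int)

print(df["age"].tolist())
# [58, 64, 47]
```

With the real response, astype(int) would raise a ValueError if any age value were not a valid integer string, which also serves as a quick sanity check on the parsing.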
Solution 2:[2]
Here is a solution you can try out. zip(split_[::2], split_[1::2]) pairs each key entry with the age entry that follows it, yielding (key=IAfpK, age=58), (key=WNVdi, age=64), and so on.
import pandas as pd

# input_str is the string stored under "data" in the JSON response
split_ = input_str.split(", ")
df = pd.DataFrame(
    {"Item": i.split("=")[-1], "Age": j.split("=")[-1]}
    for i, j in zip(split_[::2], split_[1::2])
)
Item Age
0 IAfpK 58
1 WNVdi 64
2 jp9zt 47
3 0Sr4C 68
...
...
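To see the pairing step on its own, here is a small sketch that runs without the network request. The sample string is a hand-written assumption in the same format as the API response:

```python
# Hypothetical two-record sample in the same format as the API response.
input_str = "key=IAfpK, age=58, key=WNVdi, age=64"

split_ = input_str.split(", ")

# Even positions hold the key entries, odd positions hold the age entries;
# zip walks both slices in step, so each key is paired with its own age.
pairs = list(zip(split_[::2], split_[1::2]))

print(pairs)
# [('key=IAfpK', 'age=58'), ('key=WNVdi', 'age=64')]
```

This relies on the entries strictly alternating between key and age; if one entry were missing, every pair after it would be misaligned.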
Solution 3:[3]
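Unfortunately, your output is wrong: the loop condition i < len(input_str_split) - 2 stops before the last key/age pair, and the dict keeps only one entry per key.
Here is an answer.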
import re

import pandas as pd
import requests

r = requests.get('https://coderbyte.com/api/challenges/json/age-counting')
input_str = (r.json()['data'])
input_str_split = input_str.split(', ')
key_pattern = re.compile(r"key=.*")
age_pattern = re.compile(r"age=.*")
# x[4:] drops the 4-character "key=" or "age=" prefix
key_list = [x[4:] for x in input_str_split if key_pattern.match(x)]
age_list = [x[4:] for x in input_str_split if age_pattern.match(x)]
data = pd.DataFrame({'Item': key_list, 'Age': age_list})
The output is:
Item Age
0 IAfpK 58
1 WNVdi 64
2 jp9zt 47
3 0Sr4C 68
4 CGEqo 76
.. ... ..
295 lRf1j 13
296 0iJGV 50
297 cFCfU 5
298 J8an1 48
299 dkSlj 5
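As a variation on the regex idea above, a single pattern with two capture groups can pull out each key and its age in one pass. This is only a sketch, not part of the original answer; it assumes a hand-written sample string and that every key entry is immediately followed by its age entry:

```python
import re

import pandas as pd

# Hypothetical sample in the same "key=..., age=..." format as the API response.
input_str = "key=IAfpK, age=58, key=WNVdi, age=64, key=jp9zt, age=47"

# With two capture groups, re.findall returns a list of (key, age) tuples.
pairs = re.findall(r"key=([^,]+), age=(\d+)", input_str)

# A list of tuples maps directly onto rows of a two-column dataframe.
df = pd.DataFrame(pairs, columns=["Item", "Age"])

print(pairs)
# [('IAfpK', '58'), ('WNVdi', '64'), ('jp9zt', '47')]
```

Because the pattern matches key and age together, a key without a following age is skipped rather than shifting the remaining rows out of alignment.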
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | sushanth |
Solution 3 | GH KIM |