How to fix missing double quotes issue when parsing JSON data?

I am running Python 3 code that consumes JSON data from a source I don't control. While reading the JSON data I get the following error:

simplejson.errors.JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2

Here is the code


import logging

import simplejson as json
from kafka import KafkaConsumer  # provided by the kafka-python package

logging.basicConfig(level=logging.INFO)

consumer = KafkaConsumer(
    bootstrap_servers='localhost:9092',
    api_version=(1, 0, 0))

consumer.subscribe(['Test_Topic-1'])

for message in consumer:
    msg_str = message.value
    y = json.loads(msg_str)  # raises JSONDecodeError when keys are unquoted
    print(y["city_name"])

As I cannot change the source, I need to fix it on my end. I found this post helpful, since my data contains timestamps with : in them: How to Fix JSON Key Values without double-quotes?

But it still fails for some values in my JSON data, because those values also contain :, e.g.

address:"1600:3050:rf02:hf64:h000:0000:345e:d321"

Is there any way to add double quotes to the keys in my JSON data?



Solution 1:[1]

You can try the dirtyjson module, which can fix some mistakes of this kind.

import dirtyjson

d = dirtyjson.loads('{address:"1600:3050:rf02:hf64:h000:0000:345e:d321"}')

print( d['address'] )

d = dirtyjson.loads('{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}')

print( d['abc'] )

It returns an AttributedDict, so you may need dict() to get a normal dictionary:

d = dirtyjson.loads('{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}')

print( d )

print( dict(d) )

Result:

AttributedDict([('abc', '1:2:3:4'), ('efg', '5:6:7:8'), ('hij', 'foo')])

{'abc': '1:2:3:4', 'efg': '5:6:7:8', 'hij': 'foo'}
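
To use this in the consumer loop from the question, one option is to parse strictly first and fall back to a repairing parser only when that fails. The following is a minimal sketch, not a definitive implementation. The name load_lenient and its repair parameter are hypothetical, it uses the standard-library json module instead of simplejson, and it assumes message.value arrives as UTF-8 bytes (no deserializer configured on the consumer).

```python
import json


def load_lenient(raw, repair):
    """Parse strictly first; call `repair` only when strict parsing fails.

    `raw` may be bytes or str. `repair` is any callable that takes the
    decoded text and returns the parsed object (hypothetical parameter;
    dirtyjson.loads is one candidate).
    """
    # Assumption: the payload is UTF-8 encoded when it arrives as bytes.
    text = raw.decode("utf-8") if isinstance(raw, (bytes, bytearray)) else raw
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Strict parsing failed, so hand the same text to the repairing parser.
        return repair(text)
```

In the loop this would be called as load_lenient(message.value, dirtyjson.loads), so well-formed messages never pay for the slower lenient parser.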

Solution 2:[2]

I think your problem is that you have strings like this:

{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo"}

which are not valid JSON. You could try to repair it with a regular expression substitution:

import re
jtxt_bad ='{abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo", klm:"bar"\n}'
jtxt = re.sub(r'\b([a-zA-Z]+):("[^"]+"[,\n}])', r'"\1":\2', jtxt_bad)

print(f'Original: {jtxt_bad}\nRepaired: {jtxt}')

The output of this is:

Original: {abc:"1:2:3:4", efg:"5:6:7:8", "hij":"foo", klm:"bar"
}
Repaired: {"abc":"1:2:3:4", "efg":"5:6:7:8", "hij":"foo", "klm":"bar"
}

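The regular expression \b([a-zA-Z]+):("[^"]+"[,\n}]) means: a word boundary, followed by one or more letters, followed by a :, followed by a double-quoted string, followed by one of ,, a newline, or }. However, this will fail if there is an escaped quote inside the string, such as "1:\"2:3". It also leaves keys that contain digits or underscores (such as city_name) unquoted, because [a-zA-Z]+ matches letters only.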
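
One way around the escaped-quote limitation, which also covers keys such as city_name that contain underscores or digits, is to let the pattern consume complete quoted strings first and only then look for bare keys. This is a sketch under the assumption that every string value in the data is already double-quoted; the names TOKEN and quote_bare_keys are made up for illustration.

```python
import json
import re

# Match either a complete double-quoted string (escape sequences included) or
# a bare identifier that is directly followed by a colon. The string branch
# comes first, so colons and quotes inside values are skipped as a whole.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\b([A-Za-z_]\w*)(?=\s*:)')


def quote_bare_keys(text):
    """Wrap unquoted object keys in double quotes; leave strings untouched."""
    def fix(match):
        key = match.group(1)  # None when the string branch matched
        return f'"{key}"' if key else match.group(0)
    return TOKEN.sub(fix, text)


bad = r'{abc:"1:2:3:4", city_name:"say \"hi\": ok", "hij":"foo"}'
print(json.loads(quote_bare_keys(bad)))
```

Because the lookahead requires a colon after the identifier, bare literals such as true, false and null are left alone. An unquoted value that itself contains a word followed by a colon would still be rewritten incorrectly.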

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: furas
[2] Solution 2: Han-Kwang Nienhuys