Any way to make removing values from JSON quicker?
I have a JSON file, and I am trying to remove from it the values that appear in a separate list.
The list is as follows, and is approximately 10,000 values long:
remove_list=["apple","orange","banana"...]
The current code I have to remove these values from the JSON is:
for item in remove_list:
    for i in range(len(JSON_file)):
        for j in range(len(JSON_file[i]['query']['items'])):
            if item in JSON_file[i]['query']['items'][j]['match']:
                del JSON_file[i]['query']['items'][j]['match']
            else:
                pass
Is there a way to make this process more efficient? It currently takes about 100 minutes to do this.
Solution 1:[1]
You can speed it up significantly by using a set instead of a list for the blacklist. That way the JSON data is scanned once, instead of once for every value in remove_list.
remove_set = set(remove_list)
for foo in JSON_file:
    for item in foo['query']['items']:
        if not remove_set.isdisjoint(item['match']):
            del item['match']
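To make the single-pass idea easier to try out, here is a minimal self-contained sketch. The function name strip_blacklisted_matches and the sample data are made up for illustration, and it assumes "match" is a list of strings. It uses item.get('match') so that items without a "match" key are skipped rather than raising a KeyError, which is a small addition to the snippet above.

```python
def strip_blacklisted_matches(records, remove_list):
    """Delete the 'match' key from every item that shares a value with remove_list.

    Assumes each record looks like {'query': {'items': [{'match': [...]}, ...]}}.
    Returns the number of 'match' keys that were deleted.
    """
    remove_set = set(remove_list)  # membership tests on a set are O(1) on average
    removed = 0
    for record in records:
        for item in record['query']['items']:
            match = item.get('match')  # None if the key is missing
            if match is not None and not remove_set.isdisjoint(match):
                del item['match']
                removed += 1
    return removed


# Hypothetical sample data, not taken from the question.
sample = [
    {'query': {'items': [{'match': ['apple', 'kiwi']}, {'match': ['pear']}]}},
    {'query': {'items': [{'match': ['banana']}]}},
]
removed = strip_blacklisted_matches(sample, ['apple', 'orange', 'banana'])
print(removed, sample)
```

Iterating over the items directly, rather than over index ranges, also avoids repeating the JSON_file[i]['query']['items'][j] lookups on every comparison.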
According to this answer, set.isdisjoint() should be the fastest way to check whether any of the blacklisted words appears in the "match" list (assuming it is a list):
https://stackoverflow.com/a/45113088/1977847
If "match" is a string instead, you can split it into a list of words first:
import re
remove_set.isdisjoint(re.split(r'\W+', item['match']))
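The string case can be sketched as a small helper. The name matches_blacklist and the example strings are assumptions for illustration only. Note that this compares whole words, whereas the `item in ...` test in the question would be a substring check if "match" is a string, so the two can disagree on words like "pineapple".

```python
import re


def matches_blacklist(text, remove_set):
    """Return True if any whole word in text is in remove_set."""
    # Split on runs of non-word characters; empty strings from leading or
    # trailing punctuation are harmless unless '' is itself blacklisted.
    words = re.split(r'\W+', text)
    return not remove_set.isdisjoint(words)


remove_set = {'apple', 'orange', 'banana'}
hit = matches_blacklist('green apple, ripe', remove_set)   # 'apple' is a whole word
miss = matches_blacklist('pineapple juice', remove_set)    # 'pineapple' != 'apple'
print(hit, miss)
```

If substring behaviour is actually what is wanted, this word-based check would not be a drop-in replacement.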
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |