Any way to make removing values from JSON quicker?

I have a JSON file, and I am trying to remove from it every value that appears in a separate list.

The list is as follows, and is approximately 10,000 values long:

remove_list=["apple","orange","banana"...]

The current code I have to remove these values from the JSON is:

for item in remove_list:
    for i in range(len(JSON_file)):
        for j in range(len(JSON_file[i]['query']['items'])):

            if item in JSON_file[i]['query']['items'][j]['match']:
                del JSON_file[i]['query']['items'][j]['match']
                    
            else:
                pass

Is there a way to make this process more efficient? It currently takes about 100 minutes to do this.



Solution 1:[1]

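You can speed it up significantly by using a set instead of a list for the blacklist. Build the set once and make a single pass over the JSON: each membership test is then O(1) on average, instead of rescanning the whole file once per blacklisted value.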

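remove_set = set(remove_list)  # build the set once, before the loops
for foo in JSON_file:
    for item in foo['query']['items']:
        # isdisjoint() is False when 'match' shares a value with remove_set
        if not remove_set.isdisjoint(item['match']):
            del item['match']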

According to this answer, set.isdisjoint() should be the faster way to check whether any of the blacklisted words appear in the "match" list (assuming it is a list).

https://stackoverflow.com/a/45113088/1977847
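
For anyone who wants to try the set-based loop on its own, here is a minimal, self-contained sketch. The sample data, the "id" keys, and the helper name strip_matches are made up for illustration; only the ['query']['items'][...]['match'] nesting comes from the question. It assumes "match" is a list of strings, and it uses .get() so that an item without a "match" key is skipped rather than raising KeyError.

```python
# Hypothetical sample data shaped like the structure in the question.
JSON_file = [
    {"query": {"items": [
        {"id": 1, "match": ["apple", "pear"]},
        {"id": 2, "match": ["kiwi"]},
    ]}},
    {"query": {"items": [
        {"id": 3, "match": ["grape", "banana"]},
        {"id": 4},  # no "match" key at all
    ]}},
]
remove_list = ["apple", "orange", "banana"]


def strip_matches(records, blacklist):
    """Delete 'match' from every item that shares a value with blacklist.

    Returns the number of 'match' entries removed. Assumes 'match' is a list.
    """
    remove_set = set(blacklist)  # built once; lookups are O(1) on average
    removed = 0
    for record in records:
        for item in record["query"]["items"]:
            match = item.get("match")  # tolerate items without the key
            if match is not None and not remove_set.isdisjoint(match):
                del item["match"]
                removed += 1
    return removed


removed_count = strip_matches(JSON_file, remove_list)
print(removed_count)
```

The data is walked once, whatever the length of remove_list, because the inner test consults the set instead of looping over every blacklisted value.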

If "match" is a string, you can split it into a list of words.

import re
remove_set.isdisjoint(re.split(r'\W+', item['match']))
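
Here is a rough sketch of the string case as a complete loop. The sample items and the helper name has_blacklisted_word are hypothetical. One thing to be aware of: splitting on \W+ compares whole words, so it is not identical to the substring test in the question ("apple" in "pineapple juice" is True, but the word "apple" is not among the split words). The comparison is also case-sensitive.

```python
import re

# Precompile the splitter once, since it is reused for every item.
WORD_SPLIT = re.compile(r"\W+")


def has_blacklisted_word(text, remove_set):
    """Return True if any whole word in text is in remove_set."""
    words = WORD_SPLIT.split(text)
    return not remove_set.isdisjoint(words)


remove_set = {"apple", "orange", "banana"}

# Hypothetical items where "match" is a plain string.
items = [
    {"match": "fresh apple pie"},
    {"match": "pineapple juice"},  # contains "apple" only as a substring
    {"match": "kiwi"},
]

for item in items:
    if has_blacklisted_word(item["match"], remove_set):
        del item["match"]

print(items)
```

If substring matching is what you actually need, the set lookup no longer applies directly and a different approach (for example, one combined regular expression) would be required.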

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
