T-SQL - search in filtered JSON array

SQL Server 2017.

Table OrderData has column DataProperties where JSON is stored. JSON example stored there:

{
 "Input": {
   "OrderId": "abc",
   "Data": [
     {
       "Key": "Files",
       "Value": [
         "test.txt",
         "whatever.jpg"
       ]
     },
     {
       "Key": "Other",
       "Value": [
         "a"
       ]
     }
   ]
 }
}

So it's an object containing an Input object, whose Data array holds key-value pairs: objects with a Key string and a Value array of strings.

My problem: I need to query for rows based on the values under the Files key in the example JSON - a simple LIKE that matches %text%.

This query works:

SELECT TOP 10 *
FROM OrderData CROSS APPLY OPENJSON(DataProperties,'$.Input.Data') dat
WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' and dat.[key] = 0 
AND JSON_QUERY(dat.value, '$.Value') LIKE '%2%'

The problem is that this query is, unsurprisingly, very slow.

How to make it faster?

  1. I cannot create a computed column with JSON_VALUE, because I need to filter within an array.
  2. I cannot create a computed column with JSON_QUERY on "$.Input.Data" or "$.Input.Data[0].Values" - because I need the specific item in this array whose Key == "Files".

I've searched, but it seems you cannot create a computed column that also filters data, as in this attempt:

ALTER TABLE OrderData
 ADD aaaTest AS (SELECT JSON_QUERY(dat.value, '$.Value')
 FROM OPENJSON(DataProperties, '$.Input.Data') dat
 WHERE JSON_VALUE(dat.value, '$.Key') = 'Files' AND dat.[key] = 0);

Error: Subqueries are not allowed in this context. Only scalar expressions are allowed.


What are my options?

  1. Add a Files column with an index, and use INSERT/UPDATE triggers that populate this column on inserts/updates?
  2. Create a view that "computes" this column? I can't add an index on it, so it will still be slow.

So far only option 1 has some merit, but I don't like triggers. Is there another option?
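For reference, option 1 could look roughly like this - a sketch only, assuming the table has an ID primary key; the Files column, index, and trigger names are made up. Note that a leading-wildcard LIKE still cannot seek, but it scans a small indexed column instead of parsing JSON on every row:

```sql
-- Sketch of option 1: denormalize the Files array into its own column.
-- NVARCHAR(850) keeps the column within the 1700-byte nonclustered index key limit.
ALTER TABLE OrderData ADD Files NVARCHAR(850) NULL;
CREATE INDEX IX_OrderData_Files ON OrderData (Files);
GO
CREATE TRIGGER trg_OrderData_Files
ON OrderData
AFTER INSERT, UPDATE
AS
BEGIN
    SET NOCOUNT ON;
    UPDATE o
    SET Files = (SELECT JSON_QUERY(dat.value, '$.Value')
                 FROM OPENJSON(o.DataProperties, '$.Input.Data') dat
                 WHERE JSON_VALUE(dat.value, '$.Key') = 'Files'
                   AND dat.[key] = 0)
    FROM OrderData o
    JOIN inserted i ON i.ID = o.ID;  -- assumes an ID primary key
END;
GO

-- Searches then become a plain predicate on the materialized column:
-- SELECT * FROM OrderData WHERE Files LIKE '%text%';
```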



Solution 1:[1]

You might try something along these lines:

Attention: I've added a 2 to the text (test2.txt) to satisfy your filter, and I renamed both keys to the plural "Values":

DECLARE @mockupTable TABLE(ID INT IDENTITY, DataProperties NVARCHAR(MAX));
INSERT INTO @mockupTable VALUES
(N'{
 "Input": {
   "OrderId": "abc",
   "Data": [
     {
       "Key": "Files",
       "Values": [
         "test2.txt",
         "whatever.jpg"
       ]
     },
     {
       "Key": "Other",
       "Values": [
         "a"
       ]
     }
   ]
 }
}');

The query

SELECT TOP 10 *
FROM @mockupTable t 
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
    ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
AND dat.[Values] LIKE '%2%';

The main difference is the WITH clause, which is used to return the properties inside an object in a typed way and side by side (similar to a naked OPENJSON with a PIVOT for all columns - but much better). This avoids expensive JSON methods in your WHERE clause.

Hint: since we return Values with NVARCHAR(MAX) AS JSON, we can keep working with the nested array and might proceed with something like this:

SELECT TOP 10 *
FROM @mockupTable t 
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
    ,[Values] NVARCHAR(MAX) AS JSON) dat
WHERE dat.[Key] = 'Files'
--we read the array again with `OPENJSON`:
AND 'test2.txt' IN(SELECT [Value] FROM OPENJSON(dat.[Values]));

You might use one more CROSS APPLY to expand the array's values and filter on them directly in the WHERE clause.

SELECT TOP 10 *
FROM @mockupTable t 
CROSS APPLY OPENJSON(t.DataProperties,'$.Input.Data')
WITH([Key] NVARCHAR(100)
    ,[Values] NVARCHAR(MAX) AS JSON) dat
CROSS APPLY OPENJSON(dat.[Values]) vals
WHERE dat.[Key] = 'Files'
  AND vals.[Value]='test2.txt'

Just check it out...

Solution 2:[2]

This is an old question, but I would like to revisit it. There isn't any mention of how the source table is actually indexed. If the original author is still around, can you confirm/deny what indexing strategy you used? For performant JSON document queries, I've found that a table using a COLUMNSTORE indexing strategy yields very fast JSON queries, even with large amounts of data.

https://docs.microsoft.com/en-us/sql/relational-databases/json/store-json-documents-in-sql-tables?view=sql-server-ver15 has examples of different indexing techniques. For my personal solution I've been using COLUMNSTORE, albeit with a limited NVARCHAR document size. It's fast enough for any purpose I have, even with millions of rows of decently sized JSON documents.
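A minimal sketch of that approach; the table name, index name, and column size here are illustrative assumptions (on SQL Server 2017 a clustered columnstore can also hold NVARCHAR(MAX), but limited-size columns compress and scan better):

```sql
-- Illustrative sketch: JSON documents stored in a clustered columnstore table.
-- OrderDataCs and cci_OrderDataCs are made-up names for this example.
CREATE TABLE OrderDataCs
(
    ID INT IDENTITY NOT NULL,
    DataProperties NVARCHAR(4000) NOT NULL,  -- limited document size, as noted above
    INDEX cci_OrderDataCs CLUSTERED COLUMNSTORE
);

-- OPENJSON ... WITH filters like the ones above then benefit from
-- columnstore compression and batch-mode scans over DataProperties.
```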

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Vincent Polite