'Filter jsonb results in subselect
I'm building a hierarchical JSON result from several tables. These are just examples but should be sufficient for the purpose of this demonstration to get the idea:
CREATE TABLE book (
id INTEGER PRIMARY KEY NOT NULL,
data JSONB
);
CREATE TABLE author (
id INTEGER PRIMARY KEY NOT NULL,
data JSONB
);
CREATE TABLE book_author (
id INTEGER PRIMARY KEY NOT NULL,
author_id INTEGER,
book_id INTEGER
);
CREATE UNIQUE INDEX pk_unique ON book_author (author_id, book_id);
Test data:
INSERT INTO book (id, data) VALUES
(1, '{"pages": 432, "title": "2001: A Space Odyssey"}')
, (2, '{"pages": 300, "title": "The City And The City"}')
, (3, '{"pages": 143, "title": "Unknown Book"}');
INSERT INTO author (id, data) VALUES
(1, '{"age": 90, "name": "Arthur C. Clarke"}')
, (2, '{"age": 43, "name": "China Miéville"}');
INSERT INTO book_author (id, author_id, book_id) VALUES
(1, 1, 1)
, (2, 1, 2);
I've created the following function:
CREATE OR REPLACE FUNCTION public.book_get()
RETURNS json AS
$BODY$
DECLARE
result json;
BEGIN
SELECT to_json(array_agg(_b)) INTO result
FROM (
SELECT
book.id id,
book.data->>'title' title,
book.data->>'pages' pages,
(
SELECT to_json(array_agg(_a))
FROM (
SELECT
author.id id,
author.data->>'name' "name",
author.data->>'age' age
FROM
author, book_author ba
WHERE
ba.author_id = author.id AND
ba.book_id = book.id
ORDER BY id
) _a
) authors
FROM
book
ORDER BY id ASC
) _b;
RETURN result;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Executing the function book_get
SELECT book_get();
Produces the following results
[
{
"id":1,
"title":"2001: A Space Odyssey",
"pages":432,
"authors":[
{
"id":1,
"name":"Arthur C. Clarke",
"age":90
}
]
},
{
"id":2,
"title":"The City And The City",
"pages":300,
"authors":[
{
"id":2,
"name":"China Miéville",
"age":43
}
]
},
{
"id":3,
"title":"Unknown Book",
"pages":143,
"authors":null
}
]
Now I'm able to filter the data with a WHERE
clause, e.g.
SELECT to_json(array_agg(_b)) INTO result
FROM (
...
) _b
-- give me the book with id 1
WHERE _b.id = 1;
-- or give me all titles with the occurrence of 'City' anywhere
WHERE _b.title LIKE '%City%';
-- or has more than 200 pages
WHERE _b.pages > 200;
How would I make it possible to filter on authors
? E.g. something equivalent to WHERE _b.authors.'name' = 'Arthur C. Clarke'
.
I have absolutely no idea what type authors
becomes? Or is? Is it still a record (array)? Is it JSON already? I guess because I can access id
, title
and pages
accessing _b.authors
isn't such a problem?
Accessing _b.authors
gives me ERROR: missing FROM-clause entry for table "authors"
Accessing with JSON operators _b.authors->>..
or _b->authors->>..
gives me
operator does not exist: record -> json Hint: No operator matches the given name and argument type(s). You might need to add explicit type casts.
I remember using GROUP BY
with the HAVING
clause:
GROUP BY _b.authors
HAVING _b.authors->>'name' = 'Arthur C. Clarke';
But it gives me the error:
ERROR: could not identify an equality operator for type json
To make it a bit more clear:
SELECT to_json(array_agg(_b)) INTO result
FROM (
...
) _b
WHERE _b.authors->0->>'name' = 'Arthur C. Clarke';
Will basically do what I need, this only matches if the author on index 0
is Arthur C. Clarke
. If he has cowritten the book and he would be on second place (index 1) then there wouldn't have been a match. So what I try to find is the correct syntax on scanning _b.authors
which happens to be an JSON array filled with authors. It just doesn't accept any attempt. As far as I understand are the @>
and #>
only supported for JSONB
. So how do I get the correct syntax on selecting _b.authors
on any column against a value.
Update 2
Ok read the docs again ... it seems I didn't get the part from Postgres documentation that there is a difference between JSON and JSONB regarding functions, I thought it was only concerning the data type. Replacing to_json
with to_jsonb
seems to do the trick with using operators like @>
etc in the where clause.
Update 3
@ErwinBrandstetter: Makes sense. LATERAL wasn't known to me yet, good to know it exists. I got the hang on JSON/JSONB's functions and operators, makes a lot of sense to me now. What isn't clear to me is finding occurrences with LIKE
for example in the WHERE
clause.
If I need to use jsonb_array_elements
to unnest objects in an array (because in the final WHERE
clause, the content of b.authors
is of JSONB datatype). I could then do
SELECT * FROM jsonb_array_elements('[
{"age": 90, "name": "the Arthur C. Clarke"},
{"age": 43, "name": "China Miéville"},
{"age": null, "name": "Erwin the Brandstetter"}
]'::jsonb) author
WHERE
author->>'name' LIKE '%the%';
and get the desired result,
1: {"age": 90, "name": "the Arthur C. Clarke"}
2: {"age": null, "name": "Erwin the Brandstetter"}
but whats the approach to achieve this in the final (last) WHERE
clause in my example? Pointing out the final WHERE
clause because I'd like to filter the complete set of results, and not partially filter somewhere in the middle of a sub-select. So in general I would like to filter out books in the final result set that have an author with a middle name 'C.' or a first name 'Arthur'.
Update 4
In the FROM
clause of course. I'll have to do performance tweaking at the end when I figured out all possibilities, but this is what I came up with.
SELECT json_agg(_b) INTO result
FROM (
...
) _b,
jsonb_array_elements(_b.authors) AS arrauthors
WHERE arrauthors->>'name' LIKE 'Arthur %';
Will give all books with an author name starting with 'Arthur'. I still appreciate comments or updates to this approach.
Solution 1:[1]
How would I make it possible to filter on authors? E.g. something equivalent to
WHERE _b.authors.'name' = 'Arthur C. Clarke'
.
You are on the right track in your question updates with jsonb
and the "contains" operator @>
. The best approach depends on what and how you want to filter exactly.
Basic function
Your basic function can be simpler:
CREATE OR REPLACE FUNCTION public.book_get()
RETURNS jsonb
LANGUAGE sql STABLE AS
$func$
SELECT jsonb_agg(books)
FROM (
SELECT b.data || jsonb_build_object('id', b.id, 'authors', a.authors) AS books
FROM book b
LEFT JOIN ( -- LEFT JOIN to include books without authors
SELECT book_id, jsonb_agg(data_plus) AS authors
FROM (
SELECT ba.book_id, jsonb_set(a.data, '{id}', to_jsonb(a.id)) AS data_plus
FROM book_author ba
JOIN author a ON a.id = ba.author_id
ORDER BY ba.book_id, ba.author_id
) a0
GROUP BY 1
) a ON a.book_id = b.id
ORDER BY b.id
) b0
$func$;
Major points
- Make it SQL, simpler. No need for plpgsql.
- Make it
STABLE
. - Don't omit the keyword
AS
for column aliases. - Use
jsonb_agg()
If you just want to add the id
column as key to your data
, there are simpler ways:
With the new
jsonb_set()
in Postgres 9.5:jsonb_set(data, '{id}', to_jsonb(id))
This adds the object or updates the value of an existing object with the same key - the equivalent of an UPSERT in SQL. You can also constrain the operation to UPDATE only, see the manual.
I use this in the inner subquery to add a single key.Concatenate two
jsonb
values:b.data || jsonb_build_object('id', b.id, 'authors', a.authors)
Again, existing keys on the same level in the left value are replaced with keys in the right value. I build the object with jsonb_build_object()
. See:
I use this in the outer subquery, simpler to add multiple keys. (And to demonstrate both options.
Your original query converted all values to text
, which is probably not intended. This query preserves original data types for all jsonb
values.
Test result
To test the result of your function for the existence of the author:
SELECT public.book_get() @> '[{"authors": [{"name":"Arthur C. Clarke"}]}]';
You have match the JSON structure in the pattern. And it only works for an exact match.
Or you can use jsonb_array_elements()
like you added in your last update for partial matches.
Either method is expensive since you test after building a JSON document from three whole tables.
Filter first
To actually filter books that have (possibly among others!) the given author, adapt your underlying query. You ask to filter books that ...
have an author with a middle name 'C.' or a first name 'Arthur'.
SELECT jsonb_agg(b.data || jsonb_build_object('id', b.id, 'authors', a.authors) ORDER BY b.id) AS books
FROM book b
, LATERAL ( -- CROSS JOIN since we filter before the join
SELECT jsonb_agg(jsonb_set(a.data, '{id}', to_jsonb(a.id)) ORDER BY a.id) AS authors
FROM book_author ba
JOIN author a ON a.id = ba.author_id
WHERE ba.book_id = b.id
) a
WHERE EXISTS (
SELECT 1 -- one of the authors matches
FROM book_author ba
JOIN author a ON a.id = ba.author_id
WHERE ba.book_id = b.id
AND (a.data->>'name' LIKE '% C. %' OR -- middle name 'C.'
a.data->>'name' LIKE 'Arthur %') -- or a first name 'Arthur'.
);
Filter books that have at least one matching author before you build the result.
Note how I use ORDER BY
as modifier to the jsob_agg()
aggregate function instead of a subquery to sort results like in the previous example. This is typically slower but shorter. And good enough for a small result set. Consider:
If your tables are big and you need the queries fast, use indexes! For this particular query a function trigram GIN index like this should work wonders for big tables:
CREATE INDEX author_special_idx ON author USING gin ((data->>'name') gin_trgm_ops);
Detailed explanation / instructions:
Solution 2:[2]
A nice tutorial on JSOn in postgresql is recommended. If you create your data in this way:
CREATE TABLE json_test (
id serial primary key,
data jsonb
);
INSERT INTO json_test (data) VALUES
('{"id":1,"title":"2001: A Space Odyssey","pages":432,"authors":[{"id":1,"fullname":"Arthur C. Clarke"}]}'),
('{"id":2,"title":"The City And The City","pages":300,"authors":[{"id":2,"fullname":"China Miéville"}]}'),
('{"id":3,"title":"Unknown Book","pages":143,"authors":null}');
You can select with a specific id
SELECT * FROM json_test
WHERE data @> '{"id":2}';
Or look for a specific name in a sub-array:
SELECT * FROM json_test
WHERE data -> 'authors' @> '[{"fullname": "Arthur C. Clarke"}]'
Or find book with over 200 pages:
SELECT * FROM json_test
WHERE (data -> 'pages')::text::int > 200
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Joost Döbken |