'How to force COMMIT inside function so other sessions can see updated row?
In a Postgres 12 database, I have multiple queries (SELECT
, UPDATE
, ...) in a function that all together take about 20 minutes to complete.
I have a check at the top that does an UPDATE
if status
is not running:
create or replace function aaa.fnc_work() returns varchar as
$body$
begin
if (select count(*) from aaa.monitor where id='invoicing' and status='running')=0 then
return 'running';
else
update aaa.monitor set status='running' where id='invoicing';
end if;
--- rest of code ---
--finally
update aaa.monitor set status='idle' where id='invoicing';
return '';
exception when others then
return SQLERRM::varchar;
end
$body$
language plpgsql;
The idea is to prevent other users from executing the --- rest of code ---
until status
is idle.
However, it seems the updated status is not seen by others (calling the same function) who also go ahead and start executing --- rest of code ---
. How do I force a commit after:
update aaa.monitor set status='running'
where id='invoicing'
;
So that all other user sessions can see the updated status
and exit accordingly.
Do I need a transaction?
Solution 1:[1]
Keep reading. I preserved the best for last.
Proof of concept with a PROCEDURE
A Postgres FUNCTION
is always atomic (runs inside a single transaction wrapper) and cannot handle transactions. So COMMIT
is disallowed. You could use tricks with dblink
to work around this. See:
- Does Postgres support nested or autonomous transactions?
- How do I do large non-blocking updates in PostgreSQL?
But for nested transactions like this consider a PROCEDURE
instead. Introduced with Postgres 11. There you can manage transactions:
CREATE OR REPLACE PROCEDURE aaa.proc_work(_id text, INOUT _result text = NULL)
LANGUAGE plpgsql AS
$proc$
BEGIN
-- optionally assert steering row exists
PERFORM FROM aaa.monitor WHERE id = _id FOR KEY SHARE SKIP LOCKED;
IF NOT FOUND THEN
RAISE EXCEPTION 'monitor.id = % not found or blocked!', quote_literal(_id);
END IF;
-- try UPDATE
UPDATE aaa.monitor
SET status = 'running'
WHERE id = _id -- assuming valid _id
AND status <> 'running'; -- assuming "status" is NOT NULL
IF NOT FOUND THEN
_result := 'running'; RETURN; -- this is how you return with INOUT params
END IF;
COMMIT; -- HERE !!!
BEGIN -- start new code block
----- code for big work HERE -----
-- PERFORM 1/0; -- debug: test exception?
-- PERFORM pg_sleep(5); -- debug: test concurrency?
_result := '';
-- also catching QUERY_CANCELED and ASSERT_FAILURE
-- is a radical step to try andrelease 'running' rows no matter what
EXCEPTION WHEN OTHERS OR QUERY_CANCELED OR ASSERT_FAILURE THEN
-- ROLLBACK; -- roll back (unfinished?) big work
_result := SQLERRM;
END; -- end of nested block
UPDATE aaa.monitor -- final reset
SET status = 'idle'
WHERE id = _id
AND status <> 'idle'; -- only if needed
END
$proc$;
Call (important!):
CALL aaa.proc_work('invoicing'); -- stand-alone call!
Important notes
Add COMMIT
after the UPDATE
. After that, concurrent transactions can see the updated row.
But there is no additional BEGIN
or START TRANSACTION
. The manual:
In procedures invoked by the
CALL
command as well as in anonymous code blocks (DO
command), it is possible to end transactions using the commandsCOMMIT
andROLLBACK
. A new transaction is started automatically after a transaction is ended using these commands, so there is no separateSTART TRANSACTION
command. (Note thatBEGIN
andEND
have different meanings in PL/pgSQL.)
We need a separate PL/pgSQL code block, because you have a custom exception handler, and (quoting the manual):
A transaction cannot be ended inside a block with exception handlers.
(But we can COMMIT
/ ROLLBACK
in the EXCEPTION
handler.)
You cannot call this procedure inside an outer transaction, or together with any other DML statement, which would force an outer transaction wrapper. Has to be a stand-alone CALL
. See:
Note the final UPDATE aaa.monitor SET status = 'idle' WHERE ...
. Else the (committed!) status
would remain 'running' indefinitely after an exception.
About returning a value from a procedure:
I added DEFAULT NULL
to the INOUT
parameter, so you don't have to provide an argument with the call.
UPDATE
directly. If the row is 'running', no update occurs. (This also fixes the logic: your IF
expression seems backwards as it returns 'running' when no row with status='running'
is found. Seems like you'd want the opposite.)
I added an (optional!) assert to make sure the row in table aaa.monitor
exists. Adding a FOR KEY SHARE
lock to also eliminate the tiny time window for a race conditions between the assert and the following UPDATE
. The lock conflicts with deletion or updating the PK column - but not with updating the status
. So the exception is never raised in normal operation! The manual:
Currently, the set of columns considered for the
UPDATE
case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
SKIP LOCK
to not wait in case of a conflicting lock. The added exception should never occur. Just demonstrating a water-tight proof of concept.
Your update revealed 25 rows in aaa.monitor
, so I added the parameter _id
.
Superior approach
The above might make sense to persist more information for the world to see. To just queue operations, there are much more efficient solutions. Work with a lock instead, which is "visible" to others instantaneously. Then you don't need a nested transaction to begin with, and a plain FUNCTION
will do:
CREATE OR REPLACE FUNCTION aaa.fnc_work(_id text)
RETURNS text
LANGUAGE plpgsql AS
$func$
BEGIN
-- optionally assert that the steering row exists
PERFORM FROM aaa.monitor WHERE id = _id FOR KEY SHARE SKIP LOCKED;
IF NOT FOUND THEN
RAISE EXCEPTION 'monitor.id = % not found or blocked!', quote_literal(_id);
END IF;
-- lock row
PERFORM FROM aaa.monitor WHERE id = _id FOR NO KEY UPDATE SKIP LOCKED;
IF NOT FOUND THEN
-- we made sure the row exists, so it must be locked
RETURN 'running';
END IF;
----- code for big work HERE -----
-- PERFORM 1/0; -- debug: test exception?
-- PERFORM pg_sleep(5); -- debug: test concurrency?
RETURN '';
EXCEPTION WHEN OTHERS THEN
RETURN SQLERRM;
END
$func$;
Call:
SELECT aaa.fnc_work('invoicing');
The call can be nested any way you want. As long as one transaction is working on the big job, no other will start.
Again, the optional assert takes out a FOR KEY SHARE
lock to eliminate the time window for a race condition, and the added exception should never occur in normal operation.
We don't need the column status
at all for this. The row-lock itself is the gatekeeper. Hence the empty SELECT
list in PERFORM FROM aaa.monitor ...
. Collateral benefit: this also doesn't produce dead tuples by updating the row back and forth. If you still need to update status
for some other reason, you are back to the visibility issue of the previous chapter. You can combine both ...
About PERFORM
:
About the row lock:
Solution 2:[2]
What you are trying to accomplish is an Autonomous Transaction. PostgreSQL doesn't have an easy way to do this. This link here discusses some alternatives.
But one way, which is discussed in the article linked aboove is to use the PostgreSQL dblink extension.
You will need to add the extension to the server
CREATE EXTENSION dblink;
and then you can create a new function that is called from within your function
CREATE FUNCTION update_monitor_via_dblink(msg text)
RETURNS void
LANGUAGE sql
AS $function$
select dblink('host=/var/run/postgresql port=5432 user=postgres dbname=postgres',
format(' update aaa.monitor set status= %M',msg::text)
$function$;
Another thing you might want to consider is using PostgreSQL locks. More information can be found here.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Ed Mendez |