Optimal way to stream trade data out of a Postgres database
I have a table with a very simple schema:
create table trades
(
    instrument varchar(20) not null,
    ts timestamp not null,
    price double precision not null,
    quantity double precision not null,
    direction integer not null,
    id serial
        constraint trades_pkey
            primary key
);
It stores a list of trades done on various instruments.
Multiple trades can share a single timestamp, and the timestamps are not regular: there may be 10 entries in the same millisecond and then nothing for 2 seconds, and so on.
When the client starts, I would like to accomplish two things:
- Load the last hour of data.
- Stream all the new updates.
The client processes the trades one by one, as if they were coming from a queue. Trades are split by instrument, each instrument has its own queue, and each queue expects every trade to be the one that follows the previous one.
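For reference, the initial one-hour load could be expressed as something like the query below. This is only a sketch: it assumes the table is named trades as in the schema above, and it uses id purely as an arbitrary tie-break for trades that share a timestamp.

select instrument, ts, price, quantity, direction, id
from trades
where ts >= now() - interval '1 hour'
order by instrument, ts, id;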
Solution A: I did a query to find the id at now - 1 hour, then queried all rows with id >= that start id, and then looped to get all rows with id > the last id received.
This does not work: the row ids and timestamps do not match; sometimes an older timestamp gets a higher row id. I guess this is due to writes being done on multiple threads. Getting data by id doesn't guarantee I will receive the trades in order, and while I can sort each batch I receive, I can't be sure that the next batch will not contain an older row.
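For concreteness, Solution A in SQL terms is roughly the following sketch, where :start_id and :last_id stand for bind parameters the client holds. The problem described above is that the id order can disagree with the ts order.

-- find the id at roughly now - 1 hour
select min(id) from trades
where ts >= now() - interval '1 hour';

-- initial load
select * from trades
where id >= :start_id
order by id;

-- polling loop, repeated with the highest id seen so far
select * from trades
where id > :last_id
order by id;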
Solution B: I can make a query loop that takes the last timestamp received, subtracts 1 second and queries again, etc. I can sort the data in the client and, for each instrument, discard all rows older than the last one processed. Not very efficient, but that will work.
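A sketch of Solution B's polling query, assuming :last_ts is a bind parameter holding the most recent timestamp the client has received; the client then deduplicates per instrument against the last trade it processed.

-- re-query with a one-second overlap behind the last timestamp seen
select instrument, ts, price, quantity, direction, id
from trades
where ts > :last_ts - interval '1 second'
order by instrument, ts, id;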
Solution C: I can make a query per instrument (there are 22 of them), ordered by timestamp. Can 22 subqueries be grouped into a single one?
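For illustration, the per-instrument queries of Solution C could be combined into one statement with union all, or collapsed into a single ordered scan that the client splits into its 22 queues. This is a sketch; 'EURUSD' and 'GBPUSD' are placeholder instrument names.

-- per-instrument queries combined with union all
(select * from trades where instrument = 'EURUSD' and ts >= now() - interval '1 hour' order by ts, id)
union all
(select * from trades where instrument = 'GBPUSD' and ts >= now() - interval '1 hour' order by ts, id)
-- ... and so on for the remaining instruments ...
;

-- or a single query that the client splits by instrument
select * from trades
where ts >= now() - interval '1 hour'
order by instrument, ts, id;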
Or, is there another solution?
Solution 1:[1]
You could try a bigserial column with auto-increment to ensure each row is numbered in order as it is inserted.
Since this number is handled by Postgres, you should be fine getting a guaranteed ordering on your data.
On the client side you just store (maybe in a separate metadata table) the latest serial number you have seen, then query everything larger than that and keep your metadata table up to date.
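A minimal sketch of this approach, under a few assumptions: the existing serial id column is simply widened to bigint (its sequence default stays attached), the metadata table is called client_progress, and 'trade-consumer' and :newest_id_processed are placeholder names for the client and its bind parameter.

-- widen the existing serial column to the bigserial range
alter table trades alter column id type bigint;

-- hypothetical metadata table tracking the last id each client has processed
create table client_progress (
    client_name varchar(50) primary key,
    last_seen_id bigint not null
);

-- on each poll: fetch everything newer than the stored high-water mark
select instrument, ts, price, quantity, direction, id
from trades
where id > (select last_seen_id from client_progress where client_name = 'trade-consumer')
order by id;

-- after processing a batch, advance the high-water mark
update client_progress
set last_seen_id = :newest_id_processed
where client_name = 'trade-consumer';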
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Richard Howell-Peak |