Spring JPA Paging to Stream

We are currently using CockroachDB, which doesn't support returning X records at a time, presumably due to its lack of cursor support. This means that when we try to stream a large number (~10 million) of records, the DB returns a full ResultSet and the app falls over because it runs out of memory.

Cockroach recommends using pagination (ideally keyset) for retrieving large numbers of results, but is there a nice way of reading all pages and returning a Stream, without loading all results into memory at any point?

Thanks!



Solution 1:[1]

As long as you don't have multiple operations running at the same time on the same connection, you should be able to set the JDBC fetchSize property to control how many results are returned at a time.
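A minimal sketch of what that could look like with plain JDBC, assuming the PostgreSQL JDBC driver is used to talk to CockroachDB (that driver only uses the fetch size when autocommit is off). The class name, method name and the shape of the query (first column is a numeric id) are illustrative assumptions, not something from the answer.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.function.LongConsumer;

public class FetchSizeExample {

    /**
     * Runs the given query and hands the first column of every row to the action,
     * asking the driver to fetch rows in batches of fetchSize. Returns the row count.
     */
    public static long forEachId(Connection conn, String sql, int fetchSize,
                                 LongConsumer action) throws SQLException {
        // Assumption: the PostgreSQL JDBC driver ignores fetchSize while autocommit is on.
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setFetchSize(fetchSize);
            long count = 0;
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    action.accept(rs.getLong(1));
                    count++;
                }
            }
            conn.commit();
            return count;
        }
    }
}
```

The connection should not be shared with other work while the ResultSet is open, which matches the caveat above.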

Solution 2:[2]

As the vendor documentation proposes:

The general pattern for keyset pagination queries is:

SELECT * FROM t AS OF SYSTEM TIME ${time}
WHERE key > ${value}
ORDER BY key
LIMIT ${amount}

This is faster than using LIMIT/OFFSET because, instead of doing a full table scan up to the value of the OFFSET, a keyset pagination query looks at a fixed-size set of records for each iteration. This can be done quickly provided that the key used in the WHERE clause to implement the pagination is indexed and unique. A primary key meets both of these criteria.

Note: CockroachDB does not have cursors. To support a cursor-like use case, namely "operate on a snapshot of the database at the moment the cursor is opened", use the AS OF SYSTEM TIME clause as shown in the examples below.

This means that either the user (interface) provides, or "we" store:

  • ${value}, which refers to the "last seen key" per pagination request (or session!)

Additionally, if we want that "cursor-like" behavior, we need to provide/store:

  • ${time}, which refers to the pagination timestamp of the last request (or session).
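To make the two pieces of state concrete, here is a small sketch of a value object that could be kept per request or session. The class `PageCursor` and its methods are hypothetical names chosen for illustration, and it assumes a numeric key.

```java
import java.time.Instant;

/** Hypothetical holder for the state a keyset "cursor" needs between page requests. */
public final class PageCursor {

    private final long lastKey;  // ${value}: the last key seen so far
    private final Instant asOf;  // ${time}: snapshot timestamp, fixed for the whole session

    private PageCursor(long lastKey, Instant asOf) {
        this.lastKey = lastKey;
        this.asOf = asOf;
    }

    /** Starts before the smallest possible key, reading as of the given instant. */
    public static PageCursor start(Instant asOf) {
        return new PageCursor(Long.MIN_VALUE, asOf);
    }

    /** Cursor for the next page: new last seen key, same snapshot timestamp. */
    public PageCursor advance(long lastSeenKey) {
        if (lastSeenKey <= lastKey) {
            throw new IllegalArgumentException("keys must be strictly increasing");
        }
        return new PageCursor(lastSeenKey, asOf);
    }

    public long lastKey() {
        return lastKey;
    }

    public Instant asOf() {
        return asOf;
    }
}
```

Keeping the timestamp unchanged in `advance` is what gives every page the same snapshot; only the key moves forward.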

When we provide the additional parameters, we can do it all in one repository query:

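import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.repository.query.Param;

public interface SomeRepository extends JpaRepository<Some, Long> {

    // AS OF SYSTEM TIME goes at the end of the FROM clause (after any joins), before WHERE.
    @Query(
      value = """
              SELECT * FROM SOME [, ....]
              [JOIN ...]
              AS OF SYSTEM TIME follower_read_timestamp()
              WHERE [... AND] some.ID > :lastKey
              ORDER BY some.ID
              """,
      // If you want/need the count
      countQuery = "SELECT count(*) FROM <[SAME_QUERY]>",
      nativeQuery = true)
    // Page if you need the total size, Slice otherwise ;)
    // Always pass page 0 (e.g. PageRequest.of(0, size)): :lastKey does the paging,
    // and a later page number would add an OFFSET on top of it.
    Page<Some> findCustom(/* more params?, */
        @Param("lastKey") Long lastKey, Pageable pageable);
    // done(?;)
}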

... and maintain the discreet hope that only pageable.pageSize items will be fetched (from the DB and loaded into memory).
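The repository method above returns one page per call. To get back to the Stream the question asks for, the pages can be chained lazily. The following is a sketch in plain Java, not a Spring feature: `KeysetStreams`, `fetchPage` and `keyOf` are names made up for this example. It assumes `fetchPage` returns rows with a key greater than the one passed in, ordered by key, and an empty list once there is nothing left.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Function;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public final class KeysetStreams {

    private KeysetStreams() {}

    /** Lazily streams all rows, holding at most one page in memory at a time. */
    public static <T, K> Stream<T> stream(K startKey,
                                          Function<K, List<T>> fetchPage,
                                          Function<T, K> keyOf) {
        Iterator<T> rows = new Iterator<T>() {
            private K lastKey = startKey;
            private Iterator<T> page = Collections.emptyIterator();
            private boolean exhausted = false;

            @Override
            public boolean hasNext() {
                // Fetch the next page only when the current one is used up.
                while (!page.hasNext() && !exhausted) {
                    List<T> next = fetchPage.apply(lastKey);
                    if (next.isEmpty()) {
                        exhausted = true;
                    } else {
                        lastKey = keyOf.apply(next.get(next.size() - 1));
                        page = next.iterator();
                    }
                }
                return page.hasNext();
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                return page.next();
            }
        };
        return StreamSupport.stream(
            Spliterators.spliteratorUnknownSize(rows, Spliterator.ORDERED), false);
    }
}
```

With the repository above, a call might look like `KeysetStreams.stream(0L, key -> repo.findCustom(key, PageRequest.of(0, 1000)).getContent(), Some::getId)`, where `getId` is an assumed getter on the entity and 0 is assumed to be smaller than any real key. Because `follower_read_timestamp()` is evaluated on every call, pages fetched this way do not share one snapshot; a fixed timestamp would be needed for that.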


Solution 3:[3]

I'm following what xerx593 mentioned above and agree with his solution. To improve the code and take advantage of a Spring Data JPA feature, we can use "CockroachDB Follower Reads with Spring Data JPA and AOP". This is helpful; I have used it in my project.

Steps-

  1. Create annotation FollowerRead
  2. Create Aspect using FollowerRead
  3. Use new annotation with default JPA get/fetch methods
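As a rough illustration of steps 1 and 3, the annotation itself needs nothing beyond the JDK. The sketch below defines a `FollowerRead` marker and shows the check an interceptor would make; the aspect from step 2 needs Spring AOP and is not reproduced here, and `SampleRepository` and `isFollowerRead` are hypothetical names used only for this example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

public class FollowerReadDemo {

    /** Step 1: marker annotation; RUNTIME retention so an interceptor can see it. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    public @interface FollowerRead {}

    /** Hypothetical repository-like type, only to show where the annotation goes (step 3). */
    public interface SampleRepository {
        @FollowerRead
        void findAll();

        void save();
    }

    /** The kind of check an aspect (step 2) would make before switching to a follower read. */
    public static boolean isFollowerRead(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return m.isAnnotationPresent(FollowerRead.class);
            }
        }
        return false;
    }
}
```

Only read methods would carry the annotation, so writes such as `save` keep going through the normal path.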

Read more (not my blog, full credit to the writer): https://tvsguide.io/cockroachdb-follower-reads-with-spring-and-aop

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   rafiss
Solution 2
Solution 3   drt