How to unload only 1000 rows from Redshift tables

I am creating a development copy of a production Redshift database. I know how to unload data from my production cluster to S3, then copy that data into my development cluster, but only if I unload all the data at once. What I would like to do instead is copy just 1000 or so rows from each of my tables, to cut down on space and transfer time between my Redshift clusters.

e.g.

UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') TO 's3://my-bucket' CREDENTIALS etcetcetc

Is there a way to do this LIMIT with UNLOAD, or am I going to have to switch to a bulk-insert-style paradigm?

EDIT: I am programmatically unloading and copying many tables, so I don't want to hard-code any key-based limits in case we add new tables or change table structures, etc.



Solution 1:[1]

While LIMIT is not allowed in the outer SELECT of an UNLOAD command, the Redshift documentation on UNLOAD provides a few alternatives:

Limit Clause

The SELECT query cannot use a LIMIT clause in the outer SELECT. For example, the following UNLOAD statement will fail:

unload ('select * from venue limit 10') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>'; 

Instead, use a nested LIMIT clause. For example:

unload ('select * from venue where venueid in 
(select venueid from venue order by venueid desc limit 10)') 
to 's3://mybucket/venue_pipe_' credentials 
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

Alternatively, you could populate a table using SELECT…INTO or CREATE TABLE AS using a LIMIT clause, then unload from that table.
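As a sketch of that second approach, using the same venue table and placeholder credentials as the examples above (the staging table name is an assumption):

create table venue_sample as
select * from venue limit 10;

unload ('select * from venue_sample')
to 's3://mybucket/venue_sample_' credentials
'aws_access_key_id=<access-key-id>;aws_secret_access_key=<secret-access-key>';

drop table venue_sample;

Dropping the staging table afterwards keeps the production cluster clean; the extra storage is only held for the duration of the unload.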

Solution 2:[2]

If your table was created with a distribution style other than ALL, then there is no need for a LIMIT concept: the unloaded data is automatically split across the cluster's slices.

For example, if you created your table with the EVEN distribution style (the default) and your cluster has 4 compute nodes, then the unload will generate 4 files per table in Amazon S3.

Solution 3:[3]

It is possible to keep the LIMIT clause when you unload from a Redshift table by wrapping your LIMIT query in an outer SELECT * FROM ( ... ).

Example with your case:

UNLOAD ('SELECT * FROM myschema.mytable LIMIT 1000') 
TO 's3://my-bucket' CREDENTIALS etcetcetc

Becomes

UNLOAD ('SELECT * FROM 
                       (SELECT * FROM myschema.mytable LIMIT 1000)'
       ) 
TO 's3://my-bucket' CREDENTIALS etcetcetc
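Since the question mentions unloading many tables programmatically, the list of tables to wrap this way can itself be pulled from the catalog and fed into a loop that emits one UNLOAD statement per table (a sketch; the schema name is an assumption):

-- List the tables in the target schema, one row per table,
-- so an UNLOAD statement can be generated for each.
SELECT tablename
FROM pg_tables
WHERE schemaname = 'myschema';

This avoids hard-coding table names, so newly added tables are picked up automatically.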

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Mark Hildreth
Solution 2: Viki888
Solution 3: Rxzh