Exporting a HANA table with 220 million records using the HANA EXPORT command

I have a HANA table with 220 million records (size: 12 GB). We currently use Informatica, which performs a SELECT * on my_table and exports pipe-separated data to a file. This file grows to roughly 100 GB and the job takes around 1.5 hours to run. Can this be achieved with the EXPORT command in HANA? And can the EXPORT command be used to export the '|'-separated table file to a different server?



Solution 1:[1]

I'm not familiar with Informatica at all, but the rest of your requirements are covered quite well by the EXPORT command. Of course you can let HANA produce such an output file for a table. If you look at:

SAP HANA SQL and System Views Reference - EXPORT Statement (Data Import Export)

you'll see that you can export as CSV, which is pretty much a delimited text format file, and that you can choose the export target location, the field separator, and the row delimiter.

The clauses from the documentation which you need are:

RECORD DELIMITED BY

FIELD DELIMITED BY

EXPORT INTO '/tmp/targetfolder/targetfile.csv' FROM "<schema>"."<table_name>" WITH FIELD DELIMITED BY '|' RECORD DELIMITED BY ';';

This will produce a file with '|'-separated column values and ';'-separated rows.

Regarding your question on the export target - this is a normal file path, so you can use any file path that is accessible on the HANA server.

By the way, depending on the technical specifics of the table you're exporting and the capabilities/configuration of the HANA server containing it, exporting with several threads in parallel might make sense. Without laying out all the details (that would be too much), you may want to experiment with the number of THREADS you're using for the export to speed it up. For this you need one more keyword, THREADS, e.g.:

EXPORT INTO '/tmp/targetfolder/targetfile.csv' FROM "<schema>"."<table_name>" WITH FIELD DELIMITED BY '|' RECORD DELIMITED BY ';' THREADS 12;

Let me share the numbers from my quick-and-dirty example, just to give you a VERY rough idea. For a result that's useful in your context, you really need to evaluate this with your own table, server, and file share.

Records: ~72 million; CSV file: 19.7 GB

1 thread - 22:14 minutes
6 threads - 06:23 minutes
12 threads - 06:04 minutes
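To put those timings in perspective, here is a small Python sketch (using only the numbers quoted above) that converts them to seconds and computes the speedup over the single-threaded run; it shows the diminishing returns between 6 and 12 threads:

```python
# Compute speedup ratios from the export timings quoted above (mm:ss).
def to_seconds(mmss: str) -> int:
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

timings = {1: "22:14", 6: "06:23", 12: "06:04"}
base = to_seconds(timings[1])

for threads, t in timings.items():
    speedup = base / to_seconds(t)
    print(f"{threads:>2} threads: {t} -> {speedup:.1f}x speedup")
```

Going from 1 to 6 threads gives roughly a 3.5x speedup, while doubling again to 12 threads barely helps, so the sweet spot depends on your table's partitioning and the server's I/O capacity.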

Solution 2:[2]

EXPORT INTO will allow you to export with the delimiter you want, but it is limited to file paths accessible from the HANA server. If you have access to the OS, you could perhaps use a file share to place the file. On the cloud (SCP), I believe you can put the exported file on blob storage (like S3); that would be the fastest method, I think. But you will need special privileges for EXPORT INTO.

Another approach to speed up your export with Informatica would be to use Informatica's partitioning option (if you have it).

You can then define a session with, say, 10 partition points. Each ETL partition allows you to write a custom WHERE clause on the HANA source.

Let's imagine you have a customer_id or a product_id in your data. You can use the last digit of the customer_id in the WHERE clause, for example:

For partition 1: RIGHT(customer_id, 1) = '0'
For partition 2: RIGHT(customer_id, 1) = '1'
...
For partition 10: RIGHT(customer_id, 1) = '9'

The idea is to split the data evenly.
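As a quick sanity check of the even-split idea, here is a Python sketch (the customer_id values are made up for illustration) that buckets ids by their last digit, mirroring the RIGHT(customer_id, 1) predicates above:

```python
from collections import Counter

# Hypothetical customer ids; any id whose last digit is roughly
# uniformly distributed will partition evenly.
customer_ids = [str(1000 + i) for i in range(10000)]

# Bucket by last digit, one bucket per ETL partition.
buckets = Counter(cid[-1] for cid in customer_ids)
print(buckets)
```

With uniformly distributed last digits, each of the 10 partitions receives about one tenth of the rows; if your ids are generated in a way that skews the last digit, pick a different digit or a hash instead.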

The write of the CSV file could then be done concurrently.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1:
Solution 2: Romain Ferraton