HDFS date partition directory loop
I have an HDFS directory as below:
/user/staging/app_name/2022_05_06
Under this directory I have around 1000 part files.
I want to loop over each part file and load it into Cassandra; the volume of the entire directory is around 50 billion.
This is far too much to process in a single shot, so the idea is to read the individual part files and load them one by one in append mode.
Can anyone help with the approach?
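One way to sketch this, assuming PySpark with the DataStax spark-cassandra-connector: list the part files under the date directory, group them into small batches, and write each batch to Cassandra with `mode("append")`. The keyspace, table, and file format below are hypothetical placeholders, not from the question.

```python
# Sketch: loop over HDFS part files and load each batch into Cassandra
# in append mode. Keyspace/table names and the file format are
# hypothetical; assumes PySpark + the DataStax spark-cassandra-connector.

def batch_paths(paths, batch_size=10):
    """Yield part-file paths in small batches so each Spark job stays small."""
    batch = []
    for p in sorted(paths):
        batch.append(p)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

# The Spark side (not executed here) would look roughly like:
#
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("part-file-loader").getOrCreate()
# fs = spark._jvm.org.apache.hadoop.fs.FileSystem.get(
#     spark._jsc.hadoopConfiguration())
# Path = spark._jvm.org.apache.hadoop.fs.Path
# statuses = fs.listStatus(Path("/user/staging/app_name/2022_05_06"))
# part_files = [s.getPath().toString() for s in statuses
#               if s.getPath().getName().startswith("part-")]
#
# for batch in batch_paths(part_files, batch_size=10):
#     df = spark.read.parquet(*batch)          # or .csv/.orc as appropriate
#     (df.write
#        .format("org.apache.spark.sql.cassandra")
#        .options(keyspace="my_ks", table="my_table")  # hypothetical names
#        .mode("append")
#        .save())

# Demonstration of the batching logic with fake file names:
fake = [f"part-{i:05d}" for i in range(7)]
batches = list(batch_paths(fake, batch_size=3))
print(len(batches))   # 3 batches of sizes 3, 3, 1
print(batches[-1])    # ['part-00006']
```

Batching (rather than one file per job) keeps the per-job overhead reasonable with ~1000 part files, while still capping how much data any single append writes.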
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow