How to pass hbase-site.xml to Google Cloud Dataflow template

We have an HBase cluster running on Google Cloud, and I want to write into HBase tables from a Dataflow pipeline. For this, I want to pass one hbase-site.xml to the template in the staging environment and a different hbase-site.xml in the production environment. However, I cannot find an option for passing a resource file to a Dataflow template. Is there anything in Dataflow similar to --files in Spark or --classpath in Flink for adding such a file?

I can add hbase-site.xml to src/main/resources, which works, but I need a different hbase-site.xml for each of the two environments, so an option like this would be very useful.



Solution 1:[1]

Are you using Beam's HBaseIO, and is it possible to pass these parameters as part of the Configuration provided to it? If so, you could update your template to accept this configuration (or the values needed to create it) as PipelineOptions and parse them in the Main class, along the lines of the sketch below.
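A rough sketch of that approach, assuming Beam's HBaseIO and a template launched from a Main class; the HBaseOptions interface and the specific option names are made up for illustration:

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.hbase.HBaseIO;
import org.apache.beam.sdk.options.Default;
import org.apache.beam.sdk.options.Description;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.Validation;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class HBaseTemplateMain {

  // Hypothetical options interface: the values that would otherwise live in
  // hbase-site.xml are accepted as pipeline options at launch time.
  public interface HBaseOptions extends PipelineOptions {
    @Description("ZooKeeper quorum of the target HBase cluster")
    @Validation.Required
    String getHbaseZookeeperQuorum();
    void setHbaseZookeeperQuorum(String value);

    @Description("ZooKeeper client port")
    @Default.String("2181")
    String getHbaseZookeeperClientPort();
    void setHbaseZookeeperClientPort(String value);

    @Description("HBase table to write to")
    @Validation.Required
    String getHbaseTableId();
    void setHbaseTableId(String value);
  }

  public static void main(String[] args) {
    HBaseOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(HBaseOptions.class);

    // Build the HBase Configuration from the options instead of a bundled
    // hbase-site.xml, so staging and production can pass different values.
    Configuration conf = HBaseConfiguration.create();
    conf.set("hbase.zookeeper.quorum", options.getHbaseZookeeperQuorum());
    conf.set("hbase.zookeeper.property.clientPort", options.getHbaseZookeeperClientPort());

    Pipeline pipeline = Pipeline.create(options);

    // ... transforms that produce a PCollection<Mutation> go here, e.g.:
    // mutations.apply(
    //     HBaseIO.write()
    //         .withConfiguration(conf)
    //         .withTableId(options.getHbaseTableId()));

    pipeline.run();
  }
}
```

Note that with a classic template, options read at graph-construction time like this would need to be ValueProviders; with a Flex Template the plain options above can be read in main() at launch time, so staging and production can simply pass different values.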

If you need the file to be available locally on the worker VMs, you probably need to set up a custom container image for your template, for example as sketched below.
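For example, a custom worker (SDK harness) container could bake the environment-specific file into the image; the base image tag and the destination path below are assumptions, not requirements:

```dockerfile
# Illustrative sketch only: a custom Beam SDK (worker) container that bakes an
# environment-specific hbase-site.xml into the image. Adjust the Beam version
# tag and the destination path to match your pipeline's classpath setup.
FROM apache/beam_java11_sdk:2.46.0

# Place the environment-specific config where the worker JVM can pick it up.
COPY conf/staging/hbase-site.xml /opt/hbase/conf/hbase-site.xml
```

You would build one image per environment (or copy a different file per CI build), push it to a container registry, and pass it to the job with the --sdkContainerImage pipeline option (requires Dataflow Runner v2).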

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: chamikara