Two New Fields Added to the Dataflow Job Created from a Template
I created a Dataflow job from a template (Cloud Datastream to BigQuery) several weeks ago. I stopped the job and then tried to create a new job from the same template. Now I see two new required fields: the Pub/Sub subscription being used in a GCS notification policy, and the Datastream output file format (avro/json). I have no idea what to enter in these fields, and the tutorial on the page does not even work.
Any idea what values should be entered into these two new fields?
I cannot find any documentation on what to enter into the new required fields.
https://blog.searce.com/giving-a-spin-to-cloud-datastream-the-new-serverless-cdc-offering-on-google-cloud-114f5132d3cf
https://www.youtube.com/watch?v=7nL4UuFQKy0
[Image: error shown when following the tutorial]
[Image: the two newly added required fields]
Solution 1:
There's a new step-by-step tutorial for setting up Datastream + Dataflow that provides all the details.
Basically, you need to:
- Set up Pub/Sub notifications on the GCS bucket - this will be used to notify Dataflow whenever Datastream writes a new file to GCS (instead of having Dataflow continuously scan GCS, which isn't scalable).
- Tell Dataflow which Pub/Sub subscription (on that notification topic) to receive these notifications from (a sketch of both steps follows this list)
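Here is a minimal sketch of both steps using the google-cloud Python clients. Every name below (project, bucket, topic, and subscription) is a hypothetical placeholder, and it assumes Datastream is already writing files into the bucket:

```python
from google.cloud import pubsub_v1, storage

PROJECT = "my-project"                  # hypothetical project ID
BUCKET = "my-datastream-bucket"         # bucket Datastream writes into
TOPIC = "datastream-gcs-notifications"  # hypothetical topic name
SUBSCRIPTION = "datastream-gcs-sub"     # hypothetical subscription name

# Create the topic that GCS will publish notifications to.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, TOPIC)
publisher.create_topic(request={"name": topic_path})

# Step 1: ask GCS to publish a message to the topic every time an
# object is finalized in the bucket, i.e. whenever Datastream finishes
# writing a new file. (The bucket's GCS service agent also needs the
# roles/pubsub.publisher role on the topic for this to work.)
storage_client = storage.Client(project=PROJECT)
bucket = storage_client.bucket(BUCKET)
bucket.notification(
    topic_name=TOPIC,
    payload_format="JSON_API_V1",
    event_types=["OBJECT_FINALIZE"],
).create()

# Step 2: create the subscription that the Dataflow job will read;
# its full resource path is the value for the new subscription field.
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT, SUBSCRIPTION)
subscriber.create_subscription(request={"name": sub_path, "topic": topic_path})
print("Value for the Pub/Sub subscription field:", sub_path)
```

The printed value has the form projects/my-project/subscriptions/datastream-gcs-sub, which is the fully qualified path the template field expects.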
For the file format, the value here would be avro or json, depending on the format you configured in your Datastream stream.
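If you prefer to launch the template programmatically rather than through the console form, a hedged sketch follows. The parameter names gcsPubSubSubscription and inputFileFormat appear to correspond to the two new fields, but those names, the template path, and every value below are assumptions to verify against the template's current documentation:

```python
from googleapiclient.discovery import build

PROJECT = "my-project"   # hypothetical project ID
REGION = "us-central1"   # hypothetical region

dataflow = build("dataflow", "v1b3")
response = dataflow.projects().locations().flexTemplates().launch(
    projectId=PROJECT,
    location=REGION,
    body={
        "launchParameter": {
            "jobName": "datastream-to-bigquery",
            # Assumed template path; copy the exact one from the console.
            "containerSpecGcsPath": (
                "gs://dataflow-templates-us-central1/latest/flex/"
                "Cloud_Datastream_to_BigQuery"
            ),
            "parameters": {
                # Where Datastream lands its files:
                "inputFilePattern": "gs://my-datastream-bucket/data/",
                # The two new required fields:
                "gcsPubSubSubscription": (
                    "projects/my-project/subscriptions/datastream-gcs-sub"
                ),
                "inputFileFormat": "avro",  # or "json", matching the stream
                # ...plus the template's other required output parameters
            },
        }
    },
).execute()
print(response["job"]["id"])
```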
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Etai Margolin |