Two New Fields Added to the Dataflow Job from a Template

I created a Dataflow job from a template (Cloud Datastream to BigQuery) several weeks ago. I stopped the job and then tried to create a new job with the same template. Now I see two new required fields: "The Pub/Sub subscription being used in a GCS notification policy" and "Datastream output file format (avro/json)". I have no idea what I should enter into these fields, and the tutorial on the page does not even work.

Any idea what values should be entered into these two new fields?

I cannot find any documentation on what to enter into the new required fields. The closest resources I have found are:

https://blog.searce.com/giving-a-spin-to-cloud-datastream-the-new-serverless-cdc-offering-on-google-cloud-114f5132d3cf

https://www.youtube.com/watch?v=7nL4UuFQKy0

Error view tutorial image:

New fields that were added (image):



Solution 1:[1]

There's a new step-by-step tutorial for setting up Datastream + Dataflow that provides all the details.

Basically, you need to:

  1. Set up Pub/Sub notifications on the GCS bucket - this will be used to notify Dataflow whenever Datastream writes a new file to GCS (instead of having Dataflow continuously scan GCS, which isn't scalable).
  2. Tell Dataflow the Pub/Sub subscription (attached to that notification topic) to receive these notifications from.
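The steps above can be sketched with the gcloud and gsutil CLIs. All resource names below (bucket, topic, subscription) are placeholders; substitute your own:

```shell
# Hypothetical names -- replace with your own topic, subscription, and bucket.
# 1. Create the topic and a subscription Dataflow will read from.
gcloud pubsub topics create datastream-notifications
gcloud pubsub subscriptions create datastream-notifications-sub \
    --topic=datastream-notifications

# 2. Have the GCS bucket publish a message to the topic whenever
#    Datastream finalizes a new object (OBJECT_FINALIZE event).
gsutil notification create \
    -t datastream-notifications \
    -f json \
    -e OBJECT_FINALIZE \
    gs://my-datastream-bucket
```

The notification policy lives on the bucket, so it fires for every file Datastream writes without Dataflow having to poll the bucket.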

For the file format, the value would be avro or json, matching the output format you configured on your Datastream stream.
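For reference, here is a hedged sketch of launching the template from the CLI with both new fields filled in. The parameter names (gcsPubSubSubscription, inputFileFormat, etc.) are my best reading of the Datastream-to-BigQuery template and may differ by template version; project, region, and dataset names are placeholders:

```shell
# Hypothetical values -- substitute your project, region, bucket, and datasets.
gcloud dataflow flex-template run datastream-to-bq-job \
    --region=us-central1 \
    --template-file-gcs-location=gs://dataflow-templates-us-central1/latest/flex/Cloud_Datastream_to_BigQuery \
    --parameters \
inputFilePattern=gs://my-datastream-bucket/data/,\
gcsPubSubSubscription=projects/my-project/subscriptions/datastream-notifications-sub,\
inputFileFormat=avro,\
outputStagingDatasetTemplate=staging_dataset,\
outputDatasetTemplate=replica_dataset
```

The two new UI fields map to gcsPubSubSubscription (the full resource name of the subscription created for the bucket's notification policy) and inputFileFormat (avro or json).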

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Etai Margolin