Ingest RDBMS data to BigQuery

We have on-prem sources such as SQL Server and Oracle, and data from them has to be ingested into BigQuery periodically in batch mode. What should the architecture be? Which GCP-native services can be used for this? Can Dataflow or Dataproc be used?

PS: Our organization hasn't licensed any third-party ETL tool so far, so the preference is for Google-native services. Data Fusion is very expensive.



Solution 1:

There are two approaches you can take with Apache Beam.

  • Periodically run a Beam/Dataflow batch job against your database. You can use Beam's JdbcIO connector to read the data, transform it with Beam transforms (PTransforms), and write it to the destination using a Beam sink such as BigQueryIO. In this approach, you are responsible for handling duplicate data (for example, by varying the SQL query across executions); a minimal sketch follows this list.

  • Use a Beam/Dataflow pipeline that reads change streams (CDC) from the database. The simplest approach here might be to use one of the available Dataflow templates. You can also develop your own pipeline using Beam's DebeziumIO connector; a rough sketch appears after this list.
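Here is a minimal sketch of the first (batch JdbcIO → BigQuery) approach, assuming a SQL Server source with an updated_at watermark column. The driver, connection string, credentials, query, table names, and schema below are all placeholder assumptions:

```java
import com.google.api.services.bigquery.model.TableRow;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.CreateDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO.Write.WriteDisposition;
import org.apache.beam.sdk.io.gcp.bigquery.TableRowJsonCoder;
import org.apache.beam.sdk.io.jdbc.JdbcIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class JdbcToBigQuery {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    p.apply("ReadFromSqlServer", JdbcIO.<TableRow>read()
            .withDataSourceConfiguration(JdbcIO.DataSourceConfiguration.create(
                    "com.microsoft.sqlserver.jdbc.SQLServerDriver",
                    "jdbc:sqlserver://onprem-host:1433;databaseName=sales") // assumed
                .withUsername("etl_user")  // assumed credentials
                .withPassword("secret"))
            // Filter on a watermark column so each scheduled run picks up
            // only new/changed rows and re-runs don't duplicate old data.
            .withQuery("SELECT id, name, updated_at FROM dbo.customers"
                + " WHERE updated_at >= DATEADD(day, -1, GETDATE())")
            .withRowMapper((JdbcIO.RowMapper<TableRow>) rs -> new TableRow()
                .set("id", rs.getLong("id"))
                .set("name", rs.getString("name"))
                .set("updated_at", rs.getTimestamp("updated_at").toString()))
            .withCoder(TableRowJsonCoder.of()))
        .apply("WriteToBigQuery", BigQueryIO.writeTableRows()
            .to("my-project:my_dataset.customers")
            .withCreateDisposition(CreateDisposition.CREATE_IF_NEEDED)
            .withWriteDisposition(WriteDisposition.WRITE_APPEND));

    p.run().waitUntilFinish();
  }
}
```

To run this periodically in batch mode, one option is to stage the pipeline as a Dataflow template and trigger it on a schedule, for example from Cloud Scheduler or a Cloud Composer DAG.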
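And a rough sketch of the second (CDC) approach using Beam's DebeziumIO connector, mirroring the example in the connector's documentation. The host, credentials, and Debezium connection properties are assumptions and vary by source database and connector version:

```java
import io.debezium.connector.sqlserver.SqlServerConnector;
import org.apache.beam.io.debezium.DebeziumIO;
import org.apache.beam.io.debezium.SourceRecordJson;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.coders.StringUtf8Coder;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.values.PCollection;

public class SqlServerCdcPipeline {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.fromArgs(args).create());

    // Read change events from SQL Server as JSON strings via Debezium.
    PCollection<String> changes = p.apply("ReadCdc",
        DebeziumIO.<String>read()
            .withConnectorConfiguration(
                DebeziumIO.ConnectorConfiguration.create()
                    .withUsername("debezium")      // assumed credentials
                    .withPassword("secret")
                    .withConnectorClass(SqlServerConnector.class)
                    .withHostName("onprem-host")   // assumed host/port
                    .withPort("1433")
                    // Assumed Debezium properties; consult the connector
                    // docs for the full required set for your version.
                    .withConnectionProperty("database.dbname", "sales")
                    .withConnectionProperty("table.include.list", "dbo.customers"))
            .withFormatFunction(new SourceRecordJson.SourceRecordJsonMapper())
            .withCoder(StringUtf8Coder.of()));

    // Parse each JSON change event and apply it to BigQuery, e.g. append
    // to a staging table with BigQueryIO and periodically MERGE.
    // ... parsing and write steps omitted ...

    p.run();
  }
}
```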

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
Solution 1: chamikara