Apache Beam: run Docker in pipeline

The Apache Beam pipeline (Python) I'm currently working on contains a transform that runs a Docker container.

While this works well during local testing with the DirectRunner, I'm wondering how one would deploy such a pipeline.

  • Google Dataflow isn't an option: while we can use a custom Docker image, Docker-in-Docker would require the containers on Dataflow to run with the "--privileged" flag.
  • Apache Spark isn't an option either (only tested via Google's Dataproc service): the pipeline itself is apparently executed inside a Docker container.

So, what would be the easiest way to deploy such a pipeline?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
