How to stream data from MongoDB in Structured Streaming?
Is it possible to use Spark Structured Streaming to read data from MongoDB with readStream?
For standard use of Structured Streaming, I usually do this:
// Streaming read of Parquet files (the "header" option only applies to CSV, so it is dropped here)
val dataFrame = spark.readStream.format("parquet").schema(customSchema).load(path)
val query = dataFrame.writeStream.outputMode("append").format("console").start()
query.awaitTermination()
And I know how to read data from MongoDB with Spark in batch mode:
val sparkSession = org.apache.spark.sql.SparkSession.builder
  .master("local")
  .appName("MongoSparkConnector")
  .config("spark.mongodb.input.uri", mongodb_input_uri)
  .config("spark.mongodb.output.uri", mongodb_output_uri)
  .getOrCreate()
val data = sparkSession.read.format("com.mongodb.spark.sql.DefaultSource").load()
But now I want to combine these two concepts to set up Structured Streaming on a MongoDB source (read-only; I do not need to write to MongoDB).
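A minimal sketch of how the two snippets above could be combined, assuming the MongoDB Spark Connector 10.x (or later) is on the classpath. To my knowledge, the 10.x line added a Structured Streaming source registered as format "mongodb" and backed by MongoDB change streams, whereas the older com.mongodb.spark.sql.DefaultSource used above only supports batch reads. The connection URI, database, collection, schema fields, and checkpoint path below are hypothetical placeholders, and the option names should be checked against the documentation for the connector version you actually use. Change streams also require a replica set or sharded cluster rather than a standalone mongod.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object MongoStreamSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: MongoDB Spark Connector 10.x is on the classpath,
    // e.g. org.mongodb.spark:mongo-spark-connector_2.12:10.x
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("MongoStructuredStreaming")
      .getOrCreate()

    // Hypothetical placeholders: replace with your own deployment details.
    val mongoUri = "mongodb://localhost:27017/?replicaSet=rs0"
    val database = "test"
    val collection = "events"

    // Streaming sources need an explicit schema; these fields are illustrative only.
    val readSchema = StructType(Seq(
      StructField("name", StringType),
      StructField("status", StringType)
    ))

    val stream = spark.readStream
      .format("mongodb")
      .option("connection.uri", mongoUri)
      .option("database", database)
      .option("collection", collection)
      // Emit only the changed document instead of the full change-stream event.
      .option("change.stream.publish.full.document.only", "true")
      .schema(readSchema)
      .load()

    // Same console sink as in the Parquet example; streaming queries need a checkpoint location.
    val query = stream.writeStream
      .outputMode("append")
      .format("console")
      .option("checkpointLocation", "/tmp/mongo-stream-checkpoint")
      .start()

    query.awaitTermination()
  }
}
```

Because this source is built on change streams, it delivers changes made after the query starts rather than replaying the collection's existing contents (at least under the default settings I am aware of); an initial snapshot would still need a batch read like the one shown earlier.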
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow