PySpark: Fetching MongoDB records using MongoConnector and a Where Clause
I'm trying to read from MongoDB following this guide:
from pyspark.sql import functions as F

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df = df.select(['my_cols'])
df = df.where(F.col('date') > '2012-12-02')
df.show()
Currently, I load all the data into memory and only then apply the filter/where condition. Instead, I want the where condition pushed down to MongoDB so that only matching records are fetched in the first place.
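One subtle bug worth noting in the snippet above as originally written (`df.where('date' > '2012-12-02')`): Python evaluates `'date' > '2012-12-02'` itself, as a lexicographic comparison of two literal strings, before Spark ever sees it, so no column predicate is ever built. A minimal demonstration:

```python
# Plain Python compares the two string literals lexicographically;
# no DataFrame column named "date" is involved at all.
result = 'date' > '2012-12-02'

# 'd' (0x64) sorts after '2' (0x32), so the comparison is True,
# and Spark would receive a bare boolean instead of a predicate.
print(result)
```

To give Spark a real predicate, use a `Column` expression such as `pyspark.sql.functions.col("date") > "2012-12-02"`, or the SQL string form `df.where("date > '2012-12-02'")`.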
I have tried these options:
# AttributeError: 'DataFrameReader' object has no attribute 'where'
spark.read.format("com.mongodb.spark.sql.DefaultSource").where('my_condition').load()
# Still loads all the data
spark.read.format("com.mongodb.spark.sql.DefaultSource").option('dbname', 'select * from schema where condition').load()
# Still loads all the data
spark.read.format("com.mongodb.spark.sql.DefaultSource").option('query', 'select * from schema where condition').load()
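For reference, the MongoDB Spark Connector does not accept SQL strings as a read option; its documented mechanism for server-side filtering is the `pipeline` read option, which takes a JSON-encoded MongoDB aggregation pipeline that MongoDB applies before any documents reach Spark. A hedged sketch (the field name `date` and the cutoff value are taken from the question; the Spark call is shown commented out because it needs a live `SparkSession` configured with the connector):

```python
import json

# A $match stage that MongoDB evaluates server-side, so only
# documents with date > '2012-12-02' are shipped to Spark.
match_stage = {"$match": {"date": {"$gt": "2012-12-02"}}}
pipeline = json.dumps([match_stage])
print(pipeline)

# With a running SparkSession named `spark`, the read would look like:
# df = (spark.read
#       .format("com.mongodb.spark.sql.DefaultSource")
#       .option("pipeline", pipeline)
#       .load())
```

Note that `$match` compares BSON values, so if `date` is stored as a BSON date rather than a string, the match document should use a date value, not the string shown here.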
Is it possible to achieve this behavior? Also, what impact would such a where query have on my MongoDB? Will it store the results temporarily somewhere before Spark fetches them?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow