PySpark: Fetching MongoDB records using MongoConnector and a Where Clause

I'm trying to read from MongoDB using this guide:

df = spark.read.format("com.mongodb.spark.sql.DefaultSource").load()
df = df.select(['my_cols'])
df = df.where(df['date'] > '2012-12-02')
df.show()

Currently, I load all the data into memory and then apply the filter/where condition. Instead, I want to push the condition down so that only the matching records are fetched from MongoDB in the first place.
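To illustrate what I'm after, here is a sketch of the $match stage I'd want MongoDB to execute server-side, built as a JSON string so it could in principle be handed to the connector (e.g. via something like a `pipeline` read option, if that's supported). The helper name and the field/date are just placeholders from my example above:

```python
import json

# Hypothetical helper: build a one-stage aggregation pipeline string.
# A $match stage like this is what I'd want to run inside MongoDB,
# not in Spark, so only matching documents are transferred.
def match_pipeline(field, op, value):
    return json.dumps([{"$match": {field: {op: value}}}])

pipeline = match_pipeline("date", "$gt", "2012-12-02")
print(pipeline)  # [{"$match": {"date": {"$gt": "2012-12-02"}}}]
```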

I have tried these options:

# AttributeError: DataFrameReader has no attribute 'where'
spark.read.format("com.mongodb.spark.sql.DefaultSource").where('my_condition').load()

# Still loads all the data
spark.read.format("com.mongodb.spark.sql.DefaultSource").option('dbname', 'select * from schema where condition').load()

# Still loads all the data
spark.read.format("com.mongodb.spark.sql.DefaultSource").option('query', 'select * from schema where condition').load()

Is it possible to achieve this behavior? Also, what impact would a where query have on my MongoDB instance? Will it store the results temporarily somewhere before Spark fetches them?



Sources

Source: Stack Overflow, licensed under CC BY-SA 3.0 in accordance with its attribution requirements.