'Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS?

Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS?
If I use Spark standalone cluster manager and have my data distributed in HDFS cluster, how will Spark know that data is located locally on the nodes?

Solution 1:^[1]

YARN is a resource manager. It deals with memory and processes, and not with the workings of HDFS or data-locality.

Since Spark can read from HDFS sources, and the namenodes & datanodes take care of all that HDFS block data management outside of YARN, then I believe the answer is no, you don't need YARN. But you already have HDFS, which means you have Hadoop, so why not take advantage of integrating Spark into YARN?

Solution 2:^[2]

the stand alone mode has its own cluster manager/resource manager which will talk to name node for locality. client/driver will put tasks based on the results.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	OneCricketeer
Solution 2	Cobb Von

'Do I need to use Spark with YARN to achieve NODE LOCAL data locality with HDFS?

Solution 1:[1]

Solution 2:[2]

Sources

Related Questions

Solution 1:^[1]

Solution 2:^[2]