When is Spark groupBy preferred over reduceByKey?

My dataset is pretty big, and I would like to understand when groupBy makes sense over reduceByKey.



Solution 1:[1]

reduceByKey performs a map-side combine: values that share a key are merged within each partition before the shuffle. This reduces the amount of data sent over the network during the shuffle, and therefore also the amount of data left to reduce afterwards. Where possible, use reduceByKey.
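To make the shuffle difference concrete, here is a minimal plain-Python sketch that models the two strategies. It does not use the Spark API. The `partitions` data and the helper names `shuffle_group_by_key` and `shuffle_reduce_by_key` are hypothetical and exist only for illustration. The model simply counts how many records would cross the shuffle boundary in each case.

```python
from collections import defaultdict
from operator import add

# Hypothetical input: two partitions of (word, count) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("a", 1), ("b", 1), ("c", 1)],
]

def shuffle_group_by_key(parts):
    """Model of a groupBy-style shuffle: every record is shuffled unchanged."""
    shuffled = [pair for part in parts for pair in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    return dict(grouped), len(shuffled)

def shuffle_reduce_by_key(parts, func):
    """Model of a reduceByKey-style shuffle: combine per partition first."""
    shuffled = []
    for part in parts:
        local = {}  # map-side combine: one running value per key
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}  # reduce side: merge the partial results per key
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)

grouped, grouped_records = shuffle_group_by_key(partitions)
group_sums = {key: sum(values) for key, values in grouped.items()}
reduced, reduced_records = shuffle_reduce_by_key(partitions, add)

print(group_sums, grouped_records)
print(reduced, reduced_records)
```

Both approaches produce the same sums, but in this model the grouping path shuffles all 8 records while the reducing path shuffles only 5 partial results. In Spark itself the two paths would roughly correspond to `rdd.groupByKey().mapValues(sum)` and `rdd.reduceByKey(add)`. Grouping still makes sense when the full list of values per key is genuinely needed, for example when the per-key logic cannot be written as an associative and commutative function.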

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Amar Singh