How to divide two aggregate sums in a PySpark DataFrame

I want to divide the sums of two columns in PySpark. For example, I have a dataset like the one below:

    A  B  C
 1  1  2  3
 2  1  2  3
 3  1  2  3

What I want is the sum of column B divided by the sum of column A, like below:

  6 (sum of B) / 3 (sum of A) = 2

I have tried this:

sumofA = df.groupby().sum('A')
sumofB = df.groupby().sum('B')

Result = sumofB / sumofA

but it produces this error, because both `sumofA` and `sumofB` are DataFrames, which cannot be divided directly:

TypeError: unsupported operand type(s) for /: 'DataFrame' and 'DataFrame'


Solution 1:[1]

Your approach was close, but you can do the division directly inside a single aggregation instead of aggregating each column separately:

from pyspark.sql import functions as F
df.groupBy().agg(F.sum("B")/F.sum("A")).show()
+-----------------+
|(sum(B) / sum(A))|
+-----------------+
|              2.0|
+-----------------+

Or you can collect the result as a plain Python value using collect()[0][0]:

from pyspark.sql import functions as F
a=df.groupBy().agg(F.sum("B")/F.sum("A")).collect()[0][0]
a

Out[5]: 2.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 murtihash