How to Get the Number of Points in Each Polygon in Scala

I have two data frames. The first contains latitude and longitude points, each with the ID number of the person who was at those coordinates and the date they were there. The second has the names of certain stores (roughly 1000 total) and the coordinates that outline the polygon associated with each store. I want to join these two in Scala (Databricks) to count the number of visits to each place over a certain range of time.

I've tried simply joining the two dataframes, but this does not work: the polygon data contains only the points that outline each polygon, so the visit coordinates do not match them directly. I need the number of points (latitude and longitude) that are INSIDE each polygon.


+--+----------+---------+------+
|ID| latitude |longitude| date |
+--+----------+---------+------+
|1 |  xx      | yy      |1/1/18|
|2 |  xx      | yy      |1/2/18|
|3 |  xx      | yy      |1/1/18|
|3 |  xx      | yy      |1/3/18|
|3 |  xx      | yy      |1/1/18|
|4 |  xx      | yy      |1/5/18|
|5 |  xx      | yy      |1/5/18|
|5 |  xx      | yy      |1/5/18|
+--+----------+---------+------+

+-------------+-----------------------+
|location_name|polygon                |
+-------------+-----------------------+
|Location1    |POLYGON((x y, x y,...))|
|Location2    |POLYGON((x y, x y,...))|
|Location3    |POLYGON((x y, x y,...))|
|Location4    |POLYGON((x y, x y,...))|
|Location5    |POLYGON((x y, x y,...))|
|Location6    |POLYGON((x y, x y,...))|
|Location7    |POLYGON((x y, x y,...))|
|Location8    |POLYGON((x y, x y,...))|
+-------------+-----------------------+


I just want to get the number of visits to each location -- the number of points from the first dataframe in each polygon from the second dataframe.
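For illustration, here is a minimal sketch of the counting logic in plain Scala, using a ray-casting point-in-polygon test. It is not a Spark solution as written. The names `Visit`, `Location`, `VisitCounts`, `parseWkt`, `contains`, and `countVisits` are hypothetical, and the sketch assumes each polygon is simple WKT of the form POLYGON((x y, x y, ...)) with x as longitude and y as latitude, a single outer ring, and no holes.

```scala
// Hypothetical row types mirroring the two data frames in the question.
case class Visit(id: Int, latitude: Double, longitude: Double, date: String)
case class Location(name: String, ring: Vector[(Double, Double)])

object VisitCounts {
  // Parse "POLYGON((x y, x y, ...))" into a vector of (x, y) vertices.
  // Assumption: one outer ring, no holes, x = longitude, y = latitude.
  def parseWkt(wkt: String): Vector[(Double, Double)] = {
    val inner = wkt.trim.stripPrefix("POLYGON").trim
      .stripPrefix("((").stripSuffix("))")
    inner.split(",").toVector.map { pair =>
      val Array(x, y) = pair.trim.split("\\s+")
      (x.toDouble, y.toDouble)
    }
  }

  // Ray casting: the point is inside if a horizontal ray starting at it
  // crosses the polygon boundary an odd number of times.
  def contains(ring: Vector[(Double, Double)], x: Double, y: Double): Boolean = {
    var inside = false
    var j = ring.length - 1
    for (i <- ring.indices) {
      val (xi, yi) = ring(i)
      val (xj, yj) = ring(j)
      // Only edges that straddle the ray's y value can be crossed.
      val crosses = (yi > y) != (yj > y) &&
        x < (xj - xi) * (y - yi) / (yj - yi) + xi
      if (crosses) inside = !inside
      j = i
    }
    inside
  }

  // Count, for every location, how many visit points fall inside its polygon.
  def countVisits(visits: Seq[Visit], locations: Seq[Location]): Map[String, Int] =
    locations.map { loc =>
      loc.name -> visits.count(v => contains(loc.ring, v.longitude, v.latitude))
    }.toMap
}
```

To restrict the count to a range of time, the visits could be filtered by date before calling `countVisits`. On Databricks, the same `contains` check could presumably be wrapped in a UDF and used as a join condition, or replaced with the containment predicate of a geospatial library such as JTS or Apache Sedona; with roughly 1000 polygons, broadcasting the polygon table may keep such a join manageable, though that is an assumption rather than something measured here.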



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
