Most efficient way to match 2d coordinates in Python
I have 3 lists of 2D points (tuples of non-negative integers), e.g.
l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]
They are not all the same length, and I know nothing about the distribution of these points other than that all values lie in some range [0, N] (N is not too big, around 2500-3000).
I'm looking for an efficient way to match points from l1 and l2 based on pairs in l3: if there is a pair in l3 such that its 1st coordinate is also the 1st coordinate of one of the pairs in l1 AND its 2nd coordinate is the 1st coordinate of a pair in l2, then the three pairs "match".
For example, from the lists above we'll get
matches = [[(0,55), (40,50), (0,40)], [(100,100), (555,666), (100, 555)]]
Here's what I have so far:
l1_left_coordinates = [p[0] for p in l1]
l2_left_coordinates = [p[0] for p in l2]
matches = []
for p in l3:
    left, right = p
    try:
        idx_of_left_in_l1 = l1_left_coordinates.index(left)
        idx_of_right_in_l2 = l2_left_coordinates.index(right)
    except ValueError:
        continue
    matches.append([l1[idx_of_left_in_l1], l2[idx_of_right_in_l2], p])
This works, but it means that for every l3 element I traverse both l1 and l2, resulting in O(n^2) worst-case runtime (where n is the size of the lists). I wonder if there's a faster way to do this.
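Since the repeated `.index()` scans are the bottleneck, one way to sketch the same matching (not from the original post, just an illustration) is to precompute dictionaries keyed on the first coordinate, giving O(n) build time and average O(1) lookups:

```python
l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]

# Map first coordinate -> pair; if first coordinates repeat,
# the last pair wins (just like .index() would pick the first).
by_first_l1 = {p[0]: p for p in l1}
by_first_l2 = {p[0]: p for p in l2}

matches = []
for x, y in l3:
    if x in by_first_l1 and y in by_first_l2:
        matches.append([by_first_l1[x], by_first_l2[y], (x, y)])

print(matches)
# [[(0, 55), (40, 50), (0, 40)], [(100, 100), (555, 666), (100, 555)]]
```

Note that a dict keeps only one pair per first coordinate, so if duplicates matter you would map to a list of pairs instead.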
Solution 1:[1]
Sorting the lists first and searching only in the residual of each list leads to performance approaching O(n log(n)). Here's an example; however, the difference may not be noticeable for small lists.
import random
import time
l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]
# l1 = [(random.randint(0,1000),random.randint(0,1000)) for _ in range(3000)]
# l2 = [(random.randint(0,1000),random.randint(0,1000)) for _ in range(3000)]
# l3 = [(random.randint(0,1000),random.randint(0,1000)) for _ in range(3000)]
start = time.time()
l1 = sorted(l1,key=lambda item : item[0]) # O(nlog(n))
l2 = sorted(l2,key=lambda item : item[0]) # O(nlog(n))
l3 = sorted(l3,key=lambda item : item[0]) # O(nlog(n))
matches = []
l1_left_coordinates = [p[0] for p in l1] # O(n)
l2_left_coordinates = [p[0] for p in l2] # O(n)
l1_idx = 0
l2_idx = 0
l1_id_inc = 0
l2_id_inc = 0
for x, y in l3:
    try:
        # Search only the residual of each list; on average this shortens
        # each subsequent scan, though .index() itself is still a linear scan.
        l1_idx = l1_left_coordinates.index(x)
        l2_idx = l2_left_coordinates.index(y)
        l1_id_inc += l1_idx
        l2_id_inc += l2_idx
        l1_left_coordinates = l1_left_coordinates[l1_idx:]  # slicing the residual
        l2_left_coordinates = l2_left_coordinates[l2_idx:]
    except ValueError:
        continue
    matches.append([l1[l1_id_inc], l2[l2_id_inc], (x, y)])
elapsed = time.time()-start
print(f"took :{elapsed} seconds to run")
print(matches)
Solution 2:[2]
If you sort the lists l1 and l2 (O(n log n)) and look for p[0] in l1 and p[1] in l2 by dichotomy (O(log n) per lookup), it should be theoretically better. However, I am not sure how fast .index() is, so maybe your solution is already fast enough.
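A minimal sketch of this dichotomy approach, using the standard-library bisect module (the helper names here are mine, not from the answer):

```python
import bisect

def match_points(l1, l2, l3):
    # Sort once by first coordinate: O(n log n)
    s1 = sorted(l1)
    s2 = sorted(l2)
    keys1 = [p[0] for p in s1]
    keys2 = [p[0] for p in s2]

    def find_by_first(keys, pairs, value):
        # Binary search for a pair whose first coordinate equals value: O(log n)
        i = bisect.bisect_left(keys, value)
        if i < len(keys) and keys[i] == value:
            return pairs[i]
        return None

    matches = []
    for x, y in l3:
        p1 = find_by_first(keys1, s1, x)
        p2 = find_by_first(keys2, s2, y)
        if p1 is not None and p2 is not None:
            matches.append([p1, p2, (x, y)])
    return matches

l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]
print(match_points(l1, l2, l3))
# [[(0, 55), (40, 50), (0, 40)], [(100, 100), (555, 666), (100, 555)]]
```

Overall this is O((n + m) log n) for sorting plus m binary searches, versus the O(n·m) of repeated `.index()` calls.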
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Yehdhih ANNA |
| Solution 2 | Qise |