Most efficient way to match 2D coordinates in Python

I have 3 lists of 2D points (pairs of non-negative integers), e.g.

l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]

They are not all of the same length, and I know nothing about the distribution of these points other than that all values are in some range [0, N] (N is not too big, 2500-3000-ish).

I'm looking for an efficient way to match points from l1 and l2 based on the pairs in l3: if there is a pair in l3 whose 1st coordinate is also the 1st coordinate of one of the pairs in l1 AND whose 2nd coordinate is the 1st coordinate of a pair in l2, then the three pairs "match".

For example, from the lists above we'll get

matches = [[(0,55), (40,50), (0,40)], [(100,100), (555,666), (100, 555)]]

Here's what I have so far:

matches = []
l1_left_coordinates = [p[0] for p in l1]
l2_left_coordinates = [p[0] for p in l2]
for p in l3:
    left, right = p
    try:
        idx_of_left_in_l1 = l1_left_coordinates.index(left)
        idx_of_right_in_l2 = l2_left_coordinates.index(right)
    except ValueError:
        continue
    matches.append([l1[idx_of_left_in_l1], l2[idx_of_right_in_l2], p])

This works. But it means that for every l3 element I traverse both l1 and l2, resulting in O(n^2) worst-case runtime (where n is the length of the lists). I wonder if there's a faster way to do this.
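(For comparison only, and not part of the original question: a common way to avoid re-traversing l1 and l2 for every l3 element is to precompute a dictionary keyed by the first coordinate, making each lookup O(1) on average. A minimal sketch, reusing the lists defined above:)

l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]

# Build lookup tables once: first coordinate -> full pair.
# Note: if a first coordinate appears more than once, the last pair wins here,
# whereas the .index()-based code above keeps the earliest one.
l1_by_left = {p[0]: p for p in l1}
l2_by_left = {p[0]: p for p in l2}

matches = []
for left, right in l3:
    if left in l1_by_left and right in l2_by_left:
        matches.append([l1_by_left[left], l2_by_left[right], (left, right)])

print(matches)
# [[(0, 55), (40, 50), (0, 40)], [(100, 100), (555, 666), (100, 555)]]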



Solution 1:[1]

Sorting the lists first, and only searching the residual of each list (the part after the previous match), leads to performance that can approach O(n log(n)). Here's an example; however, the difference may not be noticeable for small lists.

import time
import random  # only needed for the commented-out benchmark data below

l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]

# l1 = [(random.randint(0,1000),random.randint(0,1000)) for _ in range(3000)]
# l2 = [(random.randint(0,1000),random.randint(0,1000)) for _ in range(3000)]
# l3 = [(random.randint(0,1000),random.randint(0,1000)) for _ in range(3000)]

start = time.time()
l1 = sorted(l1,key=lambda item : item[0])     # O(nlog(n))
l2 = sorted(l2,key=lambda item : item[0])     # O(nlog(n))
l3 = sorted(l3,key=lambda item : item[0])     # O(nlog(n))

matches = []
l1_left_coordinates = [p[0] for p in l1]   # O(n)
l2_left_coordinates = [p[0] for p in l2]   # O(n)

l1_idx = 0
l2_idx = 0 

l1_id_inc = 0  # running absolute index of the last match in the sorted l1
l2_id_inc = 0  # running absolute index of the last match in the sorted l2

for x,y in l3:  
    try: 
        l1_idx = l1_left_coordinates.index(x)  # linear scan, but only over the residual kept from the previous match, so the scan work is spread across iterations of the outer loop
        l2_idx = l2_left_coordinates.index(y)
        
        l1_id_inc += l1_idx
        l2_id_inc += l2_idx

        l1_left_coordinates = l1_left_coordinates[l1_idx:]  # slicing the residual
        l2_left_coordinates = l2_left_coordinates[l2_idx:]
    except ValueError:
        continue

    matches.append([l1[l1_id_inc], l2[l2_id_inc], (x,y)])

elapsed = time.time()-start
print(f"took :{elapsed} seconds to run")
print(matches)

Solution 2:[2]

If you sort the lists l1 and l2 (O(n log n)) and then look up p[0] in l1 and p[1] in l2 by dichotomy (binary search, O(log n) per lookup), it should be theoretically better. However, I am not sure how fast .index() is, so maybe your solution is already fast enough.
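
(A minimal sketch of the sort-plus-binary-search idea described in this answer, using Python's standard bisect module; this code is not from the original answer:)

from bisect import bisect_left

def match(l1, l2, l3):
    # Sort copies of l1 and l2 by their first coordinate: O(n log n).
    l1s = sorted(l1, key=lambda p: p[0])
    l2s = sorted(l2, key=lambda p: p[0])
    keys1 = [p[0] for p in l1s]
    keys2 = [p[0] for p in l2s]

    matches = []
    for x, y in l3:
        i = bisect_left(keys1, x)  # O(log n) search for x among l1's first coordinates
        j = bisect_left(keys2, y)  # O(log n) search for y among l2's first coordinates
        if i < len(keys1) and keys1[i] == x and j < len(keys2) and keys2[j] == y:
            matches.append([l1s[i], l2s[j], (x, y)])
    return matches

l1 = [(1, 7), (0, 55), (13, 3), (100, 100)]
l2 = [(5, 3), (40, 50), (11, 99), (555, 666)]
l3 = [(0, 40), (100, 555), (111, 999)]
print(match(l1, l2, l3))
# [[(0, 55), (40, 50), (0, 40)], [(100, 100), (555, 666), (100, 555)]]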

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Yehdhih ANNA
Solution 2: Qise