'Merging overlapping axis-aligned rectangles

I have a set of axis aligned rectangles. When two rectangles overlap (partly or completely), they shall be merged into their common bounding box. This process works recursively.

Detecting all overlaps and using union-find to form groups, which you merge in the end will not work, because the merging of two rectangles covers a larger area and can create new overlaps. (In the figure below, after the two overlapping rectangles have been merged, a new overlap appears.)

enter image description here

As in my case the number of rectangles is moderate (say N<100), a brute force solution is usable (try all pairs and if an overlap is found, merge and restart from the beginning). Anyway I would like to reduce the complexity, which is probably O(N³) in the worst case.

Any suggestion how to improve this ?



Solution 1:[1]

I think an R-Tree will do the job here. An R-Tree indexes rectangular regions and lets you insert, delete and query (e.g, intersect queries) in O(log n) for "normal" queries in low dimensions.

The idea is to process your rectangles successively, for each rectangle you do the following:

  1. perform an intersect query on the current R-Tree (empty in the beginning)

  2. If there are results then delete the results from the R-Tree, merge the current rectangle with all result rectangles and insert the newly merged rectangle (for the last step jump to step 1.).

  3. If there are no results just insert the rectangle in the R-Tree

In total you will perform

  • O(n) intersect queries in step 1. (O(n log n))
  • O(n) insert steps in step 3. (O(n log n))
  • at most n delete and n insert steps in step 2. This is because each time you perform step 2 you decrease the number of rectangles by at least 1 (O(n log n))

In theory you should get away with O(n log n), however the merging steps in the end (with large rectangles) might have low selectivity and need more than O(log n), but depending on the data distribution this should not ruin the overall runtime.

Solution 2:[2]

Use a balanced normalized quad-tree.

Normalized: Gather all the x coordinates, sort them and replace them with the index in the sorted array. Same for the y coordinates.

Balanced: When building the quad-tree always split at the middle coordinate.

So when you get a rectangle you want to go and mark the correct nodes in the tree with some id of the rectangle. If you find any other rectangles underneath(that means they will be overlapping), gather them in a set. When done, if the vector is not empty (you found overlapping rectangles), then we create a new rectangle to represent the union of the subrectangles. If the computed rectangle is bigger then the one you just inserted, then apply the algorithm again using the new computed rectangle. Repeat this until it no longer grows, then move to the next input rectangle.

For performance every node in the quad-tree store all the rectangles overlapping that node, in addition to marking it as an end-node.

Complexity: Initial normalization is O(NlogN). Inserting and checking for overlaps will be O(log(N)^2). You need to do this for the original N rectangles and also for the overlaps. Every time you find an overlap you eliminate at least one of the original rectangles so you can find at most (N-1) overlaps. So overall you need 2N operations. So overall the complexity is going to be O(N(log(N)^2)).

This is better than other approaches because you don't need to check any-to-any rectangles for overlap.

Solution 3:[3]

This can be solved using a combination of plane sweep and spatial data structure: we merge intersecting rectangles along the sweep line and put any rectangles behind the sweep line to spatial data structure. Every time we get a newly merged rectangle we check spatial data structure for any rectangles intersecting this new rectangle and merge it if found.

If any rectangle (R) behind the sweep line intersects some rectangle (S) under sweep line then either of two corners of R nearest to the sweep line is inside S. This means that spatial data structure should store points (two points for each rectangle) and answer queries about any point lying inside a given rectangle. One obvious way to implement such data structure is segment tree where each leaf contains the rectangle with top and bottom sides at corresponding y-coordinate and each other node points to one of its descendants containing the rightmost rectangle (where its right side is nearest to the sweep line).

To use such segment tree we should compress (normalize) y-coordinates of rectangles' corners.

  1. Compress y-coordinates
  2. Sort rectangles by x-coordinate of their left sides.
  3. Move sweep line to next rectangle, if it passes right sides of some rectangles, move them to the segment tree.
  4. Check if current rectangle intersects anything along the sweep line. If not go to step 3.
  5. Check if union of rectangles found on step 4 intersects anything in the segment tree and merge recursively, then go to step 4.
  6. When step 3 reaches the end of list get all rectangles under sweep line and all rectangles in segment tree and uncompress their coordinates.

Worst-case time complexity is determined by segment tree: O(n log n). Space complexity is O(n).

Solution 4:[4]

I had a similar problem and this appears to be efficient:

  1. Sort rectangles by their x coordinates
  2. Merge overlapping rectangles
  3. Sort rectangles by their y coordinates
  4. Merge overlapping rectangles

In steps 2 and 4 you only need one pass to merge rectangles. The worst-case complexity is determined by the sorting algorithm, O(n log n).

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Sorin
Solution 3 Evgeny Kluev
Solution 4 itix