'Get nth largest data value in subtree of BST

I'm looking to return the nth largest data value in the subtree rooted at this node in a BST that can have duplicate values.

Right now I have this but it doesn't seem to work

static int get(BSTNode node, int n) {
    
    if ( node == null || n < 0)
    return 0;
            
    if ( n == 0)
        return node.data;
    
    get(node.right, n--);           
    get(node.left, n--);

    return 0;
}

This is my Node class

private int data; // key
    private int dataCount; // no. of keys with equal value stored at this node
    private int treeSize; // no. of keys stored in the subtree rooted at this node
    private BSTNode leftChild;
    private BSTNode rightChild;


Solution 1:[1]

I have implemented an iterative solution.

Idea

The basic idea is to always go as far to the right as possible as the largest values will be there as long as a node has not yet been counted against n and save all values along the path we go in a stack. If we can't go to the right anymore and we've not counted this node against n already we've found the largest node in the tree that we have not accounted for yet. So in that case we can account for that value by getting the node from the stack, decrementing n and adding that node to the set of nodes we've already accounted for. To find the next largest value we'll have to check if the current node has a left subtree and if it has, we need to continue there. If it does not we need to go back to our parent node which we do by getting the next value from the stack. This parent node will then automatically be the next largest value.

Here an illustration of those two scenarios possible when being at the largest value that has not been accounted for.

We're at the largest node that has not been accounted for and it has no left node, so we need to go back to the parent (using the stack) which will be the next largest node.

   Parent
    /  \
   /    \
  x    largest

Alternatively: there is a left subtree, we need to examine first.

   Parent
    /  \
   /    \
  x    largest
         /
        /
  left subtree
     /  \
    /    \

Implementation

/**
 * Finds the n-th largest key in a given subtree.
 * @param root root node to start from
 * @param n n-th largest key to get, n =< 1 will result in the largest key being returned
 * @return returns n-th larges value, if n <= 1 return largest value
 */
public Node findNthLargestKey(Node root, int n){
    if(root == null) return null;
    if(n < 1) n = 1;
    // early out: if you know how many nodes you have in the tree return if n is bigger than that
    // based on number of nodes compared to n some assumptions could be made to speed up algorithm
    // e.g. n == number of nodes -> get to left-most node and return it
    // if n is close to number of nodes also probably start in left branch instead of all the way on the right side at the biggest node
    var stack = new Stack<Node>();
    // remember nodes that have been visited and have been counted against n
    var done = new HashSet<Integer>();
    // start at root node
    stack.add(root);
    // continue as long as the stack is not empty, if it is n was to big and the n-th largest value could not be found
    while (!stack.empty()){
        // pop next value from the stack (will be root in first iteration)
        current = stack.pop();
        // always try to go as far to the right as possible to get the biggest value that has not yet been counted against n
        while (current != null && !done.contains(current.getKey())){
            stack.add(current);
            current = current.getRight();
        }
        // if we are here we've found the biggest value that has not yet been counted against n
        var prev = stack.pop();
        // if we have found the nth biggest value return
        if(--n == 0){
            return prev;
        }
        // otherwise mark this node as done and counted against n
        done.add(prev.getKey());
        // if node has a left successor, visit it first as this node has no right successors that have not been counted against n already
        if(prev.getLeft() != null) stack.add(prev.getLeft());
    }
    // n-th largest value was not found (n was too big)
    return null;
}

My Node looks like this with getters and setters defined of course. But the implementation will also work for your node, as the number of nodes with same value are irrelevant to find the n-th largest node. And even if they were not , the same algorithm would work but then you would need to decrement by the number of nodes with same value and the condition would need to be adjusted to n <= 0 to return.

public class Node {

    private int key;
    private Node right;
    private Node left;
    private Object anyData;

    public Node(int key) {
        this(key, null);
    }

    public Node(int key, Object anyData) {
        this.key = key;
        this.anyData = anyData;
        this.left = null;
        this.right = null;
    }
}

Test

I've tested my implementation against random trees and the results have always been correct. This Test class however only checks results for the root node to be able to test the method for every node in the tree. I've additionally also run some test where n > number of nodes in tree which always has to return null for not found and for smaller subtrees.

public class Test {

    public static void main(String[] args){
        // values to insert into the tree
        int[] curVals = fillArrayRand(20, 1, 200);

        // Create tree
        BinarySearchTree tree = new BinarySearchTree();
        System.out.println("Tree has been created.");

        // fill tree
        for (var cur: curVals) {
            tree.insertIter(new Node(cur));
        }
        // print values in order of insertion, first value will be the root value
        System.out.println("Tree was filled with the following values: %s".formatted(Arrays.toString(curVals)));
        // print tree in using inorder traversal
        tree.printRec(Traversal.INORDER);

        var sorted = Arrays.stream(curVals).sorted().distinct().toArray();
        // always start at root node; which is the first node that gets inserted
        // find() returns a given node by key
        var startNode = tree.find(curVals[0]);

        // now loop over sorted values (sorted in ascending order -> nth largest is at position n - i in the sorted array)
        for (int i = 0; i < sorted.length; i++) {
            var result = tree.findNthLargestKey(startNode, sorted.length - i);

            // if key in i-th position of sorted array is the same as the key of result => success
            // if result is null, the node was not found (should not happen here as sorted.length - i is never > sorted.length)
            System.out.printf("#%d largest value:\t%d (expected)\t-\t%s (result)\t", sorted.length - i, sorted[i], result == null ? "not found": result.getKey());
            if (result != null && sorted[i] == result.getKey()) {
                System.out.println("SUCCESS");
            } else System.out.println("FAILED");
        }
    }

    public static int[] fillArrayRand(int size, int randStart, int randEnd){
        int[] randArray = new int[size];
        for(int i = 0; i < size; i++){
            randArray[i] = (int)( (randEnd - randStart) * Math.random() + randStart);
        }
        return randArray;
    }
}

Expected output

Tree has been created.
Tree was filled with the following values: [148, 65, 18, 168, 8, 148, 194, 186, 114, 22, 102, 51, 123, 169, 68, 118, 37, 18, 26, 18]
((((n,8,n),18,(n,22,(((n,26,n),37,n),51,n))),65,(((n,68,n),102,n),114,((n,118,n),123,n))),148,(n,168,(((n,169,n),186,n),194,n)))
#17 largest value:  8 (expected)    -   8 (result)  SUCCESS
#16 largest value:  18 (expected)   -   18 (result) SUCCESS
#15 largest value:  22 (expected)   -   22 (result) SUCCESS
#14 largest value:  26 (expected)   -   26 (result) SUCCESS
#13 largest value:  37 (expected)   -   37 (result) SUCCESS
#12 largest value:  51 (expected)   -   51 (result) SUCCESS
#11 largest value:  65 (expected)   -   65 (result) SUCCESS
#10 largest value:  68 (expected)   -   68 (result) SUCCESS
#9 largest value:   102 (expected)  -   102 (result)    SUCCESS
#8 largest value:   114 (expected)  -   114 (result)    SUCCESS
#7 largest value:   118 (expected)  -   118 (result)    SUCCESS
#6 largest value:   123 (expected)  -   123 (result)    SUCCESS
#5 largest value:   148 (expected)  -   148 (result)    SUCCESS
#4 largest value:   168 (expected)  -   168 (result)    SUCCESS
#3 largest value:   169 (expected)  -   169 (result)    SUCCESS
#2 largest value:   186 (expected)  -   186 (result)    SUCCESS
#1 largest value:   194 (expected)  -   194 (result)    SUCCESS

Note: the output of the line with all the parenthesis is the output of the inorder traversal where (left node, parent, right node) and n means null for i. e. no node. The first node that gets inserted is the root node, so it's best to start to read the output from there.

Correctness

I should be possible using a loop invariant and induction to proof the algorithm is correct and produces the expected result for every correct binary search tree and input. The loop variant (informally) would be after an iteration i of the outer while loop, we have found the i-th largest node in the tree for 1 <= i <= n. I have not done that here, but using the idea that should be straightforward.

Effectiveness

I have not done a complexity analysis but it is obvious that the best case e.g. root node is largest value and we want the largest value the complexity is O(1). In worst case it will be O(n) no matter which node we search for. The algorithm could certainly be improved for some inputs i. e. if we have n close to the number of nodes in the tree, meaning we are searching for a small value. In that case it will be faster to start from the left-most node which is the smallest and search for the (number of nodes - n)-th smallest value. If you were to implement something like this you could certainly greatly improve the average case runtime.

Solution 2:[2]

When dealing with recursive data structures, you can often expect some recursive code to work on them. This case is not an exception either, just the recursive parts will need some dirtiness.

Let's use the 3-element BSTs for designing, labeled with insertion-order:

123    132    213,231  312    321

1       1        2      3        3
 \       \      / \    /        /
  2       3    1   3  1        2
   \     /             \      /
    3   2               2    1

Finding the largest element is easy:

  1. just go to the right as long as you can

You will bump into 3, no matter what level it is.

Finding the second largest element is the revealing part:

  1. going to the right as long as it's possible still seems to be a good start
  2. then we look at where we are:
    • if it's a leaf node, return to the parent, and that will be the one (123,213,231 cases)
    • if there's a left-child, check that one, but as the 312 case in particular shows, "checking" the left-child actually means step 1, so again go to the right as long as it's possible.

The recursion is somewhat found, and these really are the steps we need for larger cases too. It's also somewhat seen that when we are going to the right, the "nth-ness" of the next number we will check doesn't change. It starts changing only when we are stepping to the left (132,312,321), or returning to a previous level (123,213,231).

The dark part is that we have to track this counter somehow. We found the answer when it reaches 0 (so starting this algorithm with n=0 finds the largest element), and after that (when n goes negative) it will just return the value it got from recursion.

First here is a JavaScript PoC, using a bit dirty hacks, like if a member variable doesn't exist at all yet, it still can be checked (the if(this.left) things), and the counter is a one-element array (so it can be modified across the recursive calls), the method is called as root.nthback([i]), where the [i] is an array literal. Also, the method doesn't bother returning anything when the element doesn't exist, that produces the two undefineds at the end of the test output. These shortcuts will be addressed in the Java variant at the end of this post.
The example input was just taken from the other answer, on top of their availability, they have some repeats too.

const init = [148, 65, 18, 168, 8, 148, 194, 186, 114, 22, 102, 51, 123, 169, 68, 118, 37, 18, 26, 18];

class Node {
  nthback(n) {
    if (this.right) {
      let res = this.right.nthback(n);
      if (n[0] < 0)
        return res;
    }
    if (n[0]-- === 0)
      return this.value;
    if (this.left) {
      let res = this.left.nthback(n);
      if (n[0] < 0)
        return res;
    }
  }
  constructor() {
    this.value = NaN;
  }
  add(value) {
    if (isNaN(this.value)) {
      this.value = value;
    } else if (value < this.value) {
      if (!this.left)
        this.left = new Node;
      this.left.add(value);
    } else {
      if (!this.right)
        this.right = new Node;
      this.right.add(value);
    }
  }
  walk() {
    let result = "";
    if (this.left)
      result = this.left.walk() + ",";
    result += this.value;
    if (this.right)
      result += "," + this.right.walk();
    return result;
  }
}
const root = new Node;
for (const value of init)
  root.add(value);
console.log(root.walk());
for (let i = 0; i < 22; i++)
  console.log(root.nthback([i]));

So the actual magic is quite short and also symmetric:

  nthback(n) {
    if (this.right) {                            // 1
      const res = this.right.nthback(n);         // 2
      if (n[0] < 0)                              // 3
        return res;
    }
    if (n[0]-- === 0)                            // 4
      return this.value;
    if (this.left) {                             // 5
      const res = this.left.nthback(n);
      if (n[0] < 0)
        return res;
    }
  }

If there is something on the right (1), it has to be checked (2), and if the counter is negative afterwards (3), the result we got back is the actual result of the entire call, so we pass it back.
If we are still in the method, (4) is where we check if the counter is exactly 0, because then this node has the actual result, which we can return. It's worth to remember that the n[0]-- part decrements the counter regardless of the outcome of the comparison. So if n[0] was 0 initially, it will become -1 and we return this.value;. If it was something else, it just gets decremented.
(5) does the (1)-(2)-(3) part for the left branch. And the hidden JavaScript thing is that we don't have to return anything. But the (4)-(5) parts will change when using your complete structure anyway.


Adding subtree-size allows early return if the incoming counter is simply larger than the size of the entire subtree: we just decrement the counter by the size, and return, the element is somewhere else. And this also means that when we don't return, the result is in the subtree, so we check the possible right branch, then ourselves, and if we are still inside the method, we don't even have to check if we have a left branch, because we do have it for sure, and it does contain the result, also for sure. So (5) will simply become a direct return this.left.nthback(n);. Which is quite a simplification.

Tracking multiplicity affects (4): instead of checking for 0, we will have to check if the counter is less than the multiplicity, and also, instead of decrementing the counter by 1, we have to subtract the actual multiplicity from it.

const init = [148, 65, 18, 168, 8, 148, 194, 186, 114, 22, 102, 51, 123, 169, 68, 118, 37, 18, 26, 18];

class Node {
  nthback(n) {
    if (this.size <= n[0]) {
      n[0] -= this.size;
      return 0;
    }
    if (this.right) {
      let res = this.right.nthback(n);
      if (n[0] < 0)
        return res;
    }
    if (n[0] < this.count) {
      n[0] -= this.count;
      return this.value;
    }
    n[0] -= this.count;
    return this.left.nthback(n);
  }
  constructor() {
    this.value = NaN;
    this.size = 0;
  }
  add(value) {
    this.size++;
    if (isNaN(this.value)) {
      this.value = value;
      this.count = 1;
    } else if (value === this.value) {
      this.count++;
    } else if (value < this.value) {
      if (!this.left)
        this.left = new Node;
      this.left.add(value);
    } else {
      if (!this.right)
        this.right = new Node;
      this.right.add(value);
    }
  }
  walk() {
    let result = "";
    if (this.left)
      result = this.left.walk() + ",";
    result += this.value;
    if (this.count > 1)
      result += "x" + this.count;
    result += "(" + this.size + ")";
    if (this.right)
      result += "," + this.right.walk();
    return result;
  }
}
const root = new Node;
for (const value of init)
  root.add(value);
console.log(root.walk());
for (let i = 0; i < 22; i++)
  console.log(root.nthback([i]));

So the final JavaScript variant could look like this:

  nthback(n) {
    if (this.size <= n[0]) {      // 1
      n[0] -= this.size;
      return 0;
    }
    if (this.right) {             // 2
      let res = this.right.nthback(n);
      if (n[0] < 0)
        return res;
    }
    if (n[0] < this.count) {      // 3
      n[0] -= this.count;         // 4
      return this.value;
    }
    n[0] -= this.count;           // 4
    return this.left.nthback(n);  // 5
  }

Subtree-skipping happens in (1), by comparing subtree-size, and the target count we can immediately tell if the result is in this subtree or somewhere else. While JavaScript would allow a simple return; here, a 0 is produced instead, as it seems to be desired in the question.
The next step (2) is unchanged from the previous variant, looks exactly same, does exactly same.
(3) had to be taken apart. While the post-decrement operator was helpful previously, there's no post--= operator, so it happens for both outcomes of the comparison (4). Also, we don't compare === 0 as before, but we compare < this.count instead. By the way, we could have done that in the previous case too if we really wanted to, there this.count was always 1, so < 1 could have done the thing (the counter is never negative at this point, that can only happen in the left/right returns). In fact if we really-really wanted to, we could do the subtraction prior to the comparison, and instead of the current 0<=n fact and n<count check, we could shift the comparison downwards, knowing that now -count<=n and checking for n<0.
(5) became the one-liner as promised. If we reach this point, the result is unconditionally inside the left-branch, which unconditionally exists.


Making it Java

The simplest part is the array-trick: there could be a public method to call, accepting an actual number, and then it could wrap it into an array, and call a private method doing the actual job. Also changed the variable names to the ones you have:

public int nthback(int n) {
    return nthback(new int[] { n });
}

private int nthback(int[] n) {
    if (treeSize <= n[0]) {
        n[0] -= treeSize;
        return 0;
    }
    if (rightChild != null) {
        int res = rightChild.nthback(n);
        if (n[0] < 0)
            return res;
    }
    if (n[0] < dataCount) {
        n[0] -= dataCount;
        return data;
    }
    n[0] -= dataCount;
    return leftChild.nthback(n);
}

As you have private members, these methods have to reside in the same source file, and then they could just reside directly inside class BSTNode anyway.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2