|

Sparse table

20 April 2019

Sparse table is a data structure that allows answering queries of finding minimum or maximum in a range in an array in \(O(1)\) time after \(O(nlogn)\) preprocessing.

Structure of the sparse table

Suppose you have the following array of numbers \(a\):

And you need to answer the queries of finding the minimum in some range in the array. For example, the minimum in the range \([a_1\), \(a_3]\) is 2; minimum in the range \([a_5\), \(a_8]\) is 3:

For an array of size \(n\), we can easily find the minimum in any array range with a simple loop in \(O(n)\) time. Can we do better?

Let's preprocess the array in the following manner. First, we find the minimum in each array subrange of length 2:

Then we find the minimum in each array subrange of length 4:

Minimum on each possible subarray of length 4.

And so on for each power of two smaller than or equal to the length of the array. In our example array of length 9, we will find the minimum in each subrange of length 1 (which is just each number itself), 2, 4, and 8.

We store the results in the two dimensional array \(table[][]\). \(table[i][j]\) means the minimum in the subrange of length \(2^i\) starting from the position \(j\). Here are some values of \(table[i][j]\) in our example array \(a\):

\(table[0][3] = min(a[3] \dots a[3]) = 9\) (minimum in the range of length \(2^0\) starting from position 3).
\(table[1][5] = min(a[5] \dots a[6]) = 5\) (minimum in the range of length \(2^1\) starting from position 5).
\(table[2][0] = min(a[0] \dots a[3]) = 1\) (minimum in the range of length \(2^2\) starting from position 0).

How do we compute this efficiently? First, if our input array length is \(n\), we need to know what is the largest power of two \(p\) such that \(2^p \le n\). In fact, for future use, let's precompute this for every number up to \(n\):

int n; // length of our array
int logs[MAXN]; // logs[i] means such maximum p that 2^p <= i

void computeLogs() {
  logs[1] = 0;
  for (int i = 2; i <= n; i++) {
    logs[i] = logs[i / 2] + 1;
  }
}

Let's build the sparse table itself. We will build it in an increasing order of powers of two. Now notice that when computing \(table[i][j]\), we deal with the range of length \(2^i\), and we can reuse the computation that we did for the range of length \(2^{i-1}\) before:

$$table[i][j] = min\begin{cases} table[i - 1][j] \\ table[i - 1][j + 2^{i - 1}]) \end{cases}$$

For example, in our array \(a\):

Example of reusing the previously computed values. Notice how \(table[2][3] = min(table[1][3], table[1][5])\).

Using this observation, we can efficiently build the whole sparse table with the following code:

const int MAXN = 105000;
const int MAXLOG = 20;

int n;
int a[MAXN];
int table[MAXLOG][MAXN];

void buildSparseTable() {
  for (int i = 0; i <= logs[n]; i++) {
    int curLen = 1 << i; // 2^i
    for (int j = 0; j + curLen <= n; j++) {
      if (curLen == 1) {
        table[i][j] = a[j];
      } else {
        table[i][j] = min(table[i - 1][j], table[i - 1][j + (curLen / 2)]);
      }
    }
  }
}

Let's assess the time complexity of building the sparse table. For an array of length \(n\), there are \(\left\lfloor\log(n)\right\rfloor + 1\) powers of two smaller than or equal to \(n\). For each of them, we can assume we are processing the array in \(O(n)\) time to find the minimum in all subranges, which gives us a total time complexity of \(O(nlogn)\) for building the sparse table.

We can also do a more rigorous assessment. For each power of two \(p\) such that \(2^p \le n \), we have \(n - 2^p + 1\) entries in the sparse table. Each entry is computed in \(O(1)\) time, so to assess the total runtime we can calculate the total number of entries. This can be approximated as follows:

$$\sum_{p=0}^{\left\lfloor\log(n)\right\rfloor + 1} (n-2^p+1) \\ \approx nlogn - \sum_{p=0}^{\left\lfloor\log(n)\right\rfloor + 1} 2^p \\ \approx nlogn - (1 + 2 + \dots + 2^{logn}) \\ \approx nlogn - 2 \times n = O(nlogn).$$

Queries

Now how do we answer the range minimum queries once we have built the sparse table? Let's say we need to find the minimum in the interval of \([l, r]\):

Example with \(l = 1\) and \(r = 6\).
(\(len =6\), \(p = 2\) and \(2^p = 4\)).

Let's denote the length of the interval as \(len = r - l + 1\). Using the array \(logs[]\) that we have computed before, we know that the maximum power of two that fits into the interval is \(p = logs[len]\), and \(2^p \le len\).

Notice that we can fit \(2^p\) in \(len\) only once – otherwise it wouldn't be the maximum power of two. In other words, \(2^p*2 \gt len\).

Using this observation, to find the minimum in any range \([l, r]\), it is enough to examine the subrange of length \(2^p\) that starts in \(l\) and the subrange of length \(2^p\) that ends in \(r\). Because \(2^p*2 \gt len\), we know that these two subranges will cover the whole range \([l, r]\).

To get the minimum on \([l=1,r=6]\), it is enough to examine minimums on \([1,4]\) and \([3,6]\).

We can express this in the following laconic code:

int getMin(int l, int r) {
  int p = logs[r - l + 1];
  int pLen = 1 << p; // 2^p
  return min(table[p][l], table[p][r - pLen + 1]);
}

It's easy to see that after the precomputation, we can answer a query of getting the minimum in any range in \(O(1)\) time.

Applications

Sparse table is handy when you need to answer a lot of queries of finding the minimum or the maximum in a range in an array. After the precomputation, you answer the queries essentially for free, with no extra work.

The main drawback of the sparse table is that the input array cannot change. Any change will require the new precomputation to restore the data structure. If you need to answer the minimum or maximum range queries in a changing array, you would be better using, for example, a segment tree.

Finally, so far we were only dealing with finding the minimums or maximums. In fact, sparse table can be applied to answer the range queries of any idempotent functions – that is, functions that can be applied multiple times without changing the result beyond the initial application. Examples of idempotent functions are minimum, maximum, GCD, and LCM functions, among many others.

Code and practice problems

The full code for the sparse table is available on GitHub here. You can also practice the algorithm on the following problems: