A Window: 01/11/2020

Wednesday, 25 November 2020

Smallest missing positive integer in an unsorted array

The question is to find the smallest missing positive integer, given an unsorted integer array. Consider the following table of arrays and their corresponding outputs for examples.

Input array	Output value
1,2,3,3	4
3,4,-1,1	2
7,8,9,11,12	1
-1,-2	1

The simple solution is to sort the array and get the missing value. The challenge is to do it in O(n) and with constant space. Modifications to the array are allowed.

The complexity requirement indicates that we need an approach with multiple passes on the same array. Since modifications are allowed, we need a way of storing information within the array. The smallest output of the function is 1 and the largest output of the function will obviously be the length of the array + 1. So, we can see the possibility of some 1 to 1 mapping between the indices and the possible outputs of the function. The idea is if the value at an index can be modified to store information about whether the value (index + 1) is present in the array, we could use that information to reach the smallest index whose corresponding value , i.e. (index + 1), is not present in the array. That would be the answer we are looking for.

As seen from the input set, the numbers are signed and we know that sign bit is not part of the value of the number so we have to utilize that to store the availability information. As the availability information is boolean, the space required and available match.

Negative values in the array do not change the output at all and if all elements are negative we can just return 1 as the value we are looking for. So, we just need to focus on how we can use positive integers. So, we store information by marking the positive integers as negative to mark presence. We can map a positive integer to the index of its value - 1.

Let us consider 3 in the second array above. The presence of 3 will be marked by marking the value at index 2 (= 3 - 1) as negative. We have a problem here though. The value there is already negative. So, we need a way of knowing whether a value was already negative or we marked it so.

To overcome that problem, we modify the array a bit. We can move all negative values to the end of the array and store the index distinction. Now, while marking if we want to mark a value that is already negative, we don't care because that value does not contribute to the output anyway. From the remaining positive ones, whichever remains positive after this pass gives the indication of the result, i.e. the index of the value + 1.

public class SmallestMissingPositive {
    public static void main(String[] args) {
        System.out.println(leastPositive(new int[]{1,2,3,3}));
        System.out.println(leastPositive(new int[]{3,4,-1,1}));
        System.out.println(leastPositive(new int[]{7,8,9,11,12}));
        System.out.println(leastPositive(new int[]{-1,-2}));
        System.out.println(leastPositive(new int[]{1, -1,-2}));
    }

    private static int leastPositive(int[] nums){
        int beginNegative = nums.length -1;

        // sequence of negatives at the end
        for(int i = nums.length -1; i >= 0; i--){
            if(nums[i] < 0){
                beginNegative = i;
            }
            else{
                break;
            }
        }

        // all negative values in nums
        if(beginNegative == 0){
            return 1;
        }


        // can transfer to beginNegative index now
        beginNegative--;
        for(int i = 0; i < nums.length; i++){
            if(i == beginNegative){
                break;
            }
            else{
                if(nums[i] <0){
                    int temp = nums[i];
                    nums[i] = nums[beginNegative];
                    nums[beginNegative] = temp;

                    // swap

                    beginNegative--;
                    if(beginNegative ==i){
                        break;
                    }
                    // now we search for next index for beginNegative
                    for(int j = beginNegative; j > i; j--){
                        if(nums[j] < 0){
                            beginNegative = j;
                        }
                        else{
                            break;
                        }
                    }
                }
            }
        }


        for(int i = 0; i <= beginNegative; i++ ){
            int idx = Math.abs(nums[i]) - 1;
            if (idx >= nums.length){
                continue;
            }
            if (nums[idx] > 0){
                nums[idx] = nums[idx] * -1;
            }
        }
        for (int i = 0; i <= beginNegative; i++) {
            if (nums[i] > 0){
                return i+1;
            }
        }
        return beginNegative + 2;
    }

}

Tuesday, 24 November 2020

Intersection of two sorted arrays

In an interview, I was asked to find the intersection of two sorted arrays. My initial approach was to iterate over each element in first array and use binary search for it in the second array and handle the second array similarly. This solution has complexity of O(N * lg N). It also has one problem that when an element is common, it is found twice: when searched in second from iteration over first and vice versa. This can be easily avoided by searching in the output.

A simpler solution is to just use two pointers.

List<Integer> intersection(int[] a, int[] b){
    List<Integer> result = new ArrayList<>();
    
    int i = 0;
    int j = 0;
    for(; i< a.length &&  j < b.length;){
      if(a[i] == b[j]){
        result.add(a[i]);
        i++;
        j++;
      }else if (a[i] < b[j]){
        i++;        
       }else {
        j++;
       }
    }
    return result;
  }

This solution works in linear time. The solution can be further improved for the average case keeping the worst case complexity same.

Consider a large array as follows:

a = [10,11,12,13,14,15,16,17,...10000,1000000]

Now let's consider the case of finding intersection with a small array as follows:

b = [10000,1000000]

After we initialize the two pointers to the beginning of the arrays, we keep moving in the first array for a long time. We can improve this by jumping to the next element higher or equal to the element being compared it. We could use binary search for that.

List<Integer> intersection(int[] a, int[] b){
    List<Integer> result = new ArrayList<>();
    
    int i = 0;
    int j = 0;
    for(; i< a.length &&  j < b.length;){
      if(a[i] == b[j]){
        result.add(a[i]);
        i++;
        j++;
      }else if (a[i] < b[j]){
        //i++;
        int low = i + 1;
        int high = a.length - 1;
        int mid = low + (high - low) / 2;
        while(low < high){
          if(a[mid] == b[j]){
            i = mid;
            break;
          }else if(a[mid] > b[j]){
            high = mid - 1;
          }else{
            low = mid + 1;
          }
          mid = low + (high - low) / 2;
        }
        if (low >= high){
          i = low;
        }
        
       }else {
        //j++;
        
        int low = j + 1;
        int high = b.length - 1;
        int mid = low + (high - low) / 2;
        while(low < high){
          if(b[mid] == a[i]){
            j = mid;
            break;
          }else if(b[mid] > a[i]){
            high = mid - 1;
          }else{
            low = mid + 1;
          }
          mid = low + (high - low) / 2;
        }
        if (low >= high){
          j = low;
        }
      }
    }
    return result;
  }

This improves the average case performance, especially for skewed inputs as mentioned earlier.

Saturday, 21 November 2020

Elasticsearch shard allocation failure

Recently, I found a shard allocation failure in an Elasticsearch cluster. Checking the reason for the failure, I found the reason to be the following:

"Validation Failed: 1: this action would add [4] total shards, but this cluster currently has [4000]/[4000] maximum shards open;"

It is a bit unclear from the error message why the cluster considers 4000 as the maximum number of shards it should consider. Especially, given that on a different cluster I get 16000 as the maximum number of shards.

Looking at my cluster settings (including defaults), I found a property called "max_shards_per_node" and it was set to "1000". Since, I had 4 nodes, it added up perfectly.

Generally, this setting is put in so that the nodes have a proper balance between different types of work loads. However, we knew our scenarios of more shards was temporary so increasing the value to 1500 worked for the short term.

A Window