Wednesday, 25 November 2020

Smallest missing positive integer in an unsorted array

The question is to find the smallest missing positive integer, given an unsorted integer array. Consider the following table of arrays and their corresponding outputs for examples.

Input array Output value
1,2,3,3 4
3,4,-1,1 2
7,8,9,11,12 1
-1,-2 1

The simple solution is to sort the array and get the missing value. The challenge is to do it in O(n) and with constant space. Modifications to the array are allowed.

The complexity requirement indicates that we need an approach with multiple passes on the same array. Since modifications are allowed, we need a way of storing information within the array. The smallest output of the function is 1 and the largest output of the function will obviously be the length of the array + 1. So, we can see the possibility of some 1 to 1 mapping between the indices and the possible outputs of the function. The idea is if the value at an index can be modified to store information about whether the value (index + 1) is present in the array, we could use that information to reach the smallest index whose corresponding value , i.e. (index + 1), is not present in the array. That would be the answer we are looking for.

As seen from the input set, the numbers are signed and we know that sign bit is not part of the value of the number so we have to utilize that to store the availability information. As the availability information is boolean, the space required and available match.

Negative values in the array do not change the output at all and if all elements are negative we can just return 1 as the value we are looking for. So, we just need to focus on how we can use positive integers. So, we store information by marking the positive integers as negative to mark presence. We can map a positive integer to the index of its value - 1.

Let us consider 3 in the second array above. The presence of 3 will be marked by marking the value at index 2 (= 3 - 1) as negative. We have a problem here though. The value there is already negative. So, we need a way of knowing whether a value was already negative or we marked it so.

To overcome that problem, we modify the array a bit. We can move all negative values to the end of the array and store the index distinction. Now, while marking if we want to mark a value that is already negative, we don't care because that value does not contribute to the output anyway. From the remaining positive ones, whichever remains positive after this pass gives the indication of the result, i.e. the index of the value + 1.

public class SmallestMissingPositive {
public static void main(String[] args) {
System.out.println(leastPositive(new int[]{1,2,3,3}));
System.out.println(leastPositive(new int[]{3,4,-1,1}));
System.out.println(leastPositive(new int[]{7,8,9,11,12}));
System.out.println(leastPositive(new int[]{-1,-2}));
System.out.println(leastPositive(new int[]{1, -1,-2}));
}

private static int leastPositive(int[] nums){
int beginNegative = nums.length -1;

// sequence of negatives at the end
for(int i = nums.length -1; i >= 0; i--){
if(nums[i] < 0){
beginNegative = i;
}
else{
break;
}
}

// all negative values in nums
if(beginNegative == 0){
return 1;
}


// can transfer to beginNegative index now
beginNegative--;
for(int i = 0; i < nums.length; i++){
if(i == beginNegative){
break;
}
else{
if(nums[i] <0){
int temp = nums[i];
nums[i] = nums[beginNegative];
nums[beginNegative] = temp;

// swap

beginNegative--;
if(beginNegative ==i){
break;
}
// now we search for next index for beginNegative
for(int j = beginNegative; j > i; j--){
if(nums[j] < 0){
beginNegative = j;
}
else{
break;
}
}
}
}
}


for(int i = 0; i <= beginNegative; i++ ){
int idx = Math.abs(nums[i]) - 1;
if (idx >= nums.length){
continue;
}
if (nums[idx] > 0){
nums[idx] = nums[idx] * -1;
}
}
for (int i = 0; i <= beginNegative; i++) {
if (nums[i] > 0){
return i+1;
}
}
return beginNegative + 2;
}

}

Tuesday, 24 November 2020

Intersection of two sorted arrays

 In an interview, I was asked to find the intersection of two sorted arrays. My initial approach was to iterate over each element in first array and use binary search for it in the second array and handle the second array similarly. This solution has complexity of O(N * lg N). It also has one problem that when an element is common, it is found twice: when searched in second from iteration over first and vice versa. This can be easily avoided by searching in the output.

A simpler solution is to just use two pointers.

List<Integer> intersection(int[] a, int[] b){
    List<Integer> result = new ArrayList<>();
    
    int i = 0;
    int j = 0;
    for(; i< a.length &&  j < b.length;){
      if(a[i] == b[j]){
        result.add(a[i]);
        i++;
        j++;
      }else if (a[i] < b[j]){
        i++;       
       }else {
        j++;
       }
    }
    return result;
  }

This solution works in linear time. The solution can be further improved for the average case keeping the worst case complexity same.

Consider a large array as follows:

a = [10,11,12,13,14,15,16,17,...10000,1000000]

Now let's consider the case of finding intersection with a small array as follows:

b = [10000,1000000]

After we initialize the two pointers to the beginning of the arrays,  we keep moving in the first array for a long time. We can improve this by jumping to the next element higher or equal to the element being compared it. We could use binary search for that.

List<Integer> intersection(int[] a, int[] b){
    List<Integer> result = new ArrayList<>();
    
    int i = 0;
    int j = 0;
    for(; i< a.length &&  j < b.length;){
      if(a[i] == b[j]){
        result.add(a[i]);
        i++;
        j++;
      }else if (a[i] < b[j]){
        //i++;
        int low = i + 1;
        int high = a.length - 1;
        int mid = low + (high - low) / 2;
        while(low < high){
          if(a[mid] == b[j]){
            i = mid;
            break;
          }else if(a[mid] > b[j]){
            high = mid - 1;
          }else{
            low = mid + 1;
          }
          mid = low + (high - low) / 2;
        }
        if (low >= high){
          i = low;
        }
        
       }else {
        //j++;
        
        int low = j + 1;
        int high = b.length - 1;
        int mid = low + (high - low) / 2;
        while(low < high){
          if(b[mid] == a[i]){
            j = mid;
            break;
          }else if(b[mid] > a[i]){
            high = mid - 1;
          }else{
            low = mid + 1;
          }
          mid = low + (high - low) / 2;
        }
        if (low >= high){
          j = low;
        }
      }
    }
    return result;
  }

This improves the average case performance, especially for skewed inputs as mentioned earlier.

Saturday, 21 November 2020

Elasticsearch shard allocation failure

 Recently, I found a shard allocation failure in an Elasticsearch cluster. Checking the reason for the failure, I found the reason to be the following:

"Validation Failed: 1: this action would add [4] total shards, but this cluster currently has [4000]/[4000] maximum shards open;"

It is a bit unclear from the error message why the cluster considers 4000 as the maximum number of shards it should consider. Especially, given that on a different cluster I get 16000 as the maximum number of shards.

Looking at my cluster settings (including defaults), I found a property called "max_shards_per_node" and it was set to "1000". Since, I had 4 nodes, it added up perfectly.

Generally, this setting is put in so that the nodes have a proper balance between different types of work loads. However, we knew our scenarios of more shards was temporary so increasing the value to 1500 worked for the short term.

Monday, 14 September 2020

Gitk permissions on Alacritty

Recently, I started using Alacritty instead of the Terminal app provided by Apple. It sure is much faster.

There seems to be a new and annoying security feature introduced by Apple that has granular permissions for applications downloaded from the internet. So, Alacritty does not start with the same permissions as Terminal. Setting that up is a bit of a hassle. When I started gitk, it asked for some permission and I could not understand it then so I denied it. So, gitk did not work. I thought restarting Alacritty might request permissions again but that is not the case. The error I was getting is as follows:

 % gitk
Error in startup script: 58:102: execution error: Not authorised to send Apple events to System Events. (-1743)
    while executing
"exec osascript -e [format {
        tell application "System Events"
            set frontmost of processes whose unix id is %d to true
        end te..."
    invoked from within
"if {[tk windowingsystem] eq "aqua"} {
    exec osascript -e [format {
        tell application "System Events"
            set frontmost of processes ..."
    (file "/usr/local/bin/gitk" line 12261)

So, I had to find the right permission and grant that to get gitk working. The System Events permission is apparently under Automation section. Once that is granted, Alacritty can run gitk.

Monday, 20 January 2020

Learning Puppet III: Installing modules

In my previous post, we took a look at creating classes in Puppet and applying them locally. Let us look at installing Puppet modules next. Puppet does not install stdlib by default unlike other programming environments.

Common functionalities like reading a JSON file are part of stdlib module. It can often come in handy. Installing it is straightforward. However, after installation the modules are not loaded at run time.

Let us consider an example of reading a JSON file in a Puppet manifest. It can be an example of reading some config from an external resource.

Let us consider the module mod which we created in the previous post.

class mod::test_json{
      include mod

      $hash = loadjson('/tmp/load.json')
      [$message, $return_value] = $hash['output']
      notice("$message, $return_value")

}


To test, let us create a file at with the following content.

{
  "output": {
    "message": "The value is: ",
    "return_value": "0"
  }
}


To apply the manifest, we can try running the following:

puppet apply --modulepath=/path/to/parent/of/module -e "include mod::test_json"

This fails because loadjson is in stdlib. We need to enhance the module path to include stdlib as well. Modules are installed to the path /etc/puppetlabs/code/environments/production/modules. So, we need to modify our command as follows:

puppet apply --modulepath=/etc/puppetlabs/code/environments/production/modules:/vagrant/puppet -e "include elk::test_json"

Learning Puppet - II: Applying Classes

In my previous post about learning Puppet, I left a few open questions that I will attempt to answer as I continue learning.
  1. How can I apply Puppet classes?
  2. How can I install modules?
  3. How can I define functions in Puppet?
In the last post, I was unable to apply a class so I moved ahead without it. Let us look at that in this post.

To apply a class, I created a module, created its init.pp file and added a class in it.

Let us consider the module name to be mod. The init.pp will be located at mod/manifests/init.pp. I just keep it empty to begin with. The name of the class in that file has to match the module name though.

class mod{
}


Moving on to create a class of interest, let us add one to install Openjdk. The file would be mod/manifests/jdk.pp.

class mod::jdk () {
  notice("Installing openJDK")
  # Install java
  package { 'openjdk':
    ensure => installed,
    name => 'java-1.8.0-openjdk-headless',
    alias => 'openjdk',
  }
}


To apply the manifest, the command would be as follows:

puppet apply --modulepath=/path/to/parent/of/module -e "include mod::jdk"