Tuesday, November 17, 2009

About integer overflows...

One fine day, when I was browsing StackOverflow as usual, I came across this particular question. The gist of the question was this:
A person in an interview asked me how I would generate all possible integer values in Java.
Some people, including me, thought it was a simple question. All you have to do is this, right?
  for (int i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
    System.out.println(i);
  }
(I confess I did not readily think of using the constants Integer.MIN_VALUE and Integer.MAX_VALUE. In my mind, the first program that hit me used the actual values of Integer.MIN_VALUE and Integer.MAX_VALUE. But reading through some of the comments to the question, I quickly realized I could use them.)

Simple right?

Wrong, as I found out later.

It turns out the program will run fine until the variable, i, equals Integer.MAX_VALUE. At that point, the System.out.println() will print the value of i which is Integer.MAX_VALUE and then control exits the current iteration of the for loop. Now i is incremented. What will the value of i now be? I expected it to be Integer.MAX_VALUE + 1. But no, its Integer.MIN_VALUE!! Since Integer.MIN_VALUE is less than Integer.MAX_VALUE, the conditional expression passes.

The outcome, sweetie, is that it's an infinite for loop.

You can test this out for yourself by executing the above program. but wait.. surely you are not going to wait for the program to run through all values from Integer.MIN_VALUE till Integer.MAX_VALUE? That's 4294967295 numbers!! When will you program finish executing? Instead you can try out this simple program:
int i = Integer.MAX_VALUE;
System.out.println((i + 1) == Integer.MIN_VALUE);
Executing it would print "true".

So to answer the interview question, the correct code to print all values that an int can store is:
for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
  System.out.println(i); 
}
System.out.println(Integer.MAX_VALUE);

The bits deep under it all...

"But, but", I thought, "how can incrementing a variable that holds Integer.MAX_VALUE result in Integer.MIN_VALUE?"

To understand why this occurs, we must dive deep into the world of bits and bytes. Java defines an int data type as something that can hold upto 32 bits. Thus, the lowest value that it can hold is given by Integer.MIN_VALUE, which is actually -2147483648, which in binary is represented as 1000 0000 0000 0000 0000 0000 0000 0000. (You don't have to struggle a lot to get the binary representation of a number. Java provides you the Integer.toBinaryString(int) method. Another way to get the binary representation is to open Calculator in Windows, type in the number and click on 'Bin' radio button to get the binary representation.)

  • In case you have forgotten, in Java, of the 32 bits for an int, the left most bit (aka the higher order bit) indicates the sign, with 0 being positive and 1 being negative. That is why in the previous value, the higher order bit is 1. So basically Java uses only 31 bits to store the number.
  • "If that is the case", you ask, "how come the other digits are all 0? If all digits are 0 in binary, then shouldn't the value be 0 in decimal rather than -214 whatever? Are you doing something wrong?" Nope. Java stores values in two's complement form. In case this two's complement thing is new to you, please read all about it here before going ahead with this blog post.


Similarly, the highest value an int can hold is 2147483647, which in binary is 0111 1111 1111 1111 1111 1111 1111 1111. Notice something about the higher order bit?

Now that we have established the basics, let's go back to our original question - when you increment Integer.MAX_VALUE, why does it revert back to Integer.MIN_VALUE? Let's do the arithmetic and see what we get...

Addend: 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Addend: 1

1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The result of adding 1 to Integer.MAX_VALUE is 1000 0000 0000 0000 0000 0000 0000 0000, which is the value we previously got for Integer.MIN_VALUE!!.

The same reason is why we get Integer.MAX_VALUE when we decrement Integer.MIN_VALUE.

Minuend: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Subtrahend: 1

0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

This is the case with other languages that have the concept of datatypes. Let's take up C, which has datatypes just like Java. In C, the maximum value of an int is given by INT_MAX, defined in limits.h. Here's a program below that stores the value of INT_MAX in a variable, and tries to increment that value.
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main() {
  int max = INT_MAX; 
  printf("Max int value: %d\n", max);
  max++;
  printf("Now value is %d\n", max);
  printf("Min int value: %d\n", INT_MIN);
  printf("Are they both equal?: %d\n", (max == INT_MIN));
  return 0;
}
When you execute this program, you would find that initially max holds the value 2147483647. However, when you do max++, the value of max immediately becomes -2147483648.

Thus, this 'problem' exists in C too!! In fact, I would go so far to say that this problem exists in most programming languages. This problem exists because we have reached the limit of what the language can store for that data type. To store numbers beyond that limit, the language will have to recognize bits beyond that limit as bits that represent the numberical value, which it doesn't do. As an example, consider the case where we increment Integer.MAX_VALUE by 1 in the Java code seen above. Integer.MAX_VALUE already has 31 '1' bits and its 32nd bit (the higher order bit) is 0. When we increment, the 31 '1' bits become '0' bits and the 32nd bit becomes '1'. But Java and C do not recognize the new value of the higher order bit as one that represents the new incremented value. Instead, they use it for the sign - thus taking only the 31 '0' bits as the binary representation of the incremented value, which is Integer.MIN_VALUE!!

What if we want the language to recognize those higher bits as part of the numeric value representation? We would have to use datatypes that store numbers using more number of bits. An example is long, which stores values using 64 bits. Using long, the solution we have posted above becomes:
for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
  System.out.println(i);
}
Since a long is 64 bits, integer values will very easily fit into it, and hence the solution can be provided without any extra printing of Integer.MAX_VALUE. But even in that case, you do have a limit beyond which the value resets to the lowest number that can be stored by the datatype. In case of long, because it stores numbers using 64 bits, the highest value that it can store is 9223372036854775807.

"Ok," you say, "just for the purposes of making our understanding concrete, let us assume that a language has a 32-bit integer datatype that stores only positive values. In such a case, there is no need for the sign bit - all values of that type are positive anyway. Would it then be possible to increment a variable that holds a value of 2147483647 and still get the right answer?"

Actually, C does have such a datatype - it's called unsigned int. And yes, you can go beyond INT_MAX. Check out this program:
#include <stdio.h>
#include <stdlib.h>
#include <limits.h>
int main() {
  unsigned int value = INT_MAX;
  printf("value is: %u\n", value);
  value++;
  printf("incremented value is: %u\n", value);
  return 0;
}
When you increment, the value of value is 2147483648.

So that was what it was all about... when you have a language that supports datatypes, and when you use a datatype, remember that when you reach the limit of the datatype and increment the value, it will result in the lowest value that can be stored by the datatype. Depending upon your work, you might never face this situation in the real-world - it might come up only in interview questions - but still, it pays to learn this and keep this in a corner of your mind. I have been referring to this phenomeonon by using the word, 'problem' all this while, but it actually does have a name - Integer overflow.

P.S.: I am not sure if there are any strongly typed languages that handle this overflow by themselves instead of leaving it to the developer like Java and C/C++ do. If you do know of any, please do mention them in the comments!!