Issues with java enum hashCode()

Recently, I discovered an issue with java enum hashCode() while debugging an issue  related to a map-reduce application. With map-reduce framework, only a particular reducer should receive all the records with same key. However, I noticed that different reducers were getting records for the same key.  It meant that something was wrong with the way keys were getting hashed and distributed among the reducers. So, it meant that something is wrong with hashCode() method of this custom key class but I couldn't see anything obviously wrong by looking at the auto-generated hash method.

While discussing this issue with a colleague, I came across the fact that hashCode() for an enum class is actually the memory address of enum. Since, the memory address of enum will vary on different machines, the hashCode() generated for same key varied on different machines in Hadoop framework and that's why records with same key went to different reducers.

The custom reducer key class looked like as below:

public class Key implements WritableComparable {
 private SomeEnum e;
 private long id;

 private enum SomeEnum {
 ENUM_1, ENUM_2
 }

 @Override
 // Auto-generated code by eclipse IDE
 public int hashCode() {
 final int prime = 31;
 int result = 1;
 result = result * prime + ( (e == null) ? 0 : e.hashCode());
 result = result * prime + (int) (id ^ (id >>> 32));
 return result;
 }

 // other methods ...
}

The hashcode method in enum delegates to Object.hashCode().

I wrote a small piece of code to verify the hashCode() behavior and noticed that even on same machine, the hashCode() would differ for same enum if it's run in different JVMs.

public enum EnumTest {
 ENUM_1("1"),
 ENUM_2("2");

 private String s;

 EnumTest(String s) {
 this.s = s;
 }

 public static void main(String[] args) {
 System.out.println("Enum 1 hash " + EnumTest.ENUM_1.hashCode());
 System.out.println("Enum 2 hash " + EnumTest.ENUM_2.hashCode());
 System.out.println("Enum 1 ordinal " + EnumTest.ENUM_1.ordinal());
 System.out.println("Enum 2 ordinal " + EnumTest.ENUM_2.ordinal());
 System.out.println("Enum 1 string hash " + EnumTest.ENUM_1.s.hashCode());
 System.out.println("Enum 2 string hash " + EnumTest.ENUM_2.s.hashCode());
 }
}
------------
Run-1
------------
Enum 1 hash 730401895
Enum 2 hash 848123013
Enum 1 ordinal 0
Enum 2 ordinal 1
Enum 1 string hash 49
Enum 2 string hash 50

-----------
Run-2
-----------
Enum 1 hash 2022437173
Enum 2 hash 730401895
Enum 1 ordinal 0
Enum 2 ordinal 1
Enum 1 string hash 49
Enum 2 string hash 50

Lessons learnt:

  • Don't use enum hashCode() especially for distributed applications.
  • Don't be shy to discuss even small issues with colleagues ๐Ÿ™‚
  • Don't blindly trust auto-generated methods and rule them out from being buggy!

How to use HTC Mozart-7 as USB drive

(Source: Discussion on xda-developers forum)

I found the steps of using HTC Mozart-7 as USB device on the above link.

  1. Open Windows Registry Editor by typing regedit in your start menu
  2. Go to HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\USB
  3. Perform a search for ZuneDriver
  4. The search should yield a result similar to: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Enum\U SB\VID_045E&PID_04EC&MI_00\6&27ffd631&0&0000\Device Parameters\WUDF
  5. ย Click up one level to the Device Parameters key
  6. Change ShowInShell from 0 to 1
  7. Change PortableDeviceNameSpaceExcludeFromShell from 1 to 0
  8. Change EnableLegacySupport from 0 to 1
  9. Change EnableDefaultAutoPlaySupport

After following, the above steps, I was seeing two icons for Windows phone in Windows explorer but when I clicked on either of them, I saw 0 items. Then, reading through the complete discussion on the link above, I found the further instructions to follow to make it work.

Under Device Parameters, you should see two sub-folders: one is called WUDF and the other is ZuneDriver.

  1. Under WUDF, you will find a key called Exclusive - change the value of this key from 1 to 0.
  2. Under ZuneDriver, you will find a key called UseWpdPrivateInterface - change the value of this key from 1 to 0, as well.

Now you should see only one occurrence of your Windows Phone in the Portable Devices section.Don't forget to close the Zune software if it opens when you connect your phone.

After following these instructions, I was able to use my phone as USB drive successfully.

(Epilogue: Now-a-days, most of the sites promote the initiative 'Go Green' in which they ask you not to waste paper by taking printouts and use e-copy or SMS instead. IRCTC(Indian Railways Booking System) also offer this facility now where you could show the electronic copy of ticket or the confirmation SMS sent by them. However, since the IRCTC site was so slow that I was not able to book tickets directly on it, I booked my tickets via Cleartrip site. But SMS sent by Cleartrip just contained date of journey and PNR number but not my name. So, I decided to install Adobe Reader application on HTC 7 Mozart and then copy the e-copy to my HTC Mozart-7 phone. The phone gets connected to PC via Zune software but I didn't see phone as a USB device in windows explorer. After following the instructions, I was able to copy the files to phone but I was not able to locate those files on phone. So, all this effort of configuring the phone to be used as USB device didn't solve my issue. Then, I found a workaround by sending the e-copies as email to self and then downloading the e-copies on phone. These files were then easily located using the Adobe Reader application on phone.
Phew! So much for Go Green! ๐Ÿ™‚ )

Returning a random line from a file given as input

Problem Statement:

Given a file as input, you have to return back a string representing the contents of one of the lines present in the file. The algorithm should be designed in a way such that probability of each line getting returned is same i.e. if the file has n lines, probability of each line getting returned as output would be \frac{1}{n} .

Constraints:

  • The file should be scanned only once.
  • There should be only constant extra space used.

Solution:

String getRandomLine(File file){
  String output = null;
  int linecount = 0;
  Random r = new Random();
  BufferredReader reader = new BufferedReader(new FileReader(file));
  String currentLineRead;
   while( (currentLineRead=reader.readLine())!=null ){
      linecount++;
      int rand = r.nextInt(linecount);
      if(rand+1 == linecount){
         output = currentLineRead;
      }
   }
}
return output;

Are you surprised that this simple method works? ๐Ÿ˜€

Explanation

The proof of correctness can be seen through mathematical induction on the loop invariant.

 Statement is that after seeing n lines, the probability of each line as output is \frac{1}{n}

Base case: linecount=1, then first line is stored as output as rand is 1 in first iteration. So, 1/1 = 1

Hypothesis: Assume, that it works correctly, for linecount=k, such that probability of k lines seen so far is \frac{1}{k} .

Induction step: we read k+1 th line and we get a random number between 0 to k. Probability of each random number being generated is \frac{1}{k+1} . So, the probability of current line (if random number generated is k) being stored as output becomes \frac{1}{k+1} . Let's see what happens to probability of each of the previous k lines. The probability of getting a number in the range 0 to k-1 while generating a random number in this iteration is  \frac{k}{k+1} , and the probability of each of the previous k lines was \frac{1}{k} at end of last iteration, so now multiplying these two probabilities, the probability of each of those k previous lines becomes \frac{1}{k+1} .

An exercise

In this solution, we are generating random numbers in the range 0 to k each time. Suppose, we wanted to cut down the time taken to generate random number in each iteration. So, we generated a random decimal number outside this loop in the range [0,1) and inside the loop, we multiplied that by linecount to get an integer in the range [0, linecount). Will this still give each line in file an equal chance of being the output string?

Flirting in a different sense …