
Issues with Java enum hashCode()

Recently, I ran into a problem with Java's enum hashCode() while debugging a map-reduce application. In the map-reduce framework, all records with the same key should go to a single reducer. However, I noticed that different reducers were receiving records for the same key, which meant that something was wrong with the way keys were being hashed and distributed among the reducers. The hashCode() method of our custom key class was the obvious suspect, but I couldn't see anything wrong with the auto-generated method.

While discussing this with a colleague, I learned that hashCode() for an enum is the identity hash code inherited from Object, which is often described as the object's memory address. Either way, the value is tied to a particular JVM instance rather than to the enum constant itself. Since it differs between JVMs, the hashCode() computed for the same key differed across machines in the Hadoop cluster, and that's why records with the same key went to different reducers.
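To make the effect concrete, here is a minimal sketch of how a hash value turns into a reducer number. As far as I know, Hadoop's default HashPartitioner uses the formula (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks; the class name PartitionDemo, the method partitionFor and the reducer count of 10 are assumptions made up for this illustration. The two inputs are the hash values printed for the same enum constant in the two test runs shown later in this post, standing in for the key's hash.

```java
// Hypothetical demo; partitionFor mirrors the formula described above.
class PartitionDemo {
    // Masks off the sign bit so the result is never negative, then takes the remainder.
    static int partitionFor(int keyHash, int numReducers) {
        return (keyHash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        int numReducers = 10; // assumed reducer count, for illustration only
        // Two hash values observed for the same enum constant in separate JVM runs.
        System.out.println(partitionFor(730401895, numReducers));  // prints 5
        System.out.println(partitionFor(2022437173, numReducers)); // prints 3
    }
}
```

The same logical key lands on reducer 5 in one JVM and reducer 3 in another, which is the symptom described above.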

The custom reducer key class looked like this:

public class Key implements WritableComparable {
    private SomeEnum e;
    private long id;

    private enum SomeEnum {
        ENUM_1, ENUM_2
    }

    // Auto-generated code by Eclipse IDE
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = result * prime + ((e == null) ? 0 : e.hashCode());
        result = result * prime + (int) (id ^ (id >>> 32));
        return result;
    }

    // other methods ...
}

The hashCode() method in Enum is final and simply delegates to Object.hashCode(), so an enum cannot override it with something stable.
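A quick way to check this delegation is to compare an enum constant's hashCode() with System.identityHashCode(), which returns the value the default Object.hashCode() would give. This is only a sketch; the class name EnumIdentityHashDemo and the helper usesIdentityHash are made up for this post.

```java
// Hypothetical demo class; only Enum.hashCode() and System.identityHashCode() are JDK APIs.
class EnumIdentityHashDemo {
    enum SomeEnum { ENUM_1, ENUM_2 }

    // True when the constant's hashCode() is the JVM's identity hash for that object.
    static boolean usesIdentityHash(Enum<?> constant) {
        return constant.hashCode() == System.identityHashCode(constant);
    }

    public static void main(String[] args) {
        for (SomeEnum e : SomeEnum.values()) {
            System.out.println(e + " uses identity hash: " + usesIdentityHash(e));
        }
    }
}
```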

I wrote a small piece of code to verify this behavior and noticed that even on the same machine, hashCode() differs for the same enum constant when it is run in different JVMs, while ordinal() and the String hash stay the same.

public enum EnumTest {
    ENUM_1("1"),
    ENUM_2("2");

    private final String s;

    EnumTest(String s) {
        this.s = s;
    }

    public static void main(String[] args) {
        System.out.println("Enum 1 hash " + EnumTest.ENUM_1.hashCode());
        System.out.println("Enum 2 hash " + EnumTest.ENUM_2.hashCode());
        System.out.println("Enum 1 ordinal " + EnumTest.ENUM_1.ordinal());
        System.out.println("Enum 2 ordinal " + EnumTest.ENUM_2.ordinal());
        System.out.println("Enum 1 string hash " + EnumTest.ENUM_1.s.hashCode());
        System.out.println("Enum 2 string hash " + EnumTest.ENUM_2.s.hashCode());
    }
}
------------
Run-1
------------
Enum 1 hash 730401895
Enum 2 hash 848123013
Enum 1 ordinal 0
Enum 2 ordinal 1
Enum 1 string hash 49
Enum 2 string hash 50

-----------
Run-2
-----------
Enum 1 hash 2022437173
Enum 2 hash 730401895
Enum 1 ordinal 0
Enum 2 ordinal 1
Enum 1 string hash 49
Enum 2 string hash 50

Lessons learnt:

  • Don't rely on enum hashCode() for anything that must agree across JVMs, especially in distributed applications. Use ordinal() or name().hashCode() instead.
  • Don't be shy about discussing even small issues with colleagues 🙂
  • Don't blindly trust auto-generated methods or rule them out as a source of bugs!
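
One possible fix for the key class is to hash the enum's name() instead of the enum itself, since String.hashCode() is computed from the characters and does not depend on the JVM instance. The sketch below is not the code we shipped. StableKey is a hypothetical stand-in for Key, and the Hadoop WritableComparable methods are left out so it compiles on its own.

```java
// Hypothetical stand-in for the Key class above, without the Hadoop interface.
class StableKey {
    enum SomeEnum { ENUM_1, ENUM_2 }

    private final SomeEnum e;
    private final long id;

    StableKey(SomeEnum e, long id) {
        this.e = e;
        this.id = id;
    }

    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        // name().hashCode() is String.hashCode(), which depends only on the
        // characters of the name, not on the JVM instance.
        result = result * prime + ((e == null) ? 0 : e.name().hashCode());
        // Long.hashCode(id) is equivalent to (int) (id ^ (id >>> 32)).
        result = result * prime + Long.hashCode(id);
        return result;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof StableKey)) return false;
        StableKey other = (StableKey) o;
        return e == other.e && id == other.id;
    }

    public static void main(String[] args) {
        // The printed value should be identical on every run and every machine.
        System.out.println(new StableKey(SomeEnum.ENUM_1, 42L).hashCode());
    }
}
```

Using e.ordinal() in place of e.name().hashCode() would work as well, as long as every machine runs the same version of the enum, since reordering the constants changes their ordinals.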