Tuesday, March 20

How to reset the value of a Java Array - the fast way

Re-using large arrays in java sux, even with Sun's new JVMs.

The issue is that the naive method: fill an array using Arrays.fill is slow.

Large arrays are expensive to create; there is significant overhead creating and allocating the objects in a large array. Thus, it make sense to re-use these large arrays once they have been created, a simple concept referred to as pooling. But, apparently Java does not provide a way to do this efficiently (at least natively.... hack/trick to follow)... the JVM's JIT is supposed to do the fancy optimization for you.

However, I seem to have found a case where the JIT is not optimizing what should be simple. I realize this is probably platform specific, so my rough platform is a recent Sun JDK, Windows 32-bit, Intel, with the JVM set to run in -server mode (and yes I provided an ample warm-up period for the JIT).

I recently profiled an application and discovered that one of the major bottlenecks was Arrays.fill! Looking at the code for Sun's Arrays.Fill, I saw that all it does is simple, naive, loop over the contents of the array and assign the same value again and again. This was not what I expected, I expected to see a native call, like System.ArrayCopy. Another programmer on my team commented that in C it is easy to reset the value of an array, just use memset to set the entire contents of the array to a single value. Hmm, Sun's implementation appears to be sub-optimal, but the JIT should've fixed it.

Me and my co-worker Googled around to see if others found similar problems and we found an old IBM Research article from 1999 on the topic Java server performance: A case study of building efficient, scalable Jvms:
"There is no corresponding memset operation in the Java language specification, but it is still necessary and common to set an entire array to a single value. This shortcoming has been a sore spot for Java application writers, and a number of clever techniques have been introduced to improve the efficiency of initializing a primitive array to a common value. The most common technique used is to initialize a smaller piece of the array and use the System.arraycopy call to fill in the rest of the array in an expanding binary fashion."
The authors go on to say that the IBM JVM's JIT is smart enough to convert the array copy to a memset operation. So, here is the solution described by the researchers above. This code is from the Java Programmer's FAQ - Part B:

public static void bytefill(byte[] array, byte value) {
int len = array.length;
if (len > 0)
array[0] = value;
for (int i = 1; i < len; i += i) {
System.arraycopy( array, 0, array, i, ((len - i) < i) ? (len - i) : i);
}
}

There you go, a java err trick, hack, work around of the day... Any thoughts as to why Sun's new JVMs wouldn't be optimizing this out? Hmm.. perhaps I will try other JDKs and other platforms and see what happens.