Wednesday, February 9, 2011

Java digesting performance

For the ACE audit manager, reading files and generating digests on them is the slowest part of auditing. In previous versions of ACE (1.6 and lower), digesting was done in a simple read/update-digest loop using Java's DigestInputStream. This appeared to work well enough, but I wanted to see what effect large blocks have on this model. When reading from remote resources, we raise the block size from a standard 4-32 KB to 1 MB.
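For reference, the single-threaded read/update loop described above looks roughly like the sketch below. This is not the ACE source; the class and method names (SimpleDigest, digest, toHex) are made up for illustration, and the only things taken from the post are the use of DigestInputStream and SHA-256.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of ACE: digests a stream block by block.
class SimpleDigest {

    // Reads the stream in blockSize chunks. DigestInputStream updates the
    // MessageDigest as a side effect of every read(), so reading and
    // digesting strictly alternate on one thread.
    static byte[] digest(InputStream in, int blockSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] block = new byte[blockSize];
        try (DigestInputStream dis = new DigestInputStream(in, md)) {
            while (dis.read(block) != -1) {
                // Nothing to do here: the digest was updated inside read().
            }
        }
        return md.digest();
    }

    // Formats a digest as lowercase hex for display or comparison.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Calling SimpleDigest.digest(new FileInputStream(path), 1048576) would digest a file with 1 MB blocks. While the digest is being updated no read is in flight, and the reverse is also true, which is the behavior the measurements below are probing.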

Using that model, here's how performance looked for a variety of block sizes, both aligned and unaligned. The following table compares reading only, digesting only, and reading then updating the digest. The test data was ~16 GB of large (27 MB) files on a SATA disk, on a machine with 8 GB of RAM and an Intel Q9550 at 2.83 GHz. The digest algorithm was SHA-256, using Sun's provider.

Block size (bytes)   Read        Digest only   Read+digest
4096                 96.5 MB/s   69.6 MB/s     53.1 MB/s
8192                 95.9 MB/s   69.6 MB/s     54.8 MB/s
10000                94.6 MB/s   69.2 MB/s     54.0 MB/s
1048576              97.8 MB/s   72.1 MB/s     34.3 MB/s

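The interesting take-away is how much worse large blocks perform when you follow the read-then-process model: read+digest throughput drops from about 54 MB/s to 34.3 MB/s at a 1 MB block size, even though read-only and digest-only speeds are slightly higher.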

The next step is to change the model and split the digesting and reading into separate threads. The new model uses two threads and two queues. The two threads exchange a pre-allocated set of byte arrays using an empty-array queue and a filled-array queue. These queues were Java LinkedBlockingQueues.

Read Thread:
  1. Pop byte buffer from empty-array queue.
  2. Read into the byte array.
  3. Push buffer into filled-array queue.

Digest Thread:
  1. Pop byte buffer from filled-array queue.
  2. Update digest using byte array.
  3. Push buffer into empty-array queue.
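
The two loops above can be sketched as follows. This is a minimal illustration under stated assumptions, not the ACE implementation: the names PipelinedDigest and Chunk are hypothetical, the digest loop runs on the calling thread instead of a dedicated second thread, and end of stream is signalled by passing a chunk whose length is -1 through the filled-array queue. Only the two LinkedBlockingQueues and the pre-allocated byte arrays come from the description above.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the two-queue model; not taken from ACE.
class PipelinedDigest {

    // A pre-allocated buffer plus the number of valid bytes in it.
    // length == -1 marks end of stream (an assumption of this sketch).
    private static final class Chunk {
        final byte[] data;
        int length;
        Chunk(int size) { data = new byte[size]; }
    }

    static byte[] digest(InputStream in, int blockSize, int bufferCount)
            throws IOException, InterruptedException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        BlockingQueue<Chunk> empty = new LinkedBlockingQueue<>();
        BlockingQueue<Chunk> filled = new LinkedBlockingQueue<>();
        for (int i = 0; i < bufferCount; i++) {
            empty.add(new Chunk(blockSize));
        }

        final IOException[] readError = new IOException[1];

        // Read thread: take an empty buffer, fill it, hand it to the digester.
        Thread reader = new Thread(() -> {
            try {
                while (true) {
                    Chunk c = empty.take();
                    try {
                        c.length = in.read(c.data);
                    } catch (IOException e) {
                        readError[0] = e;
                        c.length = -1; // stop the digest loop on read failure
                    }
                    filled.put(c);
                    if (c.length == -1) {
                        return;
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        reader.start();

        // Digest loop: take a filled buffer, update the digest, recycle it.
        try {
            while (true) {
                Chunk c = filled.take();
                if (c.length == -1) {
                    break;
                }
                md.update(c.data, 0, c.length);
                empty.put(c);
            }
        } finally {
            reader.interrupt(); // harmless if the reader has already finished
            reader.join();
        }
        if (readError[0] != null) {
            throw readError[0];
        }
        return md.digest();
    }
}
```

With this shape, PipelinedDigest.digest(stream, 1048576, 5) would correspond to five 1 MB buffers. Because the buffers only ever move between the two queues, no allocation happens after setup, and the reader can fill the next buffer while the digest is still consuming the previous one.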

On the following test runs, I used five byte buffers, each of the block sizes above, to see how performance would vary.

Block size (bytes)   Read        Digest only   Read+digest
4096                 92.2 MB/s   65.63 MB/s    54.7 MB/s
8192                 92.4 MB/s   67.5 MB/s     57.0 MB/s
10000                87.8 MB/s   68.0 MB/s     57.2 MB/s
1048576              97.4 MB/s   72.2 MB/s     64.7 MB/s

Small-block performance is pretty much unchanged. With large blocks on large files, however, there is a substantial speedup: read+digest now runs at almost 90% of the possible digest speed (64.7 of 72.2 MB/s), versus 48% (34.3 of 72.1 MB/s) with the old model. The next version of ACE will switch to this method of data reading.