Using that model, here's how performance looked for a variety of block sizes, both aligned and unaligned. The following table compares raw read speed, digest speed alone, and reading plus digesting. The test data was ~16 GB of large (27 MB) files on a SATA disk, running on a machine with 8 GB of RAM and an Intel Q9550 @ 2.83 GHz. The digest algorithm was SHA-256, using Sun's provider.
Block size (bytes) | read | digest only | read+digest |
4096 | 96.5 MB/s | 69.6 MB/s | 53.1 MB/s |
8192 | 95.9 MB/s | 69.6 MB/s | 54.8 MB/s |
10000 | 94.6 MB/s | 69.2 MB/s | 54.0 MB/s |
1048576 | 97.8 MB/s | 72.1 MB/s | 34.3 MB/s |
The interesting take-away is how much worse the large block size performs when you follow the read-then-process model: read+digest throughput drops to 34.3 MB/s, less than half the standalone digest speed.
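For reference, the read-then-process model is just a single loop in which every digest update stalls the next read. A minimal sketch (the class and method names here are my own, not from the original code):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sequential model: read a block, then digest it, on one thread.
// While md.update() runs, the disk sits idle -- and the bigger the
// block, the longer each stall.
public class SequentialDigest {
    public static byte[] digestFile(String path, int blockSize)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        byte[] buf = new byte[blockSize];
        try (FileInputStream in = new FileInputStream(path)) {
            int n;
            while ((n = in.read(buf)) > 0) {
                md.update(buf, 0, n); // next read waits for this to finish
            }
        }
        return md.digest();
    }
}
```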
The next step is to change the model and split the digesting and reading into separate threads. The new model uses two threads and two queues. The two threads exchange a pre-allocated set of byte arrays using an empty-array queue and a filled-array queue. These queues were Java LinkedBlockingQueues.
Read Thread:
- Pop a byte array from the empty-array queue.
- Read into the byte array.
- Push the array onto the filled-array queue.
Digest Thread:
- Pop a byte array from the filled-array queue.
- Update the digest with the byte array.
- Push the array back onto the empty-array queue.
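The two threads and two queues above can be sketched roughly as follows. This is an illustrative reconstruction, not the actual ACE code: the `Block` wrapper, the EOF convention, and the buffer count are my own choices, and real code would need more careful error handling.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.security.MessageDigest;
import java.util.concurrent.LinkedBlockingQueue;

public class PipelinedDigest {
    // Pair each byte array with its fill length so the digest side
    // knows how many bytes are valid; length < 0 marks end of stream.
    static final class Block {
        final byte[] data;
        int length;
        Block(int size) { data = new byte[size]; }
    }

    public static byte[] digest(InputStream in, int blockSize, int numBuffers)
            throws Exception {
        LinkedBlockingQueue<Block> empty = new LinkedBlockingQueue<>();
        LinkedBlockingQueue<Block> filled = new LinkedBlockingQueue<>();
        for (int i = 0; i < numBuffers; i++) {
            empty.put(new Block(blockSize)); // pre-allocated buffer pool
        }

        MessageDigest md = MessageDigest.getInstance("SHA-256");

        // Read thread: pop empty buffer, fill it, push to filled queue.
        Thread reader = new Thread(() -> {
            try {
                while (true) {
                    Block b = empty.take();
                    b.length = in.read(b.data);
                    filled.put(b);
                    if (b.length < 0) return; // pass EOF downstream
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        reader.start();

        // Digest loop (here on the calling thread): pop filled buffer,
        // update the digest, recycle the buffer to the empty queue.
        while (true) {
            Block b = filled.take();
            if (b.length < 0) break;
            md.update(b.data, 0, b.length);
            empty.put(b);
        }
        reader.join();
        return md.digest();
    }
}
```

Because `LinkedBlockingQueue.take()` blocks when a queue is empty, the reader naturally stalls only when all buffers are awaiting digestion, and vice versa, so disk I/O and hashing overlap.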
On the following test runs, I used five byte arrays of each of the above sizes to see how performance would vary.
Block size (bytes) | read | digest only | read+digest |
4096 | 92.2 MB/s | 65.63 MB/s | 54.7 MB/s |
8192 | 92.4 MB/s | 67.5 MB/s | 57.0 MB/s |
10000 | 87.8 MB/s | 68.0 MB/s | 57.2 MB/s |
1048576 | 97.4 MB/s | 72.2 MB/s | 64.7 MB/s |
Small block performance is essentially unchanged; for large blocks on large files, however, there is a substantial speedup: read+digest now runs at almost 90% of the possible digest speed (64.7 of 72.2 MB/s), versus 48% (34.3 of 72.1 MB/s) with the sequential model. The next version of ACE will switch to this method of reading data.