We're testing a moderately fast NFS server. Testing locally on our data set of large files (~100M), we were seeing read performance of a little over 100MB/s. When we exported the same filesystem over NFS, performance dropped to a mere 44MB/s, and it would not increase no matter what rsize/wsize we set on the client. Both machines are connected via a 1Gbps link that has been shown to move 941Mb/s. Here's an iostat snapshot of what the workload looked like on the server. The first line shows the local test (notice the 108MB/s) and the second shows access over NFS. The interesting part is the device utilization: even though we were issuing only about half the requests per second (500 vs 906) and moving less than half the bandwidth, the disk was still over 90% busy, with a higher average wait time (4.1ms vs 1.8ms).
Device:   rrqm/s  wrqm/s     r/s   w/s     rkB/s  wkB/s avgrq-sz avgqu-sz  await  svctm  %util
sdb        30.50    0.00  906.00  0.00 108530.00   0.00   239.58     1.61   1.78   1.08  98.15
sdb       870.00    0.00  500.50  0.00  44626.00   0.00   178.33     2.06   4.09   1.83  91.55
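To put numbers on "half the requests, half the bandwidth", here's a small sketch that reads 'iostat -x' output and prints, for a few columns, the ratio between the second row and the first. It assumes the older sysstat column layout shown above (a Device header followed by r/s, rkB/s, await and %util); newer sysstat versions name and split some columns differently, so the column list may need adjusting. compare_iostat_rows is just a name made up for this example.

```shell
#!/bin/sh
# Sketch: compare two rows of `iostat -x` output by column name.
# First data row = baseline (local test), second = comparison (NFS).
# Assumes the older sysstat layout; newer sysstat renames some columns.
compare_iostat_rows() {
    awk '
        # Header: remember which field number holds each column name.
        $1 ~ /^Device:?$/ { for (i = 1; i <= NF; i++) col[$i] = i; next }
        # Data rows: keep the first two.
        NF > 1 && n < 2 { n++; for (i = 1; i <= NF; i++) row[n, i] = $i }
        END {
            split("r/s rkB/s await %util", names, " ")
            for (k = 1; k <= 4; k++) {
                c = col[names[k]]
                # Print baseline, comparison and comparison/baseline ratio.
                printf "%-6s %10.2f -> %10.2f  (x%.2f)\n", names[k], row[1, c], row[2, c], row[2, c] / row[1, c]
            }
        }'
}

# The two rows from the snapshot above.
compare_iostat_rows <<'EOF'
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb 30.50 0.00 906.00 0.00 108530.00 0.00 239.58 1.61 1.78 1.08 98.15
sdb 870.00 0.00 500.50 0.00 44626.00 0.00 178.33 2.06 4.09 1.83 91.55
EOF
```

For these two rows the ratios come out to about 0.55 for r/s, 0.41 for rkB/s and 2.30 for await, while %util barely moves.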
Now let's tweak the I/O scheduler a little by setting slice_idle to 1, as recommended in the Dell guide below: 'echo 1 > /sys/block/sdb/queue/iosched/slice_idle'. This setting controls how long the scheduler idles on a queue, waiting for more requests from it, before switching to other work. With local access, where you have lots of low-latency requests, you probably want to wait for more work; over a network with lots of clients, you want to switch as soon as possible.
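The one-liner above assumes the file is there. slice_idle is a tunable of the CFQ scheduler (the value is in milliseconds, with a default of 8), so iosched/slice_idle only exists while a scheduler that provides it is active on the device. Here's a sketch of a slightly more careful version, a hypothetical helper named set_slice_idle that reports the active scheduler and the old value before writing. SYSFS_ROOT is an illustrative override so it can be tried against a fake directory tree; on a real system it is just /sys.

```shell
#!/bin/sh
# Sketch: set slice_idle for one disk, checking that the knob exists first.
# SYSFS_ROOT is a hypothetical override for testing; normally /sys.
SYSFS_ROOT=${SYSFS_ROOT:-/sys}

set_slice_idle() {
    dev=$1 value=$2
    queue="$SYSFS_ROOT/block/$dev/queue"
    # The active scheduler is the bracketed entry, e.g. "noop deadline [cfq]".
    sched=$(sed -n 's/.*\[\(.*\)\].*/\1/p' "$queue/scheduler")
    # Only schedulers that provide slice_idle (such as CFQ) expose this file.
    if [ ! -w "$queue/iosched/slice_idle" ]; then
        echo "$dev: no writable slice_idle (active scheduler: $sched)" >&2
        return 1
    fi
    echo "$dev: scheduler=$sched slice_idle $(cat "$queue/iosched/slice_idle") -> $value"
    echo "$value" > "$queue/iosched/slice_idle"
}
```

Run it as root, for example 'set_slice_idle sdb 1'. Like the plain echo, the change only lasts until the next reboot (switching schedulers also resets the tunables), so it has to be reapplied at boot.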
sdb      1164.18    0.00 1049.75  0.00  71150.25   0.00   135.56     2.34   2.23   0.87  91.79
Let's go a little further and set it to 0 ('echo 0 > /sys/block/sdb/queue/iosched/slice_idle'):
sdb       144.00    0.00 2428.50  0.00  92034.00   0.00    75.79     3.95   1.63   0.33  79.60
Now we're up to 92MB/s, roughly 80% of the ~117MB/s that a 941Mb/s link can carry and more than double the 44MB/s we started with. Good enough!
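Comparing disk throughput with a link speed is easy to get wrong by a factor of eight, so here's a small sketch of the arithmetic: convert iostat's rkB/s and the measured 941Mb/s to MB/s before dividing. It assumes iostat's kB is 1024 bytes and that the link figure is in decimal megabits; link_share is a made-up helper name.

```shell
#!/bin/sh
# Sketch: express an iostat throughput (kB/s) as a share of a measured link rate (Mb/s).
# Assumes iostat's kB = 1024 bytes and the link rate is in decimal megabits.
link_share() {
    awk -v kbs="$1" -v mbps="$2" 'BEGIN {
        mbytes = kbs * 1024 / 1000000   # kB/s -> MB/s (decimal)
        link_mbytes = mbps / 8          # Mb/s -> MB/s
        printf "%.1f MB/s of %.1f MB/s = %.0f%%\n", mbytes, link_mbytes, 100 * mbytes / link_mbytes
    }'
}

link_share 44626 941   # NFS, before tuning
link_share 92034 941   # NFS, slice_idle=0
```

Under these assumptions the tuned run comes out at about 80% of the link; treating kB as 1000 bytes instead gives about 78%.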