The Crucial m4 (Micron C400) SSD Reviewby Anand Lal Shimpi on March 31, 2011 3:16 AM EST
Random Read/Write Speed
The four corners of SSD performance are as follows: random read, random write, sequential read and sequential write speed. Random accesses are generally small in size, while sequential accesses tend to be larger and thus we have the four Iometer tests we use in all of our reviews.
Note that we've updated our C300 results on our new Sandy Bridge platform for these Iometer tests. As a result you'll see some higher scores for this drive (mostly with our 6Gbps numbers) for direct comparison to the m4 and other new 6Gbps drives we've tested.
Our first test writes 4KB in a completely random pattern over an 8GB space of the drive to simulate the sort of random access that you'd see on an OS drive (even this is more stressful than a normal desktop user would see). I perform three concurrent IOs and run the test for 3 minutes. The results reported are in average MB/s over the entire time. We use both standard pseudo randomly generated data for each write as well as fully random data to show you both the maximum and minimum performance offered by SandForce based drives in these tests. The average performance of SF drives will likely be somewhere in between the two values for each drive you see in the graphs. For an understanding of why this matters, read our original SandForce article.
If there's one thing Crucial focused on with the m4 it's random write speeds. The 256GB m4 is our new king of the hill when it comes to random write performance. It's actually faster than a Vertex 3 when writing highly compressible data. It doesn't matter if I run our random write test for 3 minutes or an hour, the performance over 6Gbps is still over 200MB/s.
Let's look at average write latency during this 3 minute run:
On average it takes Crucial 0.06ms to complete three 4KB writes spread out over an 8GB LBA space. The original C300 was pretty fast here already at 0.07ms—it's clear that these two drives are very closely related. Note that OCZ's Vertex 3 has a similar average latency but it's not actually writing most of the data to NAND—remember this is highly compressible data, most of it never hits NAND.
Now let's look at max latency during this same 3 minute period:
You'll notice a huge increase in max latency compared to average latency, that's because this is when a lot of drives do some real-time garbage collection. If you don't periodically clean up your writes you'll end up increasing max latency significantly. You'll notice that even the Vertex 3 with SandForce's controller has a pretty high max latency in comparison to its average latency. This is where the best controllers do their work. However not all OSes deal with this occasional high latency blip all that well. I've noticed that OS X in particular doesn't handle unexpectedly high write latencies very well, usually resulting in you having to force-quit an application.
Note the extremely low max latency of the m4 here: 4.3ms. Either the m4 is ultra quick at running through its garbage collection routines or it's putting off some of the work until later. I couldn't get a clear answer from Crucial on this one, but I suspect it's the latter. I'm going to break the standard SSD review mold here for a second and take you through our TRIM investigation. Here's what a clean sequential pass looks like on the m4:
Average read speeds are nearing 400MB/s, average write speed is 240MB/s. The fluctuating max write speed indicates some clean up work is being done during the sequential write process. Now let's fill the drive with data, then write randomly across all LBAs at a queue depth of 32 for 20 minutes and run another HDTach pass:
Ugh. This graph looks a lot like what we saw with the C300. Without TRIM the m4 can degrade to a very, very low performance state. Windows 7's Resource Monitor even reported instantaneous write speeds as low as 2MB/s. The good news is the performance curve trends upward: the m4 is trying to clean up its performance. Write sequentially to the drive and its performance should start to recover. The bad news is that Crucial appears to be putting off this garbage collection work a bit too late. Remember that the trick to NAND management is balancing wear leveling with write amplification. Clean blocks too quickly and you burn through program/erase cycles. Clean them too late and you risk high write amplification (and reduced performance). Each controller manufacturer decides the best balance for its SSD. Typically the best controllers do a lot of intelligent write combining and organization early on and delay cleaning as much as possible. The C300 and m4 both appear to push the limits of delayed block cleaning however. Based on the very low max random write latencies from above I'd say that Crucial is likely doing most of the heavy block cleaning during sequential writes and not during random writes. Note that in this tortured state—max write random latencies can reach as high as 1.4 seconds.
Here's a comparison of the same torture test run on Intel's SSD 320:
The 320 definitely suffers, just not as bad as the m4. Remember the higher max write latencies from above? I'm guessing that's why. Intel seems to be doing more cleanup along the way.
And just to calm all fears—if we do a full TRIM of the entire drive performance goes back to normal on the m4:
What does all of this mean? It means that it's physically possible for the m4, if hammered with a particularly gruesome workload (or a mostly naughty workload for a longer period of time), to end up in a pretty poor performance state. I had the same complaint about the C300 if you'll remember from last year. If you're running an OS without TRIM support, then the m4 is a definite pass. Even with TRIM enabled and a sufficiently random workload, you'll want to skip the m4 as well.
I suspect for most desktop workloads this worst case scenario won't be a problem and with TRIM the drive's behavior over the long run should be kept in check. Crucial still seems to put off garbage collection longer than most SSDs I've played with, and I'm not sure that's necessarily the best decision.
Forgive the detour, now let's get back to the rest of the data.
Many of you have asked for random write performance at higher queue depths. What I have below is our 4KB random write test performed at a queue depth of 32 instead of 3. While the vast majority of desktop usage models experience queue depths of 0—5, higher depths are possible in heavy I/O (and multi-user) workloads:
High queue depth 4KB random write numbers continue to be very impressive, although here the Vertex 3 actually jumps ahead of the m4.
Random read performance is actually lower than on the C300. Crucial indicated that it reduced random read performance in favor of increasing sequential read performance on the m4. We'll see what this does to real world performance shortly.