Inside AnandTech 2013: The Hardwareby Anand Lal Shimpi on March 12, 2013 9:40 AM EST
By the end of 2010 we realized two things. First, the server infrastructure that powered AnandTech was getting very old and we were seeing an increase in component failures, leading to higher than desired downtime. Secondly, our growth over the previous years had begun to tax our existing hardware. We needed an upgrade.
Ever since we started managing our own infrastructure back in the late 90s, the bulk of our hardware has always been provided by our sponsors in exchange for exposure on the site. It also gives them a public case study, which isn't always possible depending on who you're selling to. We always determine what parts we go after and the rules of engagement are simple: if there's a failure, it's a public one. The latter stipulation tends to worry some, and we'll get to that in one of the other posts.
These days there's an tempting alternative: deploying our infrastructure in the cloud. With low (to us) hardware costs however, doing it internally still makes more sense. Furthermore, it also allows us to do things like do performance analysis and create enterprise level benchmarks using our own environment.
Spinning up new cloud instances at Amazon did have it appeal though. We needed to embrace virtualization and the ease of deployment benefits that came with it. The days of one box per application were over, and we had more than enough hardware to begin to consolidate multiple services per box.
We actually moved to our new hardware and software infrastructure last year. With everything going on last year, I never got the chance to actually talk about what our network ended up looking like. With the debut of our redesign, I had another chance to do just that. What will follow are some quick posts looking at storage, CPU and power characteristics of our new environment compared to our old one.
To put things in perspective. The last major hardware upgrade we did at AnandTech was back in the 2006 - 2007 timeframe. Our Forums database server had 16 AMD Opteron cores inside, it's just that we needed 8 dual-core CPUs to get there. The world changed over the past several years, and our new environment is much higher performing, more power efficient and definitely more reliable.
In this post I want to go over, at a high level, the hardware behind the current phase of our infrastructure deployment. In the subsequent posts (including another one that went live today) I'll offer some performance and power comparisons, as well as some insight into why we picked each component.
I'd also like to take this opportunity to thank Ionity, the host of our infrastructure for the past 12 months. We've been through a number of hosts over the years, and Ionity marks the best yet. Performance is typically pretty easy to guarantee when it comes to any hosting provider at a decent datacenter, but it's really service, response time and competence of response that become the differentiating factors for us. Ionity delivered on all fronts, which is why we're there and plan on continuing to be so for years to come.
Out with the Old
Our old infrastructure featured more than 20 servers, a combination of 1U dual-core application servers and some large 4U - 5U database servers. We had to rely on external storage devices in order to give us the number of spindles we needed in order to deliver the performance our workload demanded. Oh how times have changed.
For the new infrastructure we settled on a total of 12 boxes, 6 of which are deployed now and another 6 that we'll likely deploy over the next year for geographic diversity as well as to offer additional capacity. That alone gives you an idea of the increase in compute density that we have today vs. 6 years ago: what once required 20 servers and more than a single rack can easily be done in 6 servers and half a rack (at lower power consumption too).
Of the six, a single box currently acts as a spare - the remaining five are divided as follows: two are on database duty, while the remaining three act as our application servers.
Since we were bringing our own hardware, we needed relatively barebones server solutions. We settled on Intel's SR2625, a fairly standard 2U rackmount with support for the Intel Xeon L5640 CPUs (32nm Westmere Xeons) we would be using. Each box is home to two of these processors, each of which features 6-cores and a 12MB L3 cache.
Each database server features 48GB of Kingston DDR3-1333 while the application servers use 36GB each. At the time that we speced out our consolidation plans, we didn't need a ton of memory but going forward it's likely something we'll have to address.
When it comes to storage, the decision was made early on to go all solid-state. The problem we ran into there is most SSD makers at the time didn't want to risk a public failure of their SSDs in our environment. Our first choice declined to participate at the time due to our requirement of making any serious component failures public. Things are different today as the overall quality of all SSDs has improved tremendously, but back then we were left with one option: Intel.
Our application servers use 160GB Intel X25-M G2s, while our database servers use 64GB Intel X25-Es. The world has since moved to enterprise grade MLC in favor of SLC NAND, but at the time the X25-Es were our best bet to guarantee write endurance for our database servers. As I later discovered, using heavily overprovisioned X25-M G2s would've been fine for a few years, but even I wanted to be more cautious back then.
The application servers each use 6 x X25-M G2s, while the database servers use 6 x X25-Es. To keep the environment simple, I opted against using any external RAID controllers - everything here is driven by the on-board Intel SATA controllers. We need multiple SSDs not for performance reasons but rather to get the capacities we need. Given that we migrated from a many-drive HDD array, the fact that we only need a couple of SSDs worth of performance per box isn't too surprising.
Storage capacity is our biggest constraint today. We actually had to move our image hosting needs to our Ionity's cloud environment due to our current capacity constraints. NAND lithographies have shrunk dramatically since the days of the X25-Es and X25-Ms, so we'll likely move image hosting back on to a few very large capacity drives this year.
That's the high level overview of what we're running on, I also posted some performance data for the improvement we saw in going to SSDs in our environment here.