By the end of 2010 we realized two things. First, the server infrastructure that powered AnandTech was getting very old and we were seeing an increase in component failures, leading to higher than desired downtime. Second, our growth over the previous years had begun to tax our existing hardware. We needed an upgrade.

Ever since we started managing our own infrastructure back in the late 90s, the bulk of our hardware has always been provided by our sponsors in exchange for exposure on the site. It also gives them a public case study, which isn't always possible depending on who you're selling to. We always determine what parts we go after and the rules of engagement are simple: if there's a failure, it's a public one. The latter stipulation tends to worry some, and we'll get to that in one of the other posts.

These days there's a tempting alternative: deploying our infrastructure in the cloud. With low (to us) hardware costs, however, doing it internally still makes more sense. It also lets us do things like run performance analysis and build enterprise-level benchmarks using our own environment.

Spinning up new cloud instances at Amazon did have its appeal though. We needed to embrace virtualization and the ease-of-deployment benefits that came with it. The days of one box per application were over, and we had more than enough hardware to begin consolidating multiple services per box.

We actually moved to our new hardware and software infrastructure last year, but with everything going on I never got the chance to talk about what our network ended up looking like. The debut of our redesign gave me another chance to do just that. What follows are some quick posts looking at storage, CPU and power characteristics of our new environment compared to our old one.

To put things in perspective: the last major hardware upgrade we did at AnandTech was back in the 2006 - 2007 timeframe. Our Forums database server had 16 AMD Opteron cores inside; it's just that we needed 8 dual-core CPUs to get there. The world has changed over the past several years, and our new environment is much higher performing, more power efficient and definitely more reliable.

In this post I want to go over, at a high level, the hardware behind the current phase of our infrastructure deployment. In the subsequent posts (including another one that went live today) I'll offer some performance and power comparisons, as well as some insight into why we picked each component.

I'd also like to take this opportunity to thank Ionity, the host of our infrastructure for the past 12 months. We've been through a number of hosts over the years, and Ionity marks the best yet. Performance is typically pretty easy to guarantee when it comes to any hosting provider at a decent datacenter, but it's really service, response time and competence of response that become the differentiating factors for us. Ionity delivered on all fronts, which is why we're there and plan on continuing to be so for years to come.

Out with the Old

Our old infrastructure featured more than 20 servers, a combination of 1U dual-core application servers and some large 4U - 5U database servers. We had to rely on external storage devices to give us the number of spindles needed to deliver the performance our workload demanded. Oh how times have changed.

For the new infrastructure we settled on a total of 12 boxes, 6 of which are deployed now and another 6 that we'll likely deploy over the next year for geographic diversity as well as to offer additional capacity. That alone gives you an idea of the increase in compute density that we have today vs. 6 years ago: what once required 20 servers and more than a single rack can easily be done in 6 servers and half a rack (at lower power consumption too).

Of the six, a single box currently acts as a spare - the remaining five are divided as follows: two are on database duty, while the remaining three act as our application servers.

Since we were bringing our own hardware, we needed relatively barebones server solutions. We settled on Intel's SR2625, a fairly standard 2U rackmount with support for the Intel Xeon L5640 CPUs (32nm Westmere Xeons) we would be using. Each box is home to two of these processors, each of which features 6-cores and a 12MB L3 cache.

Each database server features 48GB of Kingston DDR3-1333 while the application servers use 36GB each. At the time we spec'd out our consolidation plans we didn't need a ton of memory, but going forward it's likely something we'll have to address.

When it comes to storage, the decision was made early on to go all solid-state. The problem we ran into there is most SSD makers at the time didn't want to risk a public failure of their SSDs in our environment. Our first choice declined to participate at the time due to our requirement of making any serious component failures public. Things are different today as the overall quality of all SSDs has improved tremendously, but back then we were left with one option: Intel.

Our application servers use 160GB Intel X25-M G2s, while our database servers use 64GB Intel X25-Es. The world has since moved away from SLC NAND in favor of enterprise-grade MLC, but at the time the X25-Es were our best bet to guarantee write endurance for our database servers. As I later discovered, using heavily overprovisioned X25-M G2s would've been fine for a few years, but even I wanted to be more cautious back then.
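Overprovisioning simply means leaving a slice of the drive's raw capacity unused (for example by not partitioning it) so the controller has extra spare area for wear leveling and garbage collection. The arithmetic is trivial; a minimal sketch (the 160GB figure matches the X25-M G2s above, but the 20% reserve is purely an illustrative assumption, not a number from our deployment):

```python
def overprovisioned_capacity(raw_gb: float, reserve_fraction: float) -> float:
    """Usable capacity left after setting aside a fraction of the drive
    as extra spare area (e.g. by leaving it unpartitioned)."""
    if not 0 <= reserve_fraction < 1:
        raise ValueError("reserve_fraction must be in [0, 1)")
    return raw_gb * (1 - reserve_fraction)

# Reserving a hypothetical 20% of a 160GB X25-M G2 leaves 128GB usable,
# with the controller free to use the reserved space for wear leveling.
usable = overprovisioned_capacity(160, 0.20)
```

The trade-off is straightforward: every gigabyte reserved is a gigabyte you can't store data on, but the larger spare area improves sustained write performance and stretches effective write endurance.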

The application servers each use 6 x X25-M G2s, while the database servers use 6 x X25-Es. To keep the environment simple, I opted against using any external RAID controllers - everything here is driven by the on-board Intel SATA controllers. We need multiple SSDs not for performance reasons but rather to get the capacities we need. Given that we migrated from a many-drive HDD array, the fact that we only need a couple of SSDs worth of performance per box isn't too surprising.
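Since the arrays exist for capacity rather than performance, sizing them is simple multiplication. A sketch of the usable-capacity math for a few common software RAID levels (the post doesn't state which level we actually run, so the levels below are illustrative, and the figures ignore filesystem overhead and GB/GiB differences):

```python
def usable_capacity_gb(drive_gb: float, drives: int, level: str) -> float:
    """Rough usable capacity of a software RAID array."""
    if level == "raid0":      # pure striping, no redundancy
        return drive_gb * drives
    if level == "raid10":     # striped mirrors: half the raw space
        return drive_gb * drives / 2
    if level == "raid5":      # one drive's worth of capacity lost to parity
        return drive_gb * (drives - 1)
    raise ValueError(f"unsupported level: {level}")

# Six 160GB X25-M G2s per application server:
# raid0 -> 960GB, raid10 -> 480GB, raid5 -> 800GB
```

This is why drive count is dictated by capacity here: even the most space-efficient redundant layout gives up at least one drive's worth of raw space.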

Storage capacity is our biggest constraint today. We actually had to move our image hosting to Ionity's cloud environment due to our current capacity constraints. NAND lithographies have shrunk dramatically since the days of the X25-Es and X25-Ms, so we'll likely move image hosting back on to a few very large capacity drives this year.

That's the high-level overview of what we're running on. I also posted some performance data here on the improvement we saw in going to SSDs in our environment.


17 Comments


  • evilspoons - Tuesday, March 12, 2013 - link

    To be completely honest, I thought you were talking about your old systems when discussing the X25 SSDs (which was confusing because of the talk of upgrading from mechanical hard drives). This was cleared up in your second article about the SSDs specifically, but I spent a minute scratching my head.

    Looking forward to the power consumption information!
  • marc1000 - Tuesday, March 12, 2013 - link

    What A Great new! But I Could Not Find This Article On The New Home Page - I Only Came Here By Using Twitter! Guys, You Really Have To Make Browsing On The New layout Easier.

    And The Comment Box Has Some Bug That Makes The Android Browser Of jelly Bean (4.2.Something) Capitalize Every Word! (The Ones Without It Was Me Correcting Typos).

    Good Luck On Fixing All The Quirks. Regards,
  • marc1000 - Wednesday, March 13, 2013 - link

    this "capitals bug" does not happen on Chrome for Android. but right now I'm on IE 9 and went back to the home page trying to find this specific article. Nada. Vanished. Impossible to find on menus or anywhere in the new layout. Anand please review the functionality of this new menu because right now I have to follow your site on twitter.
  • marc1000 - Wednesday, March 13, 2013 - link

    Ok, I figured out what is happening. when the "One question site survey" ad is open, it overlaps the WHOLE pipeline area. So I can't see the pipeline stories. when this ad is changed, the pipeline is back. sorry for not sending this bug report on the email, but the comment section was the fastest way I could figure to reach you.
  • Paulman - Wednesday, March 13, 2013 - link

    That "Capitalize Every Word" bug is actually kinda hilarious :D
  • marc1000 - Thursday, March 14, 2013 - link

    yeah, I also think it is. but it is a LOT harder to enter text on Android stock browser now, give it a try!
  • tim851 - Wednesday, March 13, 2013 - link

    Anand, may I enquire about bottlenecks? What is in your specific case most likely to limit performance: front-end or back-end, storage performance or CPU performance, maybe even network bandwidth?
  • DuckieHo - Wednesday, March 13, 2013 - link

    Why not Xeon E5-2650L or E5-2630L? I believe they were available a year ago.
  • Kevin G - Wednesday, March 13, 2013 - link

    I'm kinda surprised that the site is using older Xeon 5600 series chips instead of the newer Xeon E5's. That'd get you an extra memory channel per socket and up to two more cores. In addition the C600 series chipset has variants with an 8 port SAS controller on-die which would eliminate the need for a dedicated RAID card for storage. Migrations like this are never simple so some lead time was required but Xeon E5's have been on the market nearly a year now. The hardware migration for the site is mentioned as taking place last year. How much of an opportunity was there to go with the Xeon E5's instead of the older Xeon 5600 series? Was there consideration to hold off on the hardware upgrade for the impending generation of Xeon E5's?

    I'm also surprised that the environment isn't even more virtualized. Running the DB's as VM's isn't always the wisest choice due to overhead but it would give the environment flexibility in the event of a hardware failure. For example, run the DB's as the sole VM on a physical box and then you'd have the ability to migrate application server VM's there in the event of an application host box going down.

    Network backend wasn't discussed at all. Not sure if jumping to 10G Ethernet would be worth it in this case. Storage has moved into the host environments which makes much of the old traffic local IO.

    Speaking of storage, I'm not surprised at the move to software RAID for flexibility. The ranking of performance in terms of importance has lowered with SSD's and thus the overhead of software RAID in a system can be absorbed. I'm more surprised at the selection of Intel's RAID instead of something like ZFS.
  • SatishJ - Friday, March 22, 2013 - link

    I am running several servers with E5-2667 processors, some as physical servers and others as VMware ESXi hosts. I am happy with the performance. I have also virtualized Oracle 11g database servers. The experience is quite positive though the database load is not very high. All the storage is off SSD-accelerated 8 Gbps FC SAN. Have a few E5-2690 based servers on order - I feel this processor is the performance yardstick at present.
