Finding the Bottleneck

As you may remember from our first article on how AnandTech's server backbone works, all content (news, articles, and forum posts) is stored in a SQL database on a database server.  The main site has its own database server, as do the Forums.  Whenever a request is made, the web servers pull the content off of the database server and feed it into a template stored on the web servers.  When you hit www.anandtech.com or forums.anandtech.com, the web servers are serving you templates stored on those servers, filled with content pulled from the database servers.

The AnandTech web servers were definitely under a heavier load than the Forums web servers, simply because there are more people on the main site than there are on the Forums.  But comparing the load on the AnandTech database server to the load on the Forums database server is like comparing apples to oranges; you can't take the load percentages at face value.

The dynamics of the two are completely different.  The AnandTech main site is, for the most part, updated daily, with the exception of the Web News section, which features multiple updates throughout the day.  The beauty of this is that the main site can be served almost entirely out of memory, with the cache only having to be flushed once a day.  This makes the job of AnandTech's database server very easy; however, the same can't be said about the AnandTech Forums.

Unlike the main site, the Forums are being constantly updated.  You can consider the Forums to be an example of what the main site would be like if we posted a new article every minute.  The nature of the Forums prevents us from employing much caching at all, since the Forums would be pretty useless if new posts only appeared once every 12 hours.  Compared to the workout that the main site gives its database server, the Forums put their database server through Olympic training, and the server's disk I/O subsystem in particular takes the brunt of it.

Initially, the Forums database server was set up identically to the AnandTech database server, with three IBM Ultrastar 9LZX U2SCSI drives set up in a RAID 5 configuration.  For those of you who aren't familiar with RAID 5, it is a storage configuration that offers two benefits: striping and parity.

Striping, a benefit gained from RAID 0, is the result of combining two or more hard disks in such a way that your read/write performance is increased by up to a factor of the number of drives you have present in the array.  The reason behind this is simple: instead of reading/writing directly off of a single disk, two drives combined in a RAID 0 fashion allow data to be split into "stripes" that are read from/written to both drives in the array at the same time.  The downside of RAID 0 on its own is that if a single drive fails, the entire array goes down.
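To make the idea of stripes a little more concrete, here is a minimal Python sketch of how data might be dealt out across the drives in a RAID 0 array.  It is purely illustrative and not how any particular RAID controller is implemented; the function names (stripe, unstripe) and the tiny 2-byte stripe size are assumptions chosen for readability.

```python
def stripe(data: bytes, num_drives: int, stripe_size: int) -> list[list[bytes]]:
    """Split data into fixed-size stripes and deal them out round-robin."""
    drives = [[] for _ in range(num_drives)]
    for offset in range(0, len(data), stripe_size):
        chunk = data[offset:offset + stripe_size]
        # Stripe 0 goes to drive 0, stripe 1 to drive 1, and so on.
        drives[(offset // stripe_size) % num_drives].append(chunk)
    return drives


def unstripe(drives: list[list[bytes]]) -> bytes:
    """Reassemble the original data by reading the drives row by row."""
    rows = max(len(drive) for drive in drives)
    chunks = []
    for row in range(rows):
        for drive in drives:
            if row < len(drive):
                chunks.append(drive[row])
    return b"".join(chunks)


# Hypothetical example: 10 bytes across 2 drives in 2-byte stripes.
drives = stripe(b"ABCDEFGHIJ", num_drives=2, stripe_size=2)
restored = unstripe(drives)
```

Because each drive holds only every other stripe, both drives can be working on the same request at once, which is where the performance gain comes from.  It also shows why losing one drive is fatal: half of the stripes are simply gone.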

Parity is the feature that RAID 5 adds on top of striping.  In this type of array, parity data is spread across all of the drives in the array (a minimum of three).  In the event that a single drive in the array fails, its data can be reconstructed from the data and parity that exist on the remaining drives.  Fault tolerance does not exist if more than one drive fails.
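For the curious, the reconstruction trick can be sketched in a few lines of Python.  RAID 5 parity is computed with a bitwise XOR of the data blocks in a stripe, so XORing whatever survives gives back whatever was lost.  The sketch below models a single stripe on a three-drive array; the helper name xor_blocks and the sample block contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-drive array: two data blocks plus one parity block.
block_a = b"AnandTech"
block_b = b"Forums!!!"
parity = xor_blocks(block_a, block_b)

# Simulate losing the drive that held block_b: rebuild it from the survivors.
rebuilt_b = xor_blocks(block_a, parity)

# Losing the drive that held block_a works the same way.
rebuilt_a = xor_blocks(block_b, parity)
```

With two blocks missing there is nothing left to XOR against, which is why the array cannot survive a second drive failure.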

Three drives in a RAID 5 array can offer read performance similar to that of two drives in a RAID 0 array (potentially even greater than that because you have one more physical drive with RAID 5).  As we stated before, this is approximately equal to or greater than twice the read performance of a single drive.  The real downfall of RAID 5, however, is that its write performance is significantly lower than that of RAID 0.  The drives in a RAID 5 array must essentially be written to twice: once for the data and once for the parity information.  In a write-intensive situation such as the one we described for the AnandTech Forums, RAID 5 is more of a performance bottleneck than an advantage.
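The write overhead can be illustrated with a small Python sketch that counts the physical disk operations behind a single logical write.  This is a simplified model of one stripe, not a description of our controller: the class name Raid5Stripe and its counters are hypothetical, and the two extra reads it counts (fetching the old data and old parity so the parity can be updated) are a detail of the usual small-write case that goes beyond the "written to twice" summary.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


class Raid5Stripe:
    """Toy model of one RAID 5 stripe that counts physical disk operations."""

    def __init__(self, data_blocks: list[bytes]) -> None:
        self.data = list(data_blocks)
        self.parity = xor_blocks(*self.data)
        self.physical_reads = 0
        self.physical_writes = 0

    def write(self, index: int, new_block: bytes) -> None:
        # Read the old data and the old parity so the parity can be updated.
        old_data = self.data[index]
        old_parity = self.parity
        self.physical_reads += 2
        # New parity = old parity XOR old data XOR new data.
        self.parity = xor_blocks(old_parity, old_data, new_block)
        self.data[index] = new_block
        # One write for the data block, one for the parity block.
        self.physical_writes += 2


# Hypothetical example: one forum post being edited in place.
array = Raid5Stripe([b"post", b"text"])
array.write(0, b"edit")
```

In this model a single logical write costs two physical writes, whereas the same write on RAID 0 would cost one.  Multiply that by a constant stream of new posts and the disk subsystem becomes the bottleneck.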
