There are two important trends in the server market: it is growing fast and it is evolving fast. It is growing because the number of client devices is exploding. Only one-third of the world's population has access to the internet today, yet the number of internet users is increasing by 8 to 12% each year.

Most of the processing now happens on the server side (“in the cloud”), which is why the market is also evolving fast: the more efficiently an enterprise can deliver IT services to all those smartphones, tablets and PCs, the higher its profit margins and its chances of survival.

And that is why there is so much interest in the new star, the “micro server”.  

The Demand for Micro-Servers

The demand for very low power servers, although high on the hype curve, is certainly not imaginary. When you run heterogeneous workloads, for example a mail server, an OLAP database and several web and file servers, some of them will demand heavy processing power while others sit close to idle. In that case it is best to buy a dual "large core" CPU server (or better), install a hypervisor and carve the machine up into resource pools. As one workload demands high processing power, your hardware and hypervisor will deliver it. Single-threaded performance largely determines whether a complex SQL query is answered in a fraction of a second or in several seconds. The virtualization layer gives you extra bonuses on top, such as high availability and far less unplanned downtime.
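To make the consolidation argument concrete, here is a small, purely illustrative simulation (workload names and numbers are assumptions, not measurements): because bursty, heterogeneous workloads rarely peak at the same time, a single virtualized host only needs to be sized for the peak of the combined load, not for the sum of the individual peaks.

```python
"""Toy model of consolidating bursty, heterogeneous workloads.
All workload names and figures are assumptions for illustration only."""
import random

random.seed(42)
workloads = ["mail", "OLAP", "web-1", "web-2", "fileserver"]
samples = 1000  # e.g. one load sample per minute

def demand():
    # mostly near idle, with occasional heavy bursts (arbitrary model)
    return random.uniform(8, 16) if random.random() < 0.1 else random.uniform(0.2, 1.0)

traces = {w: [demand() for _ in range(samples)] for w in workloads}

# dedicated small servers: each must be sized for its own peak
sum_of_peaks = sum(max(trace) for trace in traces.values())
# one big virtualized host: only the peak of the *combined* load matters
peak_of_sum = max(sum(trace[i] for trace in traces.values()) for i in range(samples))

print(f"dedicated boxes need ~{sum_of_peaks:.0f} capacity units in total")
print(f"a single shared, virtualized host needs only ~{peak_of_sum:.0f}")
```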

If your web-based workloads are very homogeneous and you know how much horsepower your web application needs, things are very different. The virtualization layer then just adds complexity and cost. In this case it is a lot more efficient to scale out than to divide a heavy server into virtual machines. Single-threaded performance only has to be good enough to answer an individual request quickly; throughput demands can be handled by placing a load balancer in front of a pool of low power servers, as sketched below. It is much easier to scale this way.
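As a rough sketch of the scale-out approach, the snippet below implements a tiny round-robin HTTP proxy in front of a pool of low power web nodes. The backend addresses are hypothetical, and in practice you would use a dedicated load balancer such as HAProxy or nginx; the point is simply that throughput grows by adding nodes to the pool, not by buying a bigger box.

```python
"""Minimal round-robin HTTP load balancer sketch, for illustration only.
Backend addresses are hypothetical; real deployments use HAProxy, nginx, etc."""
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# hypothetical pool of identical low power web nodes
BACKENDS = itertools.cycle([
    "http://10.0.0.11:8080",
    "http://10.0.0.12:8080",
    "http://10.0.0.13:8080",
])

class RoundRobinProxy(BaseHTTPRequestHandler):
    def do_GET(self):
        backend = next(BACKENDS)  # hand each request to the next node in the pool
        with urllib.request.urlopen(backend + self.path) as resp:
            body = resp.read()
            status = resp.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # scaling out = appending another cheap node to BACKENDS
    ThreadingHTTPServer(("", 8000), RoundRobinProxy).serve_forever()
```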

The problem is that your average server is not well suited for this kind of homogeneous workload. Yes, servers have become a lot more efficient thanks to advanced power management features; the introduction of CPU C-states and more efficient PSUs are among the technologies that have saved the most power. However, even the most modern servers are still needlessly complex, with lots of disk, network and other interfaces. An unused interface wastes a few hundred milliwatts and a less efficient PHY (copper 10 Gbit Ethernet, for example) wastes a few watts, but in the end it all adds up.
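A quick back-of-the-envelope calculation shows how those fractions of a watt add up across a larger deployment. All figures below are assumptions roughly in line with the numbers above, not measurements, and the cooling overhead needed to remove that extra heat is not even counted.

```python
# Assumed, illustrative figures only
unused_interfaces = 4            # unused ports/controllers per server
waste_per_interface_w = 0.3      # "a few hundred milliwatts" each
copper_phy_waste_w = 3.0         # copper 10 Gbit Ethernet vs. a more efficient PHY
servers = 1000
hours_per_year = 24 * 365
price_per_kwh = 0.10             # assumed energy price in USD

per_server_w = unused_interfaces * waste_per_interface_w + copper_phy_waste_w
total_kwh = per_server_w * servers * hours_per_year / 1000
print(f"{per_server_w:.1f} W wasted per server")
print(f"{total_kwh:,.0f} kWh per year across {servers} servers (~${total_kwh * price_per_kwh:,.0f})")
```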

Low power server CPUs
80 Comments

  • zepi - Tuesday, June 18, 2013 - link

    The problem is that AMD's big cores are so bad compared to Intel's big cores that it makes next to no sense to compete against Intel with them. Perf/W is worse and manufacturing costs are far too high. To sell chips made out of these cores, they need to cut their prices so low that they can't make any profit to pay for R&D or anything else. This applies to both the server and the desktop market. Sure, you can sell expensive to manufacture multi-module phenoms for cheap-ass people who want best multicore performance per dollar, but what's the benefit when you can't make any money doing so?

    AMD is in dire need of a complete big-core CPU architecture renewal, and with their R&D resources that probably just isn't going to happen any time soon. Unless they can pull some kind of magical bunny rabbit out of their hat, I don't see them being competitive in big cores ever again.

    They are shifting their target to those markets where they hope they can still compete.
  • JDG1980 - Tuesday, June 18, 2013 - link

    The big question right now is if Steamroller can fix the problems with AMD's construction equipment architecture or not. Official estimates are quite bullish, promising 15%-30% gains on a clock-for-clock basis. No doubt these are overly generous estimates and I take them with a grain of salt, but if AMD can increase actual IPC by 10% or more with Steamroller (rather than just cranking the clock speed higher) then there may be hope for the construction equipment cores. If not, then AMD's best bet is ditching that line altogether, and scaling up Jaguar or its successor so it's reasonably competitive on the desktop. The good thing is that Jaguar is already optimized for low power (which is where the Bulldozer lineage really falls short) and its IPC is pretty good. And they've already got some nice design wins with the PS4 and Xbone, which demonstrates that these cores are suitable for gaming (an "enthusiast" use). Perhaps they could backport some of the features that Bulldozer and its successors actually got right, like the improved branch predictor. (Or did they already do that with Jaguar?) After all, this is basically what Intel did when they dropped Netburst in favor of a revised version of the P6 architecture.
  • JPForums - Tuesday, June 18, 2013 - link

    @zepi "Sure, you can sell expensive to manufacture multi-module phenoms for cheap-ass people who want best multicore performance per dollar, but what's the benefit when you can't make any money doing so?"

    Even if you can't make any money, if you can break even it is still useful for keeping your employees employed. While a business doesn't have an inherent need to employ someone for the sake of employing them, in this case, it is useful for maintaining your talent pool. For a company like AMD, the engineers' work cycles are most likely punctuated with periods of high demand and low demand. When you have fewer product lines, this means there could be periods of time where they have no work to do at all while waiting for work to be completed farther up the line. Even a small loss is better than paying a chunk of employees to do nothing while waiting for the next thing to come down the pipe. Having more product lines allows you to even out such lulls by staggering releases and thus filling in the gaps from one product line with work from another.

    As an example, if the employees responsible for the low power line's layout only worked on the low power line, many of them would have been left with nothing to do while waiting for the Jaguar architecture to be developed and simulated. Tweaks to the Bobcat layout and preparations for the next node change would have kept some of them busy, but it is quite likely that many found work in the mainstream or even FX lines in the interim.
  • TiredOldFart2 - Tuesday, June 18, 2013 - link

    If you take a look at AMD's assets, their portfolio and their current situation, it's not hard to see where they are headed. Money is usually made in the middle of the road. By this I mean that most sales of enterprise-class server CPUs in this economic scenario will target a balance of sufficient computing power, price and power consumption.

    What would you, as a business owner, opt for as your average VM server for your average medium-sized business needs: the $600, 95 W E5-2630 or the $290, 115 W Opteron 6320? I won't even discuss the different standards both companies use for their TDP ratings; it's a matter of cold hard cash.

    AMD will sell cheap, will move faster while listening to clients, will take more risks on niche markets, and will leverage their GPU technologies in the server market to make up for their less than stellar FPU performance.

    How big is the HPC market compared to the SME one?
  • JPForums - Tuesday, June 18, 2013 - link

    I'm not sure this is the correct time, but I do think that eventually we will see a merger, or at least closer alignment, of the FX line and the A series products. Consider that since before the Bulldozer architecture was conceptualized, AMD had been looking to fuse the CPU and GPU into one chip. They wanted to allow people to program code for the "GPU" portions of the processor as easily as the "CPU" portions, and even within the same code blocks. They've steadily (if slowly) progressed towards this goal since then, culminating in their current HSA and hUMA technologies. When looked at from this perspective, the subpar floating point performance of Bulldozer and its derivatives makes sense. If you have a set of "GPU" cores or "stream processors" available to handle floating point operations, then it seems less necessary to include them in the CPU cores.

    Unfortunately, this merger is taking longer than AMD initially expected. Even if AMD's intention was to leverage discrete GPUs in the meantime to cover the floating point gap, software hasn't yet progressed to the point of making it happen. For the moment, a GPU-less part is necessary to serve higher performance sectors. Eventually, though, I do expect to see GPU-ish elements in their high end parts to handle parallel operations and possibly augment the floating point characteristics of the processors. At that point, the transistors dedicated to the "GPU" portion will no longer be useless die space in regards to CPU performance. Such processors would have a much easier time with voice recognition, facial recognition, pattern recognition, neural algorithms (A.I. learning), etc.
  • JDG1980 - Tuesday, June 18, 2013 - link

    AMD can't expect third-party code to be rewritten to accommodate their processors. If they can leverage the GPU for floating point, then fine, but it has to work seamlessly with existing CPU opcodes. In other words, the APU has to *internally* see that a stream of (say) SSE2 floating point instructions is coming, and hand that off to the GPU portion, without requiring anything to be recoded.

    AMD doesn't have the market share to tell software vendors to do things their way.
  • Shadowmaster625 - Tuesday, June 18, 2013 - link

    That 2013 picture is some scary schiznit! Last time I went to a concert that was what it was like too. Those screens are right out of some science fiction horror novel. It is amazing what people cannot see, even when it is so plainly obvious.
  • bji - Tuesday, June 18, 2013 - link

    Yeah, people are more focused on burying their noses in their phones and capturing the moment than on actually living the moment. I don't have a smartphone and I notice that I pay a lot more attention to what I'm actually doing than most people most of the time. I don't know why people think it's necessary to make a crappy smartphone recording of an event when you can almost certainly buy a professionally made recording of almost any important event after the fact for a few bucks.
  • silverblue - Tuesday, June 18, 2013 - link

    "Andrew Feldman told us that Berlin will offer at least twice CPU processing performance than the Opteron X-series."

    I'd damn well hope it was a lot more than this. If it's clocked at twice the speed, then Berlin will be forgettable; however, if the comparison is with Berlin clocked at, say, 3 GHz, that's not so bad.

    All non-BD AMD architectures seem to scale very well with additional cores, and this is the main area that SR looks to improve upon.
  • nismotigerwvu - Tuesday, June 18, 2013 - link

    Small typo on page 4, in the very first sentence: "The current Opteron 4310 EE (2 modules, 4 cores at 2.2-3 GHz, 40W TDP) and Opteron 4376 HE (4 modules, 89 cores at 2.6-3.6 GHz, 65W TDP) are about the best AMD can deliver for low power servers that need some more processing power." Unless I'm mistaken (and an 89 core chip would be pretty sweet, especially at just 65 watts), that should read 8 cores. Otherwise great read Johan.
