The Mac Pro (Late 2013)

When Anand reviewed the Mac Pro late last year, he received the full fat 12 core edition, using the E5-2697 v2 CPU with a 2.7 GHz rating. The CPU choices for the Mac Pro include 4, 6 and 8 core models, all with HyperThreading. Interestingly enough, the 4/6/8 core models all come from the E5-16xx line, meaning the CPUs are designed with single processor systems in mind. But to get to the 12 core/24 thread model at the high end, Apple used the E5-2697 v2, a processor optimized for dual CPU situations. Based on the die shots on the previous page, this has repercussions, but as Anand pointed out, it all comes down to power usage and turbo performance.

Mac Pro (Late 2013) CPU Options
Intel CPU E5-1620 v2 E5-1650 v2 E5-1680 v2 E5-2697 v2
Cores / Threads 4 / 8 6 / 12 8 / 16 12 / 24
CPU Base Clock 3.7GHz 3.5GHz 3.0GHz 2.7GHz
Max Turbo (1C) 3.9GHz 3.9GHz 3.9GHz 3.5GHz
L3 Cache 10MB 12MB 25MB 30MB
TDP 130W 130W 130W 130W
Intel SRP $294 $583 ? $2614

The Mac Pro is designed within a peak 450W envelope, and Intel has options with its CPUs. For the same TDP limit, Intel can create many cores as low frequency, or fewer cores at higher frequency. This is seen in the options on the Mac Pro – all the CPU choices have the same 130W TDP, but the CPU base clocks change as we rise up the core count. Moving from 4 cores to 8 cores keeps the maximum turbo (single core performance) at 3.9 GHz, but the base clock decreases the more cores are available. Finally at the 12-core model, the base frequency is at its lowest of the set, as well as the maximum turbo.

This has repercussions on workloads, especially for workstations. For the most part, the types of applications used on workstations are highly professional, and have big budgets with plenty of engineers designed to extract performance. That should bode well for the systems with more cores, despite the frequency per core being lower. However, it is not always that simple – the mathematics for the problem has to be able to take advantage of parallel computing. Simple programs run solely on one core because that is the easiest to develop, but if the mathematics wholly linear, then even enterprise software is restricted. This would lend a positive note to the higher turbo frequency CPUs. Intel attempts to keep the turbo frequency similar as long as it can while retaining the maximum TDP to avoid this issue; however at the 12-core model this is not possible. Quantifying your workload before making a purchase is a key area that users have to consider.

Benchmark Configuration

I talk about the Mac Pro a little because the processors we have for a ‘regular’ test today are 8-core and 12-core models. The 12-core is the same model that Anand tested in the Mac Pro – the Xeon E5-2697v2. The 8-core model we are testing today is different to the one offered in the Mac Pro, in terms of frequency and TDP:

Intel SKU Comparison
  Core i7-4960X Xeon E5-2687W v2 Xeon E5-2697 v2
Release Date September 10, 2013 September 10, 2013 September 10, 2013
Cores 6 8 12
Threads 12 16 24
Base Frequency 3600 3400 2700
Turbo Frequency 4000 4000 3500
L3 Cache 15 MB 25 MB 30 MB
Max TDP 130 W 150 W 130 W
Max Memory Size 64 GB 256 GB 768 GB
Memory Channels 4 4 4
Memory Frequency DDR3-1866 DDR3-1866 DDR3-1866
PCIe Revision 3.0 3.0 3.0
PCIe Lanes 40 40 40
Multi-Processor 1P 2P 2P
VT-x Yes Yes Yes
VT-d Yes Yes Yes
vPro No Yes Yes
Memory Bandwidth 59.7 GB/s 59.7 GB/s 59.7 GB/s
Price $1059 $2112 $2618

The reason for this review is to put these enterprise class processors through the normal (rather than server) benchmarks I run at AnandTech for processors. Before I started writing about technology, as an enthusiast, it was always interesting to hear of the faster Xeons and how much that actually made a difference to my normal computing. I luckily have that opportunity and would like to share it with our readers.

The system set up is as follows:

Test Setup
Motherboards GIGABYTE GA-6PXSV3
MSI X79A-GD45 Plus for 3x GPU Configurations
Memory 8x4 GB Kingston DDR3-1600 11-11-11 ECC
Storage OCZ Vertex 3 256 GB
Power Supply OCZ 1250 ZX Series
CPU Cooler Corsair H80i
NVIDIA GPU MSI GTX 770 Lightning 2GB
AMD GPU ASUS HD 7970 3GB

Many thanks to...

We must thank the following companies for kindly providing hardware for our test bed:

Thank you to GIGABYTE Server for providing us with the Motherboard and CPUs
Thank you to OCZ for providing us with 1250W Gold Power Supplies and SSDs.
Thank you to Kingston for the ECC Memory kit
Thank you to ASUS for providing us with the AMD HD7970 GPUs and some IO Testing kit.
Thank you to MSI for providing us with the NVIDIA GTX 770 Lightning GPUs.

Power Consumption

Power consumption was tested on the system as a whole with a wall meter connected to the OCZ 1250W power supply, while in a single MSI GTX 770 Lightning GPU configuration. This power supply is Gold rated, and as I am in the UK on a 230-240 V supply, leads to ~75% efficiency > 50W, and 90%+ efficiency at 250W, which is suitable for both idle and multi-GPU loading. This method of power reading allows us to compare the power management of the UEFI and the board to supply components with power under load, and includes typical PSU losses due to efficiency. These are the real world values that consumers may expect from a typical system (minus the monitor) using this motherboard.

While this method for power measurement may not be ideal, and you feel these numbers are not representative due to the high wattage power supply being used (we use the same PSU to remain consistent over a series of reviews, and the fact that some boards on our test bed get tested with three or four high powered GPUs), the important point to take away is the relationship between the numbers. These boards are all under the same conditions, and thus the differences between them should be easy to spot.

Power Consumption - Idle

Power Consumption - Long Idle

Power Consumption - OCCT

At idle, the Xeons are on par with the Core i7-4960X for power consumption in the GIGABYTE motherboard. At load the extra TDP of the E5-2687W v2 can be seen.

DPC Latency

Deferred Procedure Call latency is a way in which Windows handles interrupt servicing. In order to wait for a processor to acknowledge the request, the system will queue all interrupt requests by priority. Critical interrupts will be handled as soon as possible, whereas lesser priority requests, such as audio, will be further down the line. So if the audio device requires data, it will have to wait until the request is processed before the buffer is filled. If the device drivers of higher priority components in a system are poorly implemented, this can cause delays in request scheduling and process time, resulting in an empty audio buffer – this leads to characteristic audible pauses, pops and clicks. Having a bigger buffer and correctly implemented system drivers obviously helps in this regard. The DPC latency checker measures how much time is processing DPCs from driver invocation – the lower the value will result in better audio transfer at smaller buffer sizes. Results are measured in microseconds and taken as the peak latency while cycling through a series of short HD videos - less than 500 microseconds usually gets the green light, but the lower the better.

DPC Latency Maximum

The DPC latency of the Xeons is closer to the 100 mark, which we saw during Sandy Bridge.  Newer systems seem to be increasing the DPC latency - so far all Haswell consumer CPUs are at the 140+ line.

Intel Xeon E5-2697 v2 and Xeon E5-2687W v2 Review: 12 and 8 Cores Real World CPU Benchmarks: Rendering, Compression, Video Conversion
POST A COMMENT

71 Comments

View All Comments

  • vLsL2VnDmWjoTByaVLxb - Monday, March 17, 2014 - link

    > TrueCrypt is an off the shelf open source encoding tool for files and folders.

    Encoding?
    Reply
  • Brutalizer - Monday, March 17, 2014 - link

    I would not say these cpus are for high end market. High end market are huge servers, with as many as 32 sockets, some monster servers even have 64 sockets! These expensive Unix RISC servers or IBM Mainframes, have extremely good RAS. For instance, some Mainframes do every calculation in three cpus, and if one fails it will automatically shut down. Some SPARC cpus can replay instructions if something went wrong. Hotswap cpus, and hotswap RAM. etc etc. These low end Xeon cpus have nothing of that.

    PS. Remember that I distinguish between a SMP server (which is a single huge server) which might have 32/64 sockets and are hugely expensive. For instance, the IBM P595 32-socket POWER6 server used for the old TPC-C record, costed $35 million. No typo. One single huge 32 socket server, costed $35 million. Other examples are IBM P795, Oracle M5-32 - both have 32 sockets. Oracle have 32TB RAM which is the largest RAM server on the market. IBM Mainframes also belong to this category.

    In contrast to this, every server larger than 32/64 sockets, is a cluster. For instance the SGI Altix or UV2000 servers, which sports up to 262.000 cores and 100s of TB. These are the characteristica of supercomputer clusters. These huge clusters are dirt cheap, and you pay essentially the hardware cost. Buy 100 nodes, and you pay 100 x $one node. Many small computers in a cluster.

    Clusters are only used for HPC number crunching. SMP servers (one single huge server, extremely expensive because it is very difficult to scale beyond 16 sockets) are used for ERP business systems. An HPC cluster can not run business systems, as SGI explains in this link:
    http://www.realworldtech.com/sgi-interview/6/
    "The success of Altix systems in the high performance computing market are a very positive sign for both Linux and Itanium. Clearly, the popularity of large processor count Altix systems dispels any notions of whether Linux is a scalable OS for scientific applications. Linux is quite popular for HPC and will continue to remain so in the future,...However, scientific applications (HPC) have very different operating characteristics from commercial applications (SMP). Typically, much of the work in scientific code is done inside loops, whereas commercial applications, such as database or ERP software are far more branch intensive. This makes the memory hierarchy more important, particularly the latency to main memory. Whether Linux can scale well with a SMP workload is an open question. However, there is no doubt that with each passing month, the scalability in such environments will improve. Unfortunately, SGI has no plans to move into this SMP market, at this point in time."
    Reply
  • Kevin G - Tuesday, March 18, 2014 - link

    @Brutalizer
    And here we go again. ( http://anandtech.com/comments/7757/quad-ivy-brigde... )

    “These low end Xeon cpus have nothing of that.”

    This is actually accurate as the E5 is Intel’s midrange Xeon series. Intel has the E7 line for those who want more RAS or scalability to 8 sockets. Features like memory hot swap can or lock step mirroring can be found in select high end Xeon systems. If you want ultra high end RAS, you can find it if you need it as well as pay the premium price premium for it.

    “In contrast to this, every server larger than 32/64 sockets, is a cluster. For instance the SGI Altix or UV2000 servers, which sports up to 262.000 cores and 100s of TB. These are the characteristica of supercomputer clusters. These huge clusters are dirt cheap, and you pay essentially the hardware cost. Buy 100 nodes, and you pay 100 x $one node.”

    Incorrect on several points but they’ve already been pointed out to you. The UV2000 is fully cache coherent (with up to 64 TB of memory) with a global address space that operates as one uniform, logical system that only a single OS/Hypervisor is necessary to boot and run.

    Secondly, the price of the UV2000 does not scale linearly. There are NUMALink switches that bridge the coherency domains that have to be purchased to scale to higher node counts. This is expected of how the architecture scales and is similar to other large scale systems from IBM and Oracle.

    “Clusters are only used for HPC number crunching.”

    Incorrect. Clustering is standard in what you define as SMP applications (big business ERP). It is utilized to increase RAS and prevent downtime. This is standard procedure in this market.

    “SMP servers (one single huge server, extremely expensive because it is very difficult to scale beyond 16 sockets) are used for ERP business systems. An HPC cluster can not run business systems,”

    Why? As long as underlaying architecture is the same, they can run. You may not get the same RAS or scale as high in a single logical system but they’ll work. Performance is where you’d expected it on these boxes: a dual socket HPC system will perform roughly one quarter the speed of as the same chips occupying an 8 socket system.

    “as SGI explains in this link:
    http://www.realworldtech.com/sgi-interview/6/

    As pointed out numerous times before, that link is you cite is a decade old. SGI has moved into the SMP space with the Altix UV series. Continuing to use this link as relevant is plain disingenuous and deceptive.

    As for an example of a big ERP application running on such an architecture, the US Post Office run’s Oracle Data Warehousing software on a UV1000. ( https://www.fbo.gov/index?s=opportunity&mode=f... )
    Reply
  • Brutalizer - Tuesday, March 18, 2014 - link

    Do you really think that UV (which is the successor to Altix) is that different? Windows is Windows, and it will not magically challenge Unix or OpenVMS, in some iterations later. Windows will not be superior to Unix after some development. You think that HPC- Altix will after some development, be superior to Oracle and IBM's huge investments of decades research in billions of USD? Do you think Oracle and IBM has stopped developing their largest servers?

    Altix is only for HPC number crunching, says SGI in my link. Today the UV line of servers, has up to 262.000 cores and 100s of TB of RAM. Whereas the largest Unix and IBM Mainframes have 64 sockets and couple of TB RAM, after decades of research.

    In a SMP server, all cpus will have to be connected to each other, for this SGI UV2000 with 32.768 cpus, you would need (n²) 540 million (half a billion) threads connecting each cpu. Do you really think that is feasible? Does it sound reasonable to you? IBM and Oracle and HP has had great problems connecting 32 sockets to each other, just look at the connections on the last picture at the bottom, do you see all connections? Now imagine half a billion of them in a server!
    http://www.theregister.co.uk/2013/08/28/oracle_spa...

    But on the other hand, if you keep the number of connection downs to islands, and then connect the islands to each other, you dont need half a billion. This solution would be feasible. And then you are not in SMP territory anymore: SGI say like this on page 4 about the UV2000 cluster:
    www.sgi.com/pdfs/4395.pdf‎
    "...SMP is based on intra-node communication using memory shared by all cores. A cluster is made up of SMP compute nodes but each node cannot communicate with each other so scaling is limited to a single compute node...."

    Dont you think that a 262.000 core server and 100s of TB of RAM sounds more like a cluster, than a single fat SMP server? And why do the UV line of servers focus on OpenMPI accerators? OpenMPI is never used in SMP workloads, only in HPC.

    Do you have any benchmarks where one 32.768 cpu SGI UV2000 demolishes 50-100 of the largest Oracle SPARC M6-32 in business systems? And why is the UV2000 much much cheaper than a 16/32 socket Unix server? Why does a single 32 socket Unix server cost $35 million, whereas a very large SGI cluster with 1000 of sockets is very very cheap?
    Reply
  • Kevin G - Tuesday, March 18, 2014 - link

    Wow, I think the script you're copy/pasting from needs better revision.

    "Do you really think that UV (which is the successor to Altix) is that different?"

    Yes. SGI changed the core achitecture to add cache coherent links between the entire system. Clusters tend to have an API on top of a networking software stack to abstract the independent systems so they may act as one. The UV line does not need to do this. For one processor to use memory and performance calculations on data residing off of CPU on the other end, a memory read operation is all that is needed on the UV. It is really that simple.

    "Windows is Windows, and it will not magically challenge Unix or OpenVMS, in some iterations later."

    The UV can run any OS that runs on modern x86 hardware today. Windows, Linux, Solaris (Unix) and perhaps at some point NonStop (HP's mainframe OS http://h17007.www1.hp.com/us/en/enterprise/servers... ). The x86 platform has plenty of choices to choose from.

    "You think that HPC- Altix will after some development, be superior to Oracle and IBM's huge investments of decades research in billions of USD? Do you think Oracle and IBM has stopped developing their largest servers?"

    What I see SGI offering is another tool alongside IBM and Oracle systems. Also you mention decades of research, then it is also fair to put SGI into that category as that link you love to spam IS A DECADE OLD. Clearly SGI didn't have this technology back in 2004 when that interview was written.

    "Today the UV line of servers, has up to 262.000 cores and 100s of TB of RAM. Whereas the largest Unix and IBM Mainframes have 64 sockets and couple of TB RAM, after decades of research."

    Actually this is a bit incorrect. IBM can scale to 131,072 cores on POWER7 if the coherency requirement is forgiven. Oh, and this system can run either AIX or Linux when maxed out. Source: http://www.theregister.co.uk/Print/2009/11/27/ibm_...

    "In a SMP server, all cpus will have to be connected to each other, for this SGI UV2000 with 32.768 cpus, you would need (n²) 540 million (half a billion) threads connecting each cpu.
    http://www.theregister.co.uk/2013/08/28/oracle_spa...

    Wow, do you not read your own sources? Not only is your math horribly horribly wrong but the correct methodology is found for calculating the number of links as things scale is in the link you provided. To quote that link: "The Bixby interconnect does not establish everything-to-everything links at a socket level, so as you build progressively larger machines, it can take multiple hops to get from one node to another in the system. (This is no different than the NUMAlink 6 interconnect from Silicon Graphics, which implements a shared memory space using Xeon E5 chips...)"

    The full implication here is that if the UV 2000 is not a socket machine, then neither is Oracle's soon-to-be-released 96 socket device. The topology to scale is the same in both cases per your very own source.

    "SGI say like this on page 4 about the UV2000 cluster:
    www.sgi.com/pdfs/4395.pdf‎"

    Fundamentally false. If you were to actually *read* the source material for that quote, it is not describing the UV2000. Rather is speaking generically abou the differences between a cluster and large SMP box on page 4. If you got to page 19, it further describes the UV 2000 as a single system image unlike that of a cluster as defined on page 4.

    "Dont you think that a 262.000 core server and 100s of TB of RAM sounds more like a cluster, than a single fat SMP server? And why do the UV line of servers focus on OpenMPI accerators? OpenMPI is never used in SMP workloads, only in HPC."

    All I'd say about a 262,000 core server is that it wouldn't fit into a single box. Then again IBM, Oracle and HP are spreading their large servers across multiple chassis so this doesn't bother me at all. The important part is how all these boxes are connected. SGI uses NUMAlink6 which provides cache coherency and a global address space for a single system image. OpenMPI can be used inside of a cache coherent NUMA system as it provides a means to gurantee memory locality when data is used for execution. It is a means of increasing efficiency for applications that use it. However, OpenMPI libraries do not need to be installed for software to scale across all 256 sockets on the UV200. It is purely an option for programmers to take advantage of.

    "And why is the UV2000 much much cheaper than a 16/32 socket Unix server? Why does a single 32 socket Unix server cost $35 million, whereas a very large SGI cluster with 1000 of sockets is very very cheap?"

    First, to maintain coherency, the UV2000 only scales to 256 sockets/64 TB of memory. Second, the cost of a decked out P795 from IBM in terms of processors (8 sockets, 256 cores) and memory (2 TB) but only basic storage to boot the system is only $6.7 million whole sale. Still expensive but far less than what you're quoting. It'll require some math and reading comprehension to get to that figure but here is the source: http://www-01.ibm.com/common/ssi/ShowDoc.wss?docUR...

    I couldn't find pricing for the UV2000 as a complete system but purchasing the Intel processors and memory seperately to get to a 256 socket/64 TB system would be just under $2 million. Note that that figure is just processor + memory, no blade chassis, racks or interconnect to glue everything together. That would also be several million. So yes, the UV2000 does come out to be cheaper but not drastically. That IBM pricing document does highlight why their high end systems costs so much, mainly capacity on demand. The p795 is getting a mainframe like pricing structure where you purchase the hardware and then you have to activate it as an additional cost. Not so on the UV2000.
    Reply
  • psyq321 - Tuesday, March 18, 2014 - link

    Xeon 2697 v2 is not a "low end" Xeon.

    It is part of "expandable server" platform (EP), being able to scale up to 24 cores.

    That is far from "low end", at least in 2014.
    Reply
  • alpha754293 - Wednesday, March 19, 2014 - link

    "High end market are huge servers, with as many as 32 sockets, some monster servers even have 64 sockets!"

    Partially true. The entire cabinet might have that many sockets/processors, but on a per-system, per-"box" level, most max out between two and four. You get a few odd balls here and there that would have a daughter board for a true 8-socket system, but those are EXTREMELY rare in actuality. (Tyan, I think had one for the AMD Opterons, and they said that less than 5% of the orders were for the full fledge 8-socket systems).

    "PS. Remember that I distinguish between a SMP server (which is a single huge server) which might have 32/64 sockets and are hugely expensive. For instance, the IBM P595 32-socket POWER6 server used for the old TPC-C record, costed $35 million. No typo. One single huge 32 socket server, costed $35 million. Other examples are IBM P795, Oracle M5-32 - both have 32 sockets. Oracle have 32TB RAM which is the largest RAM server on the market. IBM Mainframes also belong to this category."
    Again, only partially true. The costs and stuff is correct, but the assumptions that you're writing about is incorrect. SMP is symmetric multiprocessing. BY DEFINITION, that means that "involves a multiprocessor computer hardware and software architecture where two or more identical processors connect to a single, shared main memory, have full access to all I/O devices, and are controlled by a single OS instance that treats all processors equally, reserving none for special purposes." (source: wiki) That means that it is a monolithic system, again, of which, few are TRULY such systems. If you've ever ACTUALLY witnessed the startup/bootup sequence of an ACTUAL IBM mainframe, the rest of the "nodes" are actually booted up typically by PXE or something very similiar to that, and then the "node" is ennumerated into the resource pool. But, for all other intents and purposes, they are semi-independent, standalone systems, because SMP systems do NOT have the capability to pass messages and/or memory calls (reads/writes/requests) without some kind of a transport layer (for example MPI).

    Furthermore, the old TPC-C that you mention, they do NOT process as one monolithic sequential series of events in parallel (so think of like how PATA works...), but rather more like a JBOD SATA (i.e. the processing of the next transaction does NOT depend on ALL of the current block of transactions to be completed, UNLESS there is an inherent dependency issue, which I don't think would be very common in TPC-C). Like bank accounts, they're all treated as discrete and separate, independent entities, which means you can send all 150,000 accounts against the 32-socket or 64-socket system and it'll just pick up the next account when the current one is done, regardless.

    The other failure in your statement or assumption is that's why there's something called HA - high avialability. Which means that they can dynamically hotswap an entire node if there's a CPU failure, so that the node can be downed and yanked out for service/repair while another one is hotswapped in. So it will failover to a spare hotswap node, work on it, and then either fall over back to the newly replaced node or it would rotate the new one into the hotswap failover pool. (There are MANY different ways of doing that and MANY different topologies).

    The statement you made about having 32TB of RAM is again, partially true. But NONE of the single OS instances EVER have full control of all 32TB at once, which again, by DEFINITION, means that it is NOT truly SMP. (Course, if you ever get a screenshot which shows that, I'd LOVE to see it. I'd LOVE to get corrected on that.)

    "In contrast to this, every server larger than 32/64 sockets, is a cluster."
    Again, not entirely true. You can actually get 4 socket systems that are comprised of two dual-socket nodes and THAT is enough to meet the requirements of a cluster. Heck, if you pair up two single-socket consumer-grade systems, that TOO is a cluster. That's kinda how Beowulf clusters got started - cuz it was an inexpensive way (compare to the aforementioned RISC UNIX based systems) to gain computing power without having to spend a lot of money.

    'These huge clusters are dirt cheap"
    Sure...if you consider IBM's $100 million contract award "cheap".

    "Clusters are only used for HPC number crunching. SMP servers (one single huge server, extremely expensive because it is very difficult to scale beyond 16 sockets) are used for ERP business systems. An HPC cluster can not run business systems, as SGI explains in this link:"
    So there's the two problems with this - 1) it's SGI - so of course they're going to promote what they ARE capable of vs. what they don't WANT to be capable of. 2) Given the SGI-biased statements, this, again, isn't EXACTLY ENTIRELY true either.

    HPCs CAN run ERP systems.

    "HPC vendors are increasingly targeting commercial markets, whereas commercial vendors, such as Oracle, SAP and SAS, are seeing HPC requirements." (Source: http://www.information-age.com/it-management/strat...

    But that also depends on the specific implementation of the ERP system given that SAP is NOT the ONLY ERP system that's available out there, but it's probably one of the most popular one, if not THE most popular one. (There's a whole thing about distributed relational databases so that the database can reside in smaller chunks across multiple nodes, in-memory, which are then accessed via a high speed interconnect like Myrinet or Infiniband or something along those lines.)

    Furthermore, the fact that ERP runs across large mainframes (it grows as the needs grows), is an indications of HPC's place in ERP. Alternatively, perhaps rather than using it for the backend, HPC can be used on the front end by supporting many, many, many virtualized front-end clients.

    Like I said, most of the numbers that you wrote are true, but the assumptions behind them isn't exactly all entirely true.

    See also: http://csserver.evansville.edu/~mr56/Publications/...
    Reply
  • Kevin G - Wednesday, March 19, 2014 - link

    "That means that it is a monolithic system, again, of which, few are TRULY such systems. If you've ever ACTUALLY witnessed the startup/bootup sequence of an ACTUAL IBM mainframe, the rest of the "nodes" are actually booted up typically by PXE or something very similiar to that, and then the "node" is ennumerated into the resource pool. But, for all other intents and purposes, they are semi-independent, standalone systems, because SMP systems do NOT have the capability to pass messages and/or memory calls (reads/writes/requests) without some kind of a transport layer (for example MPI)."

    Not exactly. IBM's recent boxes don't boot themselves. Each box has a service processor that initializes the main CPU's and determines if there are any additional boxes connected via external GX links. If it finds external boxes, some negotiation is done to join them into one large coherent system before an attempt to load an OS is made. This is all done in hardware/firmware. Adding/removing these boxes can be done but there are rules to follow to prevent data loss.

    It'll be interesting to see what IBM does with their next generation of hardware as the GX bux is disappearing.

    "The statement you made about having 32TB of RAM is again, partially true. But NONE of the single OS instances EVER have full control of all 32TB at once, which again, by DEFINITION, means that it is NOT truly SMP. (Course, if you ever get a screenshot which shows that, I'd LOVE to see it. I'd LOVE to get corrected on that.)"

    Actually on some of these larger systems, a single OS can see the entire memory pool and span across all sockets. The SGI UV2000 and SPARC M6 are fully cache coherent across a global memory address space.

    As for a screenshot, I didn't find one. I did find a video going over some of the UV 2000 features displaying all of this though. It is only a 64 socket, 512 core, 1024 thread, 2 TB of RAM configuration running a single instance of Linux. :)
    https://www.youtube.com/watch?v=YUmBu6A2ykY

    IBM's topology is weird in that while a global memory address space is shared across nodes, it is not cache coherent. IBM's POWER7 and their recent BlueGene systems can be configured like this. I wouldn't call these setups clusters as there is no software overhead to read/write to remote memory addresses but it isn't fully SMP either due to multiple coherency domains.
    Reply
  • silverblue - Monday, March 17, 2014 - link

    The A10-7850K is a 2M/4T CPU. Reply
  • Ian Cutress - Monday, March 17, 2014 - link

    Thanks for the correction, small brain fart on my part when generating the graphs. Reply

Log in

Don't have an account? Sign up now