AMD has officially launched its new Epyc server CPU family in a bid to take the fight directly to the top of Intel’s highly profitable Xeon product lines. These chips are built on the same fundamental architecture as the company’s Ryzen CPU cores, and they’re aimed at the incredibly lucrative data center market.
AMD’s 32-core / 64-thread Epyc CPUs combine four eight-core dies, each connected to the others via the company’s Infinity Fabric. According to AMD, this approach is significantly cheaper than trying to pack 32 cores into a single monolithic die, which would leave the company potentially throwing away huge amounts of silicon during its production ramp. The Infinity Fabric is deliberately over-provisioned to minimize any problems with software that isn’t NUMA-aware, according to AnandTech (some of you may remember NUMA, but we’ll discuss it again shortly).
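AMD’s cost argument can be illustrated with a simple Poisson yield model, a common first-order approximation in which the chance that a die is defect-free falls exponentially with its area. This is only a sketch: the die area and defect density below are hypothetical placeholders, not AMD figures, and the model is an assumption of this example rather than anything AMD has described.

```python
import math

# Hypothetical placeholder values, chosen only to make the comparison concrete.
SMALL_DIE_AREA_CM2 = 2.0   # assumed area of one eight-core die
DEFECTS_PER_CM2 = 0.2      # assumed defect density of the process
DIES_PER_PACKAGE = 4       # from the article: four eight-core dies per CPU


def poisson_yield(area_cm2, defects_per_cm2):
    """Fraction of dies expected to have zero defects (Poisson model)."""
    return math.exp(-area_cm2 * defects_per_cm2)


def compare_yields():
    """Return (small-die yield, monolithic-die yield) for the same total area."""
    small = poisson_yield(SMALL_DIE_AREA_CM2, DEFECTS_PER_CM2)
    monolithic = poisson_yield(SMALL_DIE_AREA_CM2 * DIES_PER_PACKAGE,
                               DEFECTS_PER_CM2)
    return small, monolithic


if __name__ == "__main__":
    small, monolithic = compare_yields()
    print(f"good small dies:      {small:.1%}")
    print(f"good monolithic dies: {monolithic:.1%}")
```

Under this model the monolithic yield is the small-die yield raised to the fourth power, so a single defect costs a quarter as much silicon when the dies are tested individually before being combined into a package.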
Each 32-core Epyc CPU will support eight memory channels and two DIMMs per channel, for a total maximum memory capacity of 2TB per socket, or 4TB of RAM in a two-socket system. Each CPU will also offer 128 lanes of PCI Express 3.0 support, enough to connect up to six GPUs at x16 each with room left over for I/O. That’s in a one-socket system, mind you. In a two-socket system, the total number of available PCI Express 3.0 lanes is unchanged, at 128, because each CPU devotes 64 of its lanes to CPU-to-CPU communication.
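The memory and PCI Express figures above are easy to sanity-check with a few lines of arithmetic. The sketch below uses only numbers quoted in this article; the function names are illustrative, the per-DIMM size is derived rather than stated by AMD, and the two-socket case assumes each CPU gives up 64 of its 128 lanes to the link between sockets.

```python
# Figures quoted in the article.
MEMORY_CHANNELS = 8
DIMMS_PER_CHANNEL = 2
MAX_MEMORY_PER_SOCKET_GB = 2 * 1024   # 2TB per socket
PCIE_LANES_PER_CPU = 128
LANES_PER_CPU_FOR_SOCKET_LINK = 64    # assumed to apply to each CPU in a 2P system


def implied_dimm_size_gb():
    """DIMM capacity needed to reach the quoted per-socket maximum."""
    dimm_slots = MEMORY_CHANNELS * DIMMS_PER_CHANNEL
    return MAX_MEMORY_PER_SOCKET_GB / dimm_slots


def usable_pcie_lanes(sockets):
    """PCIe 3.0 lanes left for devices in a one- or two-socket system."""
    if sockets == 1:
        return PCIE_LANES_PER_CPU
    if sockets == 2:
        return sockets * (PCIE_LANES_PER_CPU - LANES_PER_CPU_FOR_SOCKET_LINK)
    raise ValueError("Epyc is offered in one- and two-socket configurations")


def lanes_left_after_gpus(sockets, gpus, lanes_per_gpu=16):
    """Lanes remaining for other I/O once the GPUs are attached."""
    remaining = usable_pcie_lanes(sockets) - gpus * lanes_per_gpu
    if remaining < 0:
        raise ValueError("not enough PCIe lanes for that many GPUs")
    return remaining


if __name__ == "__main__":
    print(implied_dimm_size_gb(), "GB per DIMM")
    print(usable_pcie_lanes(1), "lanes in a one-socket system")
    print(usable_pcie_lanes(2), "lanes in a two-socket system")
    print(lanes_left_after_gpus(1, 6), "lanes left after six x16 GPUs")
```

Reaching 2TB per socket implies sixteen 128GB DIMMs, and six x16 GPUs consume 96 lanes, leaving 32 for storage and networking.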
AMD is launching a suite of single-socket and dual-socket parts at a range of price points and capabilities. At the very top of the stack there’s the Epyc 7601, a 32-core / 64-thread chip with a 2.2GHz base clock, 2.7GHz top frequency on all cores, and maximum frequency of 3.2GHz with a 180W TDP. At the bottom of the dual-socket stack there’s the Epyc 7251, an eight-core chip with a 2.1GHz base / 2.9GHz max frequency and a 120W TDP. That’s substantially more TDP than some of AMD’s eight-core consumer chips, but there’s a straightforward explanation: Even the Epyc 7251 has a 64MB L3, eight memory channels, and 128 PCI Express lanes. Stack those capabilities up, and you’ve got a chip with a much higher TDP, despite lower clocks than what you’d find on the Ryzen 7 1700.
AMD is also making some aggressive performance claims around its upcoming chips, including arguing that it can make hash out of Intel’s current Xeon lineups across both one-socket and two-socket configurations as shown below:
One of the potential challenges of AMD’s approach to Epyc is that it relies on NUMA to handle communication across microprocessors. NUMA (Non-Uniform Memory Access) means that a CPU can access data that isn’t local to its own memory pool, albeit at higher latencies. This is in contrast to Uniform Memory Access (UMA), which maintains a constant memory access latency, but scales poorly as more cores are added to a system.
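To see which cores sit next to which memory pool, software has to ask the operating system. As a minimal sketch, and assuming a Linux host (the article does not discuss any particular OS), the kernel lists each NUMA node under /sys/devices/system/node along with the CPUs local to it. The helper names here are illustrative.

```python
import glob
import os


def parse_cpulist(text):
    """Expand a Linux cpulist string such as "0-3,8-11" into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means a node with memory but no CPUs
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root="/sys/devices/system/node"):
    """Map NUMA node id -> list of local CPU ids; empty if sysfs is unavailable."""
    nodes = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(os.path.basename(path)[len("node"):])
        with open(os.path.join(path, "cpulist")) as handle:
            nodes[node_id] = parse_cpulist(handle.read())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: {len(cpus)} CPUs")
```

A NUMA-aware program would use a mapping like this to keep each thread on cores that share a node with the memory it touches most. How many nodes a given Epyc system reports depends on the platform and firmware, so no output is shown here.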
The idea behind NUMA is that data is distributed to the CPUs that require it. The memory bandwidth improvements from properly NUMA-aware software can be substantial, though taking advantage of the feature does require software support. Intra-processor bandwidth between each CPU cluster on an Epyc CPU is 42.6GB/s, while inter-CPU bandwidth between two different sockets is only slightly lower, at 37.9GB/s.
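The gap between those two bandwidth figures can be put in concrete terms with a rough calculation. This sketch treats each link as if it sustained its quoted rate for an entire transfer, which is a simplifying assumption; real transfers also pay latency and contention costs that the numbers above do not capture.

```python
# Bandwidth figures quoted in the article, in GB/s.
INTRA_SOCKET_GB_PER_S = 42.6   # between CPU clusters inside one Epyc package
INTER_SOCKET_GB_PER_S = 37.9   # between two sockets


def transfer_seconds(gigabytes, link_gb_per_s):
    """Idealized time to move a buffer over a link at its quoted bandwidth."""
    return gigabytes / link_gb_per_s


def bandwidth_ratio():
    """Socket-to-socket bandwidth as a fraction of in-package bandwidth."""
    return INTER_SOCKET_GB_PER_S / INTRA_SOCKET_GB_PER_S


if __name__ == "__main__":
    buffer_gb = 64  # hypothetical working set size
    local = transfer_seconds(buffer_gb, INTRA_SOCKET_GB_PER_S)
    remote = transfer_seconds(buffer_gb, INTER_SOCKET_GB_PER_S)
    print(f"in-package:       {local:.2f} s")
    print(f"socket to socket: {remote:.2f} s")
    print(f"ratio:            {bandwidth_ratio():.1%}")
```

By these figures, crossing sockets costs roughly 11 percent of the bandwidth available inside a single package.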
AnandTech has a longer writeup with more details on the CPUs’ power efficiency and TDP scaling, so check there if you want a more in-depth overview. From where I sit, however, Epyc is easily the most competitive server part AMD has fielded in over half a decade. Interlagos (Bulldozer) simply never had the performance to make AMD competitive, and it chewed through too much power to leave the company much room to scale.
Epyc appears poised to change all that. Obviously we won’t know the specifics of how it compares against Intel’s solutions until independent server tests are available, but AMD knows how important this chip is to its long-term future. The company plans to ramp up production steadily, but it’s already said that it won’t rush the Epyc rollout. From our perspective, that’s a good thing. The server market tends to be more conservative than its desktop counterpart, and it’s much more important that AMD get everything right the first time than that it win a burst of initial press and ultimately few design wins.