Today, Intel released its last updates to the Itanium family, the Itanium 9700 series. These new cores, codenamed Kittson, will be the last Itanium processors Intel manufactures. Kittson is the first update to Poulson, which debuted a new Itanium architecture back in 2012, but it includes no additional features or capabilities — just higher clock speeds in some cases. We’ve aligned the chart from Intel’s CPU database (available at ark.intel.com) to show the upgrades to the higher-end eight-core CPUs, and this is it for the CPU The Register once nicknamed “Itanic.”
The highest-end eight-core Itanium CPU gets a 133MHz speed bump, while the lower-end eight-core doesn’t even get that — the 9740 is literally the same chip as the 9540, with the exact same clock speed. The quad-core models (not shown above) all pick up an additional 133MHz of clock speed, but nothing else.
It’s a bit of a sad end to what was once billed as Intel’s most exciting, forward-looking design. Back in the late 1990s, Itanium was pitched as a joint project between HP and Intel, one that would produce a true successor to the various RISC architectures still on the market at the time. Intel and HP both spent huge amounts of money and burned an enormous amount of development time trying to bring a new type of microprocessor to market, only to see Itanium crash and burn while x86 boomed. So what went wrong with Itanium?
A brief history
Itanium uses a computing model called EPIC (Explicitly Parallel Instruction Computing), which grew out of VLIW (Very Long Instruction Word) design. One of its primary goals was to move the work of deciding which instructions to execute, and in what order, from the CPU back to the compiler.
To understand why this was considered desirable, it helps to understand a bit of CPU history. In the 1970s and early 1980s, most CPUs were CISC (Complex Instruction Set Computing) designs. These designs ran at relatively low clock speeds, but could perform highly complex operations with a single instruction. At the time, this made sense: memory dominated the total cost of a computer and was extremely slow. The less memory your code required and the fewer memory accesses it depended on, the faster it could execute.
Beginning in the 1980s, a new type of architecture, RISC (Reduced Instruction Set Computing), began to become popular. RISC reduced design complexity by using simpler instruction sets and simpler CPU designs, but clocked those designs at higher frequencies. Both MIPS and SPARC have their origins in early RISC research, for example, and all modern high-end x86 chips decode x86 operations into RISC-style micro-ops for execution. Debates over whether RISC or CISC was ‘better’ somewhat miss the point; CISC made very good sense at one point in history, and RISC-like designs made good sense at another.
EPIC isn’t directly related to RISC, but the engineers at HP and Intel were hoping EPIC would be a huge leap above the x86 designs of the early and mid-1990s, in the same way that RISC designs had proven themselves superior to the CISC architectures of the 1970s. The goal for EPIC was to push instruction scheduling, branch prediction, and parallelism extraction onto the compiler. In theory, this would free up die space for additional execution units, which could then be used to execute parallel workloads even more quickly. Instead of using hardware blocks on the CPU to organize workloads for optimal out-of-order execution, the compiler would handle that task. All the CPU would have to do is execute the code it was handed, in the order and manner it was told to execute it.
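To make that concrete, here is a minimal C sketch (illustrative only — not actual IA-64 code or anything a specific Itanium compiler produced) of the kind of instruction-level parallelism an EPIC compiler was supposed to discover at compile time and pack into explicitly parallel instruction bundles:

```c
/* Best case for EPIC: four independent additions. None depends on
 * another's result, so a VLIW/EPIC compiler could schedule them into
 * parallel execution slots in a single bundle. */
int independent(int a, int b, int c, int d,
                int e, int f, int g, int h)
{
    int w = a + b;   /* slot 0 */
    int x = c + d;   /* slot 1 */
    int y = e + f;   /* slot 2 */
    int z = g + h;   /* slot 3 */
    return w + x + y + z;
}

/* Worst case: a serial dependency chain. Each add needs the previous
 * result, so no amount of extra execution units helps -- there is no
 * instruction-level parallelism for the compiler to extract. */
int dependent(int a, int b, int c, int d, int e)
{
    int t = a + b;
    t = t + c;
    t = t + d;
    t = t + e;
    return t;
}
```

The first function is the case EPIC was built for; the second is the case it couldn’t do anything about, and a great deal of real-world code looks more like the second.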
The only problem was, it didn’t work well in practice. Memory accesses from cache and DRAM are non-deterministic, which means the compiler can’t predict how long they will take — and if the compiler can’t predict how long they’ll take, it’s going to have a hard time scheduling workloads to fill the gaps. Itanium was designed to present a huge array of execution resources that the compiler would intelligently fill, but if the compiler can’t keep them filled, they’re just wasted die space. Itanium was designed to extract and exploit instruction-level parallelism (ILP), but compilers of the time struggled to find enough ILP in most workloads to justify the expense and difficulty of running code on the platform. And EPIC was so radically different from any other architecture that there was no way to cleanly port an application. These days, we talk a lot about differences between x86 and ARM processors, but x86 and ARM are practically blood relatives compared with EPIC and x86.
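A hypothetical illustration of the memory-latency problem, again in plain C rather than anything tied to a real Itanium workload: a simple linked-list walk, where the compiler cannot know at build time whether each load will hit cache or miss to DRAM.

```c
#include <stddef.h>

struct node {
    struct node *next;
    int value;
};

/* Pointer chasing: the address of the next node is only known once the
 * previous load completes, and each load might take a few cycles (cache
 * hit) or hundreds (DRAM miss). A static, compile-time schedule cannot
 * know which, so it cannot reliably fill the stall cycles with other
 * useful work. */
int sum_list(const struct node *n)
{
    int total = 0;
    while (n != NULL) {
        total += n->value;  /* load latency unknown until runtime */
        n = n->next;        /* next iteration waits on this load */
    }
    return total;
}
```

An out-of-order x86 core deals with this at runtime, reordering around stalls as they happen; Itanium was counting on the compiler to have anticipated them in advance.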
Of course, technical problems weren’t the only reason Itanium failed. The chips were expensive, difficult to manufacture, and years behind schedule. Intel made a high-profile declaration that Itanium was the future of computing and represented its only 64-bit platform. Then AMD announced its own AMD64 instruction set, which extended 64-bit computing to the existing x86 architecture. Intel didn’t change course immediately, but it eventually cross-licensed AMD64, a tacit admission that Itanium would never come to the desktop. Then, a few years ago, a high-profile court case between Oracle and HP put a nail in Itanium’s coffin. Today’s chips are the very definition of contract fulfillment, with no architectural improvements, cache tweaks, or other performance boosts.
Most, I suspect, will shed no tears for Itanium, particularly given how its rise cut short the development of other, more promising architectures like PA-RISC and Alpha. But its failure is also a testament to how some of the ‘facts’ that get passed around the CPU industry aren’t as simple as we might think. x86 is often discussed in disparaging terms as an outdated, ancient architecture, as if no one had the guts to take it outside and shoot it. But in reality, Intel made multiple attempts to do just that, from the iAPX 432 (begun in 1975) to the i860 and i960, to Itanium itself. And despite an emphasis on parallelism that sounds superficially promising, given how difficult modern programmers have found it to scale applications across multiple threads effectively, Itanium turned out to be another dead-end branch of research that never delivered the real-world performance it promised on paper.
Now read: How L1 and L2 CPU caches work, and why they’re an essential part of modern chips