AMD’s EPYC Rome Chips Could Crash After Three Years
AMD’s EPYC 7002 “Rome” server processors could hang after 1,044 days of uptime (a little under three years). The issue is documented in a revision guide released in April 2023, which states that a core will fail to exit the CC6 sleep state approximately 1,044 days after the last system reset. The exact failure time may vary depending on spread spectrum and REFCLK frequency.
Tom’s Hardware reported the problem this week. It noted that the bug “could cause a core to hang on the chip after 1,044 days of uptime (~2.93 years), after which you need to reset the server for the chip to work properly.”
Solution to the Problem
The revision guide also states that AMD does not plan to fix the problem. The workaround, however, is simple: either reboot the server before it reaches 1,044 days of uptime, which restarts the 1,044-day “timer,” or disable the CC6 sleep state.
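For administrators who want to track how close a machine is to the limit, the reboot workaround can be sketched as a small uptime check. This is a minimal illustration in Python, not an AMD-provided tool. It assumes a Linux host that exposes uptime in /proc/uptime, and the function names and the 30-day safety margin are hypothetical choices made for this example. Because the revision guide says the failure time can vary with spread spectrum and REFCLK frequency, a generous margin is sensible.

```python
# Threshold reported in the revision guide: roughly 1,044 days since the last reset.
ERRATUM_DAYS = 1044
# Hypothetical safety margin (an assumption); adjust to your maintenance windows.
MARGIN_DAYS = 30
SECONDS_PER_DAY = 86400


def read_uptime_seconds(path="/proc/uptime"):
    """Return system uptime in seconds (Linux: first field of /proc/uptime)."""
    with open(path) as f:
        return float(f.read().split()[0])


def days_until_deadline(uptime_seconds, limit_days=ERRATUM_DAYS):
    """Days left before uptime reaches limit_days (negative once past it)."""
    return limit_days - uptime_seconds / SECONDS_PER_DAY


def needs_reboot(uptime_seconds, limit_days=ERRATUM_DAYS, margin_days=MARGIN_DAYS):
    """True once uptime is within margin_days of the limit."""
    return days_until_deadline(uptime_seconds, limit_days) <= margin_days


def report(uptime_seconds):
    """Build a one-line status message for logs or monitoring."""
    remaining = days_until_deadline(uptime_seconds)
    status = "schedule a reboot" if needs_reboot(uptime_seconds) else "ok"
    return (f"uptime {uptime_seconds / SECONDS_PER_DAY:.1f} days, "
            f"{remaining:.1f} days before the ~{ERRATUM_DAYS}-day mark: {status}")
```

On a Linux server, calling print(report(read_uptime_seconds())) from a daily scheduled job would give operators advance warning well before the threshold is reached.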
Background on AMD’s EPYC Rome Chips
AMD first introduced the Rome CPUs at the Next Horizon Event in November 2018. Based on the Zen 2 core architecture, the EPYC Rome chips are some of the most competitive chips AMD has introduced to the data center market.
Chip Errata Are Common
“With billions of transistors at play, problems are inevitable,” notes Tom’s Hardware. “It is not uncommon for a chip to have a thousand or more errata/bugs that are corrected in newer versions of the chip or with firmware tweaks before launch.”
As a frame of reference, Intel’s 8th-gen chips still have over 150 listed errata, even though those chips launched in 2017. “We don’t know how many errata the Rome chips have had because AMD has removed the lists for errata that have been fixed,” the article said. “However, we do know that there are 39 errata left, which actually doesn’t seem too bad against the backdrop of Intel.”
AMD has also introduced the EPYC 7Fx2 processors, which are high-frequency parts within the same second-generation EPYC family. Like the rest of the Rome lineup, they are based on the Zen 2 core architecture, not Zen 3. Rome itself is the successor to the first-generation EPYC “Naples” chips, over which AMD claimed up to twice the performance per socket.
AMD’s EPYC Rome chips are among the most capable processors in the data center market, but this erratum can cause a core to hang after roughly 1,044 days of uptime. Fortunately, the workaround is simple: reboot before reaching 1,044 days of uptime or disable the CC6 sleep state. Chip errata are common, and Intel’s 8th-gen chips still have over 150 listed. The EPYC 7Fx2 processors belong to the same Zen 2-based, second-generation EPYC family as the rest of the Rome lineup.