AMD Zen 5 Architecture Deep Dive from Tech Day

Embargoed until July 24th at 9 AM ET.

Next Generation
“Zen 5” Core
Mike Clark and Mahesh Subramony
AMD Tech Day Update

CAUTIONARY STATEMENT
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the features,
functionality, performance, availability, timing and expected benefits of AMD’s current products, future products and product
roadmaps, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-
looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects"
and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on
current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that
could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown
risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and
other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and
statements. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings,
including but not limited to AMD’s most recent reports on Forms 10-K and 10-Q.
AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this presentation, except
as may be required by law.

Performance
▪ Deliver another major 1T and 2T performance increase
▪ New Foundation of Compute for the Future
▪ AVX512 on 512-bit datapath to increase throughput and AI uplift
Platform Support
▪ Deliver “Zen 5” and “Zen 5c” (compact) core variants
▪ Support configurable FP512/FP256 datapath
▪ Support scaling and energy efficiency
▪ Support 4 and 3 nm
▪ Enhanced ISA capabilities
Design Objectives

NextGen Branch Predictor
Caches
▪ I-Cache: 32KB, 8-way; 2x 32B fetch/cycle
▪ Op-Cache: 6K inst; 2x 6-wide fetch/cycle
▪ D-Cache: 48KB, 12-way; 4 mem ops/cycle
▪ L2-Cache: 1MB, 16-way
Dual I-Fetch/decode pipes, 4 inst/pipe
8 ops/cycle dispatched to Integer or FP
Execution capabilities
▪ 6 integer ALU
▪ 4 AGU, 4 addresses to LS per cycle
▪ 6 FP ops/cycle; 2-cycle FADD
▪ Full 512b AVX512 datapaths
Dataflow
▪ 4 load pipes capable of two 512b AVX512 loads per cycle
▪ 2x width L2 cache <-> L1I and L1D caches
2 Threads per core
“Zen 5” Microarchitecture Overview
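As a quick cross-check, the per-cycle front-end widths can be derived from the per-pipe figures on this slide. The sketch below is illustrative only: the constant and function names are hypothetical, and the numbers are copied from the bullets above.

```python
# Figures copied from the overview slide; names are illustrative, not AMD's.
FETCH_PIPES = 2              # "2x 32B fetch/cycle"
BYTES_PER_FETCH = 32
OP_CACHE_PIPES = 2           # "2x 6-wide fetch/cycle"
INST_PER_OC_PIPE = 6
DECODE_PIPES = 2             # "Dual I-Fetch/decode pipes, 4 inst/pipe"
INST_PER_DECODE_PIPE = 4
DISPATCH_WIDTH = 8           # "8 ops/cycle dispatched to Integer or FP"


def front_end_summary():
    """Return the aggregate per-cycle widths implied by the per-pipe figures."""
    return {
        "fetch_bytes_per_cycle": FETCH_PIPES * BYTES_PER_FETCH,
        "op_cache_inst_per_cycle": OP_CACHE_PIPES * INST_PER_OC_PIPE,
        "decode_inst_per_cycle": DECODE_PIPES * INST_PER_DECODE_PIPE,
        "dispatch_ops_per_cycle": DISPATCH_WIDTH,
    }


if __name__ == "__main__":
    for name, value in front_end_summary().items():
        print(f"{name}: {value}")
```

The two decode pipes together match the 8-wide dispatch, while the op cache path can supply up to 12 instructions per cycle.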

Instruction Fetch Advances
Branch prediction
▪ Zero-bubble conditional branches
▪ L2-sized (16K) L1 BTB and larger TAGE
▪ Larger return address stack (52-entry)
▪ 2 taken predictions/cycle
▪ Up to 3 prediction windows/cycle
Memory Management
▪ Aggressive Fetch hides L2 & tablewalk latency
▪ 2048-entry L2 ITLB
I-cache latency and bandwidth
▪ 64B/cycle fetch
▪ 2 instruction fetch streams
Optimized Branch Prediction and Fetch

Instruction Decode Advances
Op Cache Storage
▪ 33% more entry associativity (16-way)
▪ Dense entries store 6 instructions (fused)
▪ 2 OC pipes x 6 inst/pipe => 12 inst/cycle
Dual Decode Pipes
▪ 2 pipes support parallel independent instruction streams
▪ 4 inst/cycle throughput per pipe
▪ SMT mode gives each thread a pipe
8-wide dispatch to Int and FP
New Decode Advances

Integer Execution Advances
8-wide dispatch, rename, retire
Integer scheduler advances
▪ Unified with age matrix
▪ More symmetry, simplifying pick
6 ALUs with 3 multipliers, 3 branch units
4 AGUs feed a wider LS with 4 memory addresses per cycle
Execution window growth
▪ Scheduler growth 88 ALU/56 AGU
▪ 240-entry physical register file
▪ ROB 448 entries
Wider Dispatch and Execute

Load/Store Advances
48KB 12-way L1D keeping 4-cycle load-to-use
More Bandwidth
▪ 4 LS pipes for a mix of 4 loads/2 stores per cycle
▪ 4 Integer load pipes can pair into 2 FP pipes
▪ 2 store commits per cycle
▪ 64B fill/victim from/to L2 Dcache
TLBs
▪ L1: 96-entry fully associative DTLB, all page sizes
▪ L2: 4K-entry DTLB, all page sizes except 1G
Larger In-Flight Window
▪ Load and store queue growth
▪ Store coalescing buffer growth
▪ Scalable load ordering queue
Data prefetching
▪ New 2D stride prefetcher also improves stream and region prefetchers
▪ Extends workload pattern recognition
Increased Data Bandwidth
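To put the TLB and L1D figures in perspective, the sketch below computes TLB reach and the number of cache sets. It assumes 4 KiB base pages and 64-byte cache lines, which are standard on x86-64 but are not stated on the slide; the function names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB


def tlb_reach_bytes(entries, page_bytes=4 * KIB):
    """Memory covered when every TLB entry maps one page of page_bytes."""
    return entries * page_bytes


def cache_sets(size_bytes, ways, line_bytes=64):
    """Sets in a set-associative cache; the 64B line size is an assumption."""
    sets, remainder = divmod(size_bytes, ways * line_bytes)
    if remainder:
        raise ValueError("cache size is not a whole number of sets")
    return sets


if __name__ == "__main__":
    print("L1 DTLB reach (KiB):", tlb_reach_bytes(96) // KIB)
    print("L2 DTLB reach (MiB):", tlb_reach_bytes(4096) // MIB)
    print("L1D sets:", cache_sets(48 * KIB, ways=12))
```

Under these assumptions the 96-entry L1 DTLB covers 384 KiB with 4 KiB pages, the 4K-entry L2 DTLB covers 16 MiB, and the 48KB 12-way L1D has 64 sets.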

FP/Vector Math Unit Execution Advances
FP major features/changes
▪ AVX512 with full 512b datapath
More bandwidth, less latency
▪ 4 execution pipelines
▪ 2 LS/integer register pipelines
▪ Two 512b loads/cycle, one 512b store/cycle
▪ 2-cycle FADD
Execution window growth
▪ NSQ growth with 8-wide dispatch
▪ 3 larger schedulers: 1/pipe pair
▪ Physical register file doubles
▪ ROB/retire queue growth
Increased FP Capability
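The 512b datapath figures translate directly into lanes per vector and bytes moved per cycle. The following is a minimal sketch of that arithmetic; the helper names are hypothetical, and only the 512b width and the 2 loads/1 store per cycle come from the slide.

```python
def lanes(vector_bits, element_bits):
    """Number of elements of element_bits that fit in one vector register."""
    return vector_bits // element_bits


def bytes_per_cycle(accesses_per_cycle, vector_bits=512):
    """Bytes transferred per cycle for a given number of vector accesses."""
    return accesses_per_cycle * vector_bits // 8


if __name__ == "__main__":
    print("FP32 lanes per 512b vector:", lanes(512, 32))
    print("FP64 lanes per 512b vector:", lanes(512, 64))
    print("512b load bytes/cycle:", bytes_per_cycle(2))
    print("512b store bytes/cycle:", bytes_per_cycle(1))
```

Two 512b loads per cycle correspond to 128 bytes per cycle, and one 512b store to 64 bytes per cycle.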

“Zen 5” and “Zen 5c” in Heterogeneous SOC
[Chart: “Zen 5” and “Zen 5c” compared on IPC, frequency, area, and power efficiency]
“Zen 5” and “Zen 5c” in separate core complexes
“Zen 5” Optimized for maximum 1T performance
▪ High max frequency Target
▪ Large L3 per core
“Zen 5c” Optimized for scalability
▪ Same IPC and features
▪ Lower max frequency
▪ Increased power efficiency
▪ Lower L3 per core
Simplifies software scheduling
▪ Same IPC means no unique bottlenecks like vector performance
▪ Both support SMT
▪ Modulate between ultimate performance vs. efficiency
▪ Scheduling “mistakes” minimized over time

Attribute ISA Feature
Instructions
▪ MOVDIRI/MOVDIR64B – move 4, 8, or 64 bytes as a direct store, bypassing caches
▪ VP2INTERSECT[DQ] – AVX512 vector pair intersection to a pair of mask registers
▪ VNNI/VEX – extends AVX512 VNNI instructions to VEX encoding
▪ PREFETCH[I*] – software prefetch of instruction lines into cache hierarchy
Kernel/Virtualization/QoS
▪ PMC virtualization – provides security for a guest vs. hypervisor; isolates PMC/guest
▪ Heterogeneous Topology
New "Zen 5" ISA
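On Linux, software can check for several of the instructions listed above by reading the `flags` line of `/proc/cpuinfo`. The sketch below is one possible approach, not an AMD-provided method: the mapping from slide names to kernel flag names is an assumption based on the Linux kernel's usual naming, and `report` and `parse_flags` are hypothetical helpers. PREFETCHI is left out because its flag name is not confirmed here.

```python
# Assumed mapping from the slide's instruction names to Linux cpuinfo flags.
FEATURE_FLAGS = {
    "MOVDIRI": "movdiri",
    "MOVDIR64B": "movdir64b",
    "VP2INTERSECT": "avx512_vp2intersect",
    "VNNI/VEX (AVX-VNNI)": "avx_vnni",
}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each feature name to True/False depending on flag presence."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in FEATURE_FLAGS.items()}
```

A typical call would read the file once and pass its text in, for example `report(open("/proc/cpuinfo").read())`; keeping the parser separate from file access makes it easy to test with sample text.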

Key “Zen 5” vs. “Zen 4” Capabilities
Attribute | Zen 4 | Zen 5
L1/L2 BTB | 1.5K/7K | 16K/8K
Return Address Stack | 32 | 52
ITLB L1/L2 | 64/512 | 64/2048
Fetched/Decoded Instruction Bytes/cycle | 32 | 64
Op Cache associativity | 12-way | 16-way
Op Cache bandwidth | 9 macro-ops | 12 inst or fused inst
Dispatch bandwidth (macro-ops/cycle) | 6 | 8
AGU Scheduler | 3x24 ALU/AGU | 56
ALU Scheduler | 1x24 ALU | 88
ALU/AGU | 4/3 | 6/4
Int PRF (reg/flag) | 224/126 | 240/192
Vector Reg | 192 | 384
FP Pre-Sched Queue | 64 | 96
FP Scheduler | 2x32 | 3x38
FP Pipes | 3 | 4
Vector Width | 256b | 256b/512b
ROB/Retire Queue | 320 | 448
LS Mem Pipes support Load/Store | 3/1 | 4/2
DTLB L1/L2 | 72/3072 | 96/4096
L1 Data Cache | 32KB/8-way | 48KB/12-way
L2 per core | 1MB/8w | 1MB/16w
L2 bandwidth | 32B/clk | 64B/clk
[Chart: “Zen 5” Uplift Breakdown; labels: IC/BP, DE/OC/DI, EX/SC, LS/DC/L2]
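The generational growth implied by the “Zen 4” and “Zen 5” columns can be tabulated with a few lines of arithmetic. This is a sketch over a subset of rows with single numeric values; the dictionary keys paraphrase the attribute names, and the helper names are hypothetical.

```python
# (Zen 4, Zen 5) pairs copied from the comparison slide.
ZEN4_VS_ZEN5 = {
    "Return Address Stack": (32, 52),
    "L2 ITLB entries": (512, 2048),
    "Op Cache associativity (ways)": (12, 16),
    "Dispatch bandwidth (macro-ops/cycle)": (6, 8),
    "Vector Reg": (192, 384),
    "FP Pre-Sched Queue": (64, 96),
    "ROB/Retire Queue": (320, 448),
    "L1 DTLB entries": (72, 96),
    "L2 bandwidth (B/clk)": (32, 64),
}


def growth_percent(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100.0


def growth_table(rows=ZEN4_VS_ZEN5):
    """Return {attribute: growth in percent, rounded to one decimal}."""
    return {name: round(growth_percent(o, n), 1) for name, (o, n) in rows.items()}


if __name__ == "__main__":
    for name, pct in growth_table().items():
        print(f"{name}: +{pct}%")
```

The op cache row reproduces the "33% more entry associativity" claim from the decode slide (12-way to 16-way), and the ROB grows by 40%.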

“Zen 5” Core Complex speeds and feeds
[Diagram: core complex with Core 0 through Core 7. Labels: 32K I Cache, 8-way, 2 x 32B fetch; 48K D Cache, 12-way; 4 Load (Max 2 for 512b), 2 Store (Max 1 for 512b); 1MB L2 I+D Cache, 16-way; 32MB L3 I+D Cache, 16-way; link bandwidths of 32B/cycle, 32B/cycle, 64B/cycle, 64B/cycle]
• Double the L2 associativity
• Double the L2 Bandwidth
• Low latency L3 with 320 L3 in-flight misses
• Baseline from “Zen 4”:
• Fast private 1MB L2 cache
• L3 shared among all cores in the complex
• L3 is filled from L2 victims
• L2 tags duplicated in L3 for probe filtering and fast cache transfer

Zen5 Core Complexes across SOCs
[Diagram: CCX configurations: 8-Classic with 32MB L3; 4-Classic with 16MB L3; 8-Compact with 8MB L3; N-Classic/Compact with X-MB L3]
• “Strix Point”
• Heterogeneous Architecture
• Dual CCXs
• 4-Classic – 16MB L3, 8-Compact – 8MB L3
• “Granite Ridge”
• Homogeneous Architecture
• Up to Dual CCDs
• 8-Classic – 32MB L3
• Futures
• Smaller, Larger CCXs
• Homogeneous or Heterogeneous
• Data Center to Embedded
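The "large L3 per core" versus "lower L3 per core" distinction between classic and compact complexes follows from the CCX sizes on this slide. The sketch below just divides L3 capacity by core count; the dictionary keys and function name are illustrative.

```python
# (cores, L3 in MB) per core complex, copied from the slide.
CCX_CONFIGS = {
    "Strix Point, 4-Classic": (4, 16),
    "Strix Point, 8-Compact": (8, 8),
    "Granite Ridge, 8-Classic": (8, 32),
}


def l3_mb_per_core(cores, l3_mb):
    """Shared L3 capacity divided evenly across the cores of one complex."""
    return l3_mb / cores


if __name__ == "__main__":
    for name, (cores, l3_mb) in CCX_CONFIGS.items():
        print(f"{name}: {l3_mb_per_core(cores, l3_mb)} MB L3 per core")
```

Both classic complexes work out to 4 MB of L3 per core, while the compact complex has 1 MB per core.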
N-Classic/Compact
X-MB L3
N-Classic/Compact
X-MB L3

• CPU
• 4C8T Zen5 – 1MB L2/core, 16MB L3 CCX
• 8C16T Zen5c – 1MB L2/core, 8MB L3 CCX
• Datapath – 32B/cycle port each
• GPU
• 8 WGP (16 CU) RDNA 3.5
• Datapath – 4 x 32B/cycle ports
• NPU
• 4 x 8 Array XDNA 2 Inference Engine
• Datapath – 32B/cycle
• Accelerators / uControllers
• Video Encode/Decode
• Audio Co-processor
• Display Controller
• System Management, Security, Wireless Manageability
• IO
• 128b LPDDR5/DDR5 (7500/5600 MT/s)
• 16L PCIe Gen4
• 4 Simultaneous display streams
• 8 USB ports
• 2 USB4 v1
• 1 USB3 Type-C
• 2 USB3.2 Gen2
• 3 USB2
• I2C, SPI/eSPI, GPIO
[Block diagram of the “Strix Point” SOC. Labels:]
• “Zen 5” CCXs: CPU cores with 16MB L3 Cache; CPU cores with 8MB L3 Cache
• AMD Radeon 800M Graphics: RDNA 3.5 (Up to 16 Compute Units), 16 CU, 4 RB+, 2MB L2 Cache
• XDNA 2 Inference Engine
• Infinity Fabric
• Memory: 4 x X32 DDR5/LPDDR5
• Complete System Connectivity: PCIe 4.0 GPP, PCIe 4.0 Discrete GFX, NVMe, PCIe 4.0, SATA, USB4®, USB 3.1, USB 2.0, USB-C® (w/DP Mode), Wireless Manageability Subsystem, FCH
• Integrated Sensor Fusion Hub
• Accelerated Multimedia Experience: Multimedia Engines, Video Codec, ACP
• Display Controller: 4 Display Support, DISPLAYPORT 2, HDMI® 2.1
• System Management Unit, AMD Platform Security Processor, Microsoft Pluton Processor
AMD “Strix Point” SOC
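The memory interface figures quoted for this SOC (128b bus at 7500 or 5600 MT/s) imply a theoretical peak bandwidth. The following is a minimal sketch of that calculation; it ignores protocol and refresh overhead, uses decimal gigabytes, and the function name is hypothetical.

```python
def peak_bandwidth_gb_s(bus_bits, mega_transfers_per_s):
    """Theoretical peak bandwidth in GB/s (1 GB = 1e9 bytes).

    bus_bits: total data bus width in bits.
    mega_transfers_per_s: transfer rate in MT/s.
    """
    bytes_per_transfer = bus_bits / 8
    return bytes_per_transfer * mega_transfers_per_s / 1000


if __name__ == "__main__":
    print("128b at 7500 MT/s:", peak_bandwidth_gb_s(128, 7500), "GB/s")
    print("128b at 5600 MT/s:", peak_bandwidth_gb_s(128, 5600), "GB/s")
```

For the 128b interface this gives 120 GB/s at 7500 MT/s and 89.6 GB/s at 5600 MT/s as upper bounds.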

AMD RDNA 3.5
Texture Subsystem
2x Sampler Rate, Point sampling acceleration
Shader Subsystem
2x Interpolation and Comparison rates
Floating point in SALU
Skip single-use VGPR writes
Raster Subsystem
Sub-batching allows hardware to be efficient
Programmable bin order
Memory Subsystem Improvements
LPDDR5 awareness
Improved compression
AMD RDNA 3.5 Improvements
Larger Engine
1SE, 2SA, 8 WGP, 4 RB+, 2MB GL2 Engine
2.9 GHz Fmax results in >11 TFLOPs (~30% higher)
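The ">11 TFLOPs" figure is consistent with a simple peak-FP32 estimate. The sketch below assumes 64 FP32 lanes per compute unit, dual-issue execution, and 2 floating-point operations per fused multiply-add; none of these three factors is stated on the slide, so treat them as assumptions. Only the 16 CU count (8 WGP) and the 2.9 GHz clock come from the deck.

```python
def peak_fp32_tflops(compute_units, clock_ghz,
                     lanes_per_cu=64,   # assumption: FP32 lanes per CU
                     issue_width=2,     # assumption: dual-issue
                     flops_per_fma=2):  # multiply + add counted as 2 FLOPs
    """Theoretical peak FP32 throughput in TFLOPs."""
    flops_per_cycle = compute_units * lanes_per_cu * issue_width * flops_per_fma
    return flops_per_cycle * clock_ghz / 1000.0


if __name__ == "__main__":
    print("Estimated peak:", peak_fp32_tflops(16, 2.9), "TFLOPs")
```

Under these assumptions the estimate lands a little under 12 TFLOPs, in line with the slide's ">11 TFLOPs" claim.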

AMD XDNA 2 Architectural Innovations
World’s First “Win24 Ready” NPU on x86 Processor
AMD Ryzen AI “Strix” NPU
SoC-Level Infinity Fabric Interface
Scratchpad SRAM & Global DMAs
[Diagram: 4 x 8 array of AIE tiles (32 tiles)]
Improved Multi-Tasking: Up to 8x Concurrent Isolated Spatial Streams
Column-based Power Gating
1.6x On-Chip Memory vs. Previous Gen
Broad AI Model Support
Generative AI, Unlocking new AI PC Experiences
Peak Performance
50 INT8 TOPS
50 Block FP16 TFLOPS
Gen-on-Gen Improvements from Phoenix
2x more concurrent spatial streams
1.6x on-chip memory capacity
Advanced Features
Block floating point support
Enhanced support for non-linear functions (tanh, exp)
50% weight sparsity
Improved Power Efficiency
Per column power gating
Up to 2x Perf/W improvement
AMD XDNA 2 Architecture

• CPU
• Up to 2 x 8C16T – 1MB L2/Core, 32MB L3 CCDs
• 512b datapath for FP, optimized for high frequency
• Datapath – 32B/cycle port each
• GPU
• 1 WGP (2 CU) RDNA 2
• Datapath – 2 x 32B/cycle ports
• Accelerators / uControllers
• Video Encode/Decode
• Audio Co-processor
• Display Controller
• System Management, Security
• IO
• 128b DDR5 5600 MT/s
• 28L PCIe® Gen5
• 5 USB ports
• 3 USB3.2 Type-C
• 1 USB3.2 Gen2 Type-A
• 1 USB2
• 4 Simultaneous display streams
• I2C, SPI/eSPI, GPIO
[Block diagram of the “Granite Ridge” SOC. Labels:]
• Two CCDs, each shown with 8 CPU cores, 32MB L3 Cache, and an SMU
• Die-to-die Infinity Fabric, Infinity Fabric
• Memory: 2 x X64 DDR5
• RDNA 2 (1 Compute Unit)
• Multimedia Engines: VCN, ACP
• Display Controller: 4 Display Support
• PCIe 5.0, 28L
• USB 3.2, USB 2.0, USB-C®
• FCH, SPI/eSPI, GPIO, Clocking
• System Management Unit
AMD “Granite Ridge” SOC

▪ "Zen 5" :
▪ Yet another on-cadence major performance increase
▪ Balanced cross-core 1T/2T instruction and data throughput
▪ AVX512 with 512bit FP data-paths for throughput and AI uplift
▪ Efficient, performant, configurable solutions which scale:
▪ Variants: Peak performance (“Zen 5” and “Zen 5c”)
▪ Configurable FP and cache hierarchy
▪ Multiple processes across the product line
▪ “Strix Point”, “Granite Ridge”
▪ Commanding Performance and Gaming Leadership with Granite Ridge
▪ Continuing our support of the AM5 infrastructure
▪ With increased compute and efficiency across the entire chip, Strix Point delivers a no-compromise AI PC solution
▪ Continuing support in the FP8 infrastructure
▪ AMD continues to drive Leadership Performance and Efficiency
Summary: AMD Delivers Again!

Disclaimer & Attribution
The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions, and typographical errors.
The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component
and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades,
or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes
from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.
AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY
INACCURACIES, ERRORS, OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.
AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE
LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION
CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
ATTRIBUTION
© 2024 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United
States and/or other jurisdictions. Other names are for informational purposes only and may be trademarks of their respective owners.

Endnotes
GD-122: “Zen” is a codename for AMD architecture and is not a product name.
R5K-003: Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7
3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600.
Results may vary.
EPYC-038: Based on AMD internal testing as of 09/19/2022, geomean performance improvement at the same fixed-frequency on a 4th Gen AMD EPYC 9554 CPU compared
to a 3rd Gen AMD EPYC 7763 CPU using a select set of workloads (33) including est. SPECrate®2017_int_base, est. SPECrate®2017_fp_base, and representative server
workloads. SPEC®, SPEC CPU®, and SPECrate® are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
GNR-03: Testing as of May 2024 by AMD Performance labs. "Zen 5" system configured with: Ryzen 9 9950X GIGABYTE X670E AORUS MASTER motherboard, Balanced,
DDR5-6000, Radeon RX 7900 XTX, VBS=ON, SAM=ON, KRAKENX63 vs. "Zen 4" system configured with: Ryzen 7 7700X, ASUS ROG Crosshair X670E motherboard,
Balanced, DDR5-6000, Radeon RX 7900 XTX, VBS=ON, SAM=ON, KRAKENX62 {FixedFrequency=4.0 GHz}. Applications tested include: Handbrake, League of
Legends, FarCry 6, Puget Adobe Premiere Pro, 3DMark Physics, Kraken, Blender, Cinebench (n-thread), Geekbench, Octane, Speedometer, and WebXPRT. System
manufacturers may vary configurations, yielding different results. GNR-03.
CTT-001 - AMD testing as of 05/30/2024. The detailed results show average ns/day (nanoseconds per day) for the 2P Intel Xeon 8592+ system and the AMD 5th Gen EPYC (pre-
production silicon) system running the namd-stmv20m test of the NAMD 2.15alpha2 benchmark. EPYC test run results followed by Xeon test run results in parenthesis. • namd-
stmv20m: EPYC Normalized to Xeon 3.085x, 3.049x, 3.059x for an average of ~3.06x the performance/~206% higher performance System configurations: AMD: 2 x 128-core
AMD 5th Gen EPYC on AMD reference platform; Memory: 1.5 TB RAM; BIOS: Pre-production; BIOS options: SMT=OFF, NPS=4, OS: RHEL 9.4 kernel 5.14.0-
427.16.1.el9_4.x86_64; Kernel options: amd_iommu=on iommu=pt mitigations=off; Runtime options: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU
Governor - Performance, Disable C2 States Intel: 2 x 64-core Intel Xeon 8592+ production system; Memory 1.0 TB RAM; Hyperthreading=OFF, Profile=Maximum Performance;
OS: RHEL 9.4 kernel 5.14.0-427.16.1.el9_4.x86_64; Kernel options: processor.max_cstate=1 intel_idle.max_cstate=0 iommu=pt mitigations=off; Runtime options: Clear caches,
NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance Results may vary based on factors including but not limited to production silicon, system
configurations, software versions and final BIOS version and settings.
CTT-002: AMD internal testing as of 5/31/2024 on Llama2-7B-CHAT-HF comparing 2P 5th Gen EPYC 128C “Turin” (pre-production) powered server to 2P 5th Gen Intel Xeon
8592+ powered server. All testing with weights quantized to INT4 and with latency under msec. System Configurations: 2P 5th Gen EPYC 128-C pre-production silicon
(128C/256T) on AMD reference system, BIOS: pre-production (Determinism=Power, NPS=1), Memory: 1.5TB, OS: Ubuntu® 22.04.3 LTS | 5.15.0-105-generic. 2P Xeon Platinum
8592+ (64C/128T) production system, (SNC=Disabled), Memory: 1TB; OS: Ubuntu 22.04.3 LTS | 5.15.0-94-generic. Results may vary due to factors including but not limited to
production silicon, system configurations, software versions and BIOS settings.
