3. 3 | NEXT GENERATION “ZEN 5” CORE
CAUTIONARY STATEMENT
This presentation contains forward-looking statements concerning Advanced Micro Devices, Inc. (AMD) such as the features,
functionality, performance, availability, timing and expected benefits of AMD’s current products, future products and product
roadmaps, which are made pursuant to the Safe Harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-
looking statements are commonly identified by words such as "would," "may," "expects," "believes," "plans," "intends," "projects"
and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this presentation are based on
current beliefs, assumptions and expectations, speak only as of the date of this presentation and involve risks and uncertainties that
could cause actual results to differ materially from current expectations. Such statements are subject to certain known and unknown
risks and uncertainties, many of which are difficult to predict and generally beyond AMD's control, that could cause actual results and
other future events to differ materially from those expressed in, or implied or projected by, the forward-looking information and
statements. Investors are urged to review in detail the risks and uncertainties in AMD’s Securities and Exchange Commission filings,
including but not limited to AMD’s most recent reports on Forms 10-K and 10-Q.
AMD does not assume, and hereby disclaims, any obligation to update forward-looking statements made in this presentation, except
as may be required by law.
4. Design Objectives
Performance
▪ Deliver another major 1T and 2T performance increase
▪ New Foundation of Compute for the Future
▪ AVX512 on 512-bit datapath to increase throughput and AI uplift
Platform Support
▪ Deliver “Zen 5” and “Zen 5c” (compact) core variants
▪ Support configurable FP512/FP256 datapath
▪ Support scaling and energy efficiency
▪ Support 4 and 3 nm
▪ Enhanced ISA capabilities
5. “Zen 5” Microarchitecture Overview
NextGen Branch Predictor
Caches
▪ I-Cache: 32KB, 8-way; 2x 32B fetch/cycle
▪ Op-Cache: 6K inst; 2x 6-wide fetch/cycle
▪ D-Cache: 48KB, 12-way; 4 mem ops/cycle
▪ L2-Cache: 1MB, 16-way
Dual I-Fetch/decode pipes, 4 inst/pipe
8 ops/cycle dispatched to Integer or FP
Execution capabilities
▪ 6 integer ALU
▪ 4 AGU, 4 addresses to LS per cycle
▪ 6 FP ops/cycle; 2-cycle FADD
▪ Full 512b AVX512 datapaths
Dataflow
▪ 4 load pipes, capable of two 512b AVX512 loads per cycle
▪ 2x width L2 cache <-> L1I and L1D caches
2 Threads per core
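The per-cycle figures on this slide can be turned into rough bandwidth numbers. The sketch below only multiplies out values quoted on the slide; the 5.0 GHz clock in the usage line is a hypothetical input, not a figure from the deck.

```python
# Back-of-envelope per-cycle figures derived from the slide's numbers.
FETCH_PIPES = 2              # dual I-fetch/decode pipes
FETCH_BYTES_PER_PIPE = 32    # "2x 32B fetch/cycle"
AVX512_LOADS_PER_CYCLE = 2   # two 512b AVX512 loads per cycle
AVX512_BITS = 512


def fetch_bytes_per_cycle() -> int:
    """Instruction bytes fetched per cycle across both fetch pipes."""
    return FETCH_PIPES * FETCH_BYTES_PER_PIPE


def peak_l1d_load_bytes_per_cycle() -> int:
    """Peak L1D load bytes per cycle when every load is a 512b AVX512 load."""
    return AVX512_LOADS_PER_CYCLE * AVX512_BITS // 8


def bandwidth_gb_per_s(bytes_per_cycle: int, clock_ghz: float) -> float:
    """Bytes/cycle times GHz gives GB/s (10^9 bytes per second)."""
    return bytes_per_cycle * clock_ghz


if __name__ == "__main__":
    assumed_clock_ghz = 5.0  # hypothetical clock, chosen only for illustration
    print(fetch_bytes_per_cycle())
    print(peak_l1d_load_bytes_per_cycle())
    print(bandwidth_gb_per_s(peak_l1d_load_bytes_per_cycle(), assumed_clock_ghz))
```

These are theoretical peaks per core; sustained throughput depends on the workload.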
11. “Zen 5” and “Zen 5c” in Heterogeneous SOC
[Chart: “Zen 5” vs. “Zen 5c” compared on IPC, frequency, area, and power efficiency]
“Zen 5” and “Zen 5c” in separate core complexes
“Zen 5” Optimized for maximum 1T performance
▪ High max frequency target
▪ Large L3 per core
“Zen 5c” Optimized for scalability
▪ Same IPC and features
▪ Lower max frequency
▪ Increased power efficiency
▪ Lower L3 per core
Simplifies software scheduling
▪ Same IPC means no unique bottlenecks like vector performance
▪ Both support SMT
▪ Modulate between ultimate performance vs. efficiency
▪ Scheduling “mistakes” minimized over time
12. New “Zen 5” ISA
Attribute ISA Feature
Instructions
▪ MOVDIRI/MOVDIR64B – move 4, 8 or 64 bytes as a direct store, bypassing caches
▪ VP2INTERSECT[DQ] – AVX512 vector pair intersection to a pair of mask registers
▪ VNNI/VEX – extends AVX512 instruction to VEX encoding
▪ PREFETCH[I*] – software prefetch of instruction lines into cache hierarchy
Kernel/Virtualization/QoS
▪ PMC virtualization – provides security for a guest vs. hypervisor; isolates PMC/guest
▪ Heterogeneous Topology
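One way to check whether a given machine exposes these instructions is to look at the CPU flags the operating system reports. The sketch below parses text in the format of Linux's /proc/cpuinfo; the flag names are the ones the Linux kernel uses, the helper names are made up for this example, and the sample string is invented test data rather than output from a real “Zen 5” part.

```python
# Map the ISA features named on the slide to Linux /proc/cpuinfo flag names.
ZEN5_ISA_FLAGS = {
    "MOVDIRI": "movdiri",
    "MOVDIR64B": "movdir64b",
    "VP2INTERSECT": "avx512_vp2intersect",
    "VNNI (VEX encoding)": "avx_vnni",
}


def parse_flags(cpuinfo_text: str) -> set:
    """Return the flag set from the first 'flags' line of cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def isa_report(cpuinfo_text: str) -> dict:
    """Report, per feature, whether its flag is present."""
    flags = parse_flags(cpuinfo_text)
    return {name: flag in flags for name, flag in ZEN5_ISA_FLAGS.items()}


if __name__ == "__main__":
    # Invented sample; on Linux, pass the contents of /proc/cpuinfo instead.
    sample = "processor\t: 0\nflags\t\t: fpu sse2 avx2 movdiri movdir64b avx_vnni\n"
    for feature, present in isa_report(sample).items():
        print(f"{feature}: {'yes' if present else 'no'}")
```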
13. Key “Zen 5” vs. “Zen 4” Capabilities
Row groups: IC/BP, DE/OC/DI, EX/SC, LS/DC/L2
Attribute                                 Zen 4           Zen 5
L1/L2 BTB                                 1.5K/7K         16K/8K
Return Address Stack                      32              52
ITLB L1/L2                                64/512          64/2048
Fetched/Decoded Instruction Bytes/cycle   32              64
Op Cache associativity                    12-way          16-way
Op Cache bandwidth                        9 macro-ops     12 inst or fused inst
Dispatch bandwidth (macro-ops/cycle)      6               8
AGU Scheduler                             3x24 ALU/AGU    56
ALU Scheduler                             1x24 ALU        88
ALU/AGU                                   4/3             6/4
Int PRF (reg/flag)                        224/126         240/192
Vector Reg                                192             384
FP Pre-Sched Queue                        64              96
FP Scheduler                              2x32            3x38
FP Pipes                                  3               4
Vector Width                              256b            256b/512b
ROB/Retire Queue                          320             448
LS Mem Pipes support Load/Store           3/1             4/2
DTLB L1/L2                                72/3072         96/4096
L1 Data Cache                             32KB/8-way      48KB/12-way
L2 per core                               1MB/8-way       1MB/16-way
L2 bandwidth                              32B/clk         64B/clk
[Chart: “Zen 5” Uplift Breakdown]
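To see how much each structure grew, the single-valued rows of the table can be expressed as ratios. This is a small sketch that uses only numbers from the table above; the selection of rows and the function name are choices made for this example. These are resource-size ratios, not performance predictions.

```python
# (Zen 4, Zen 5) values copied from single-valued rows of the table.
ZEN4_VS_ZEN5 = {
    "Return Address Stack": (32, 52),
    "Dispatch bandwidth (macro-ops/cycle)": (6, 8),
    "Vector Reg": (192, 384),
    "FP Pre-Sched Queue": (64, 96),
    "ROB/Retire Queue": (320, 448),
    "L2 bandwidth (B/clk)": (32, 64),
}


def growth_ratios(table: dict) -> dict:
    """Return Zen 5 / Zen 4 for each attribute, rounded to 3 decimals."""
    return {name: round(z5 / z4, 3) for name, (z4, z5) in table.items()}


if __name__ == "__main__":
    for name, ratio in growth_ratios(ZEN4_VS_ZEN5).items():
        print(f"{name}: {ratio}x")
```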
14. “Zen 5” Core Complex speeds and feeds
• Double the L2 associativity
• Double the L2 Bandwidth
• Low latency L3 with 320 L3 in-flight misses
• Baseline from “Zen 4”:
• Fast private 1MB L2 cache
• L3 shared among all cores in the complex
• L3 is filled from L2 victims
• L2 tags duplicated in L3 for probe filtering and fast cache transfer
[Diagram: core complex with Core 0 through Core 7 sharing one L3]
Diagram labels: 32K I Cache, 8-way, 2 x 32B fetch; 48K D Cache, 12-way, 4 Load (Max 2 for 512b), 2 Store (Max 1 for 512b); 1MB L2 I+D Cache, 16-way; 32MB L3 I+D Cache, 16-way; link widths shown: 32B/cycle, 32B/cycle, 64B/cycle, 64B/cycle
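The statement that the L3 is filled from L2 victims can be illustrated with a toy model. This sketch is not AMD's implementation: it assumes LRU replacement in both levels, tracks whole addresses rather than sets and ways, and assumes a line leaves the L3 when it is brought back into the L2. The class name and capacities are hypothetical.

```python
from collections import OrderedDict


class ToyVictimHierarchy:
    """Toy private L2 backed by an L3 that is only filled by L2 evictions."""

    def __init__(self, l2_lines: int, l3_lines: int):
        self.l2 = OrderedDict()   # least recently used entry first
        self.l3 = OrderedDict()
        self.l2_lines = l2_lines
        self.l3_lines = l3_lines

    def access(self, addr: int) -> str:
        """Access one line and report where it was found."""
        if addr in self.l2:
            self.l2.move_to_end(addr)      # refresh LRU position
            return "L2 hit"
        if addr in self.l3:
            del self.l3[addr]              # assumption: line moves back to L2
            result = "L3 hit"
        else:
            result = "miss"                # would be served from memory
        self._fill_l2(addr)
        return result

    def _fill_l2(self, addr: int) -> None:
        self.l2[addr] = True
        if len(self.l2) > self.l2_lines:
            victim, _ = self.l2.popitem(last=False)   # evict the LRU line
            self.l3[victim] = True                    # the victim fills the L3
            if len(self.l3) > self.l3_lines:
                self.l3.popitem(last=False)


if __name__ == "__main__":
    h = ToyVictimHierarchy(l2_lines=2, l3_lines=4)
    for a in (1, 2, 3, 1, 3):
        print(a, h.access(a))
```

In the usage loop, line 1 is evicted from the L2 when line 3 arrives, so the second access to line 1 is served from the L3 rather than from memory.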
16. AMD “Strix Point” SOC
• CPU
  • 4C8T Zen5 – 1MB L2/core, 16MB L3 CCX
  • 8C16T Zen5c – 1MB L2/core, 8MB L3 CCX
  • Datapath – 32B/cycle port each
• GPU
  • 8 WGP (16 CU) RDNA 3.5
  • Datapath – 4 x 32B/cycle ports
• NPU
  • 4 x 8 Array XDNA 2 Inference Engine
  • Datapath – 32B/cycle
• Accelerators / uControllers
  • Video Encode/Decode
  • Audio Co-processor
  • Display Controller
  • System Management, Security, Wireless Manageability
• IO
  • 128b LPDDR5/DDR5 (7500/5600 MT/s)
  • 16L PCIe Gen4
  • 4 Simultaneous display streams
  • 8 USB ports
    • 2 USB4 v1
    • 1 USB3 Type-C
    • 2 USB3.2 Gen2
    • 3 USB2
  • I2C, SPI/eSPI, GPIO
[Block diagram: “Zen 5” CCXs (CPU cores with 16MB and 8MB L3 caches), AMD Radeon 800M Graphics (RDNA 3.5, up to 16 Compute Units, 4 RB+, 2MB L2 cache), XDNA 2 Inference Engine, Multimedia Engines (Video Codec, ACP), Display Controller (4 display support, DisplayPort 2, HDMI® 2.1), four X32 DDR5/LPDDR5 memory channels, PCIe 4.0 (GPP, NVMe, discrete GFX), SATA, USB4®, USB 3.1, USB 2.0, USB-C® (w/ DP mode), Sensor Fusion Hub, FCH, System Management Unit, Wireless Manageability Subsystem, AMD Platform Security Processor, Microsoft Pluton Processor, and Infinity Fabric]
Diagram callouts: Complete System Connectivity; Integrated Sensor Fusion Hub; Accelerated Multimedia Experience
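The memory line in the IO list translates directly into peak bandwidth: bus width in bytes times transfers per second. A short worked sketch, using only the 128b width and the 7500/5600 MT/s rates from the slide (the function name is invented for this example):

```python
def peak_memory_bandwidth_gb_s(bus_bits: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s (10^9 bytes/s) for a memory interface."""
    bytes_per_transfer = bus_bits // 8
    return bytes_per_transfer * mega_transfers_per_s / 1000


if __name__ == "__main__":
    print("LPDDR5-7500, 128b:", peak_memory_bandwidth_gb_s(128, 7500), "GB/s")
    print("DDR5-5600, 128b:", peak_memory_bandwidth_gb_s(128, 5600), "GB/s")
```

These are theoretical peaks shared by the CPU, GPU, and NPU; achievable bandwidth is lower.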
17. AMD RDNA 3.5 Improvements
Texture Subsystem
2x Sampler Rate, Point sampling acceleration
Shader Subsystem
2x Interpolation and Comparison rates
Floating point in SALU
Skip single-use VGPR writes
Raster Subsystem
Sub-batching allows hardware to be efficient
Programmable bin order
Memory Subsystem Improvements
LPDDR5 awareness
Improved compression
Larger Engine
1SE, 2SA, 8 WGP, 4 RB+, 2MB GL2 Engine
2.9G Fmax results in >11 TFLOPs (~30% higher)
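The ">11 TFLOPs" figure can be reproduced with a short calculation. The 16 CUs (8 WGP) and the 2.9 GHz Fmax come from the deck; the per-CU numbers below (64 FP32 lanes, dual issue, and a fused multiply-add counted as 2 FLOPs) are assumptions based on how RDNA 3 peak FP32 throughput is commonly counted, not figures stated on the slide.

```python
COMPUTE_UNITS = 16        # from the slide: 8 WGP = 16 CU
FMAX_GHZ = 2.9            # from the slide: 2.9G Fmax
LANES_PER_CU = 64         # assumption: FP32 lanes per CU
DUAL_ISSUE = 2            # assumption: dual-issue FP32
FLOPS_PER_FMA = 2         # a fused multiply-add counts as 2 FLOPs


def peak_fp32_tflops(cus: int, ghz: float) -> float:
    """Theoretical peak FP32 TFLOPs under the assumptions above."""
    flops_per_cycle = cus * LANES_PER_CU * DUAL_ISSUE * FLOPS_PER_FMA
    return flops_per_cycle * ghz / 1000   # GFLOPs -> TFLOPs


if __name__ == "__main__":
    print(round(peak_fp32_tflops(COMPUTE_UNITS, FMAX_GHZ), 2))
```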
18. 18 |
AMD XDNA 2 Architectural Innovations
World’s First “Win24 Ready” NPU on x86 Processor
AMD Ryzen AI “Strix” NPU
SoC-Level Infinity Fabric Interface
Scratchpad SRAM & Global DMAs
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
AIE
Tile
Improve
Multi-Tasking:
Up to 8x Concurrent
Isolated
Spatial streams
Column-based
Power Gating
1.6x On-Chip
Memory vs.
Previous Gen
Broad AI Model Support
Generative AI, Unlocking new AI PC Experiences
Peak Performance
50 INT8 TOPS
50 Block FP16 TFLOPS
Gen-on-Gen Improvements from Phoenix
2x more concurrent spatial streams
1.6x on-chip memory capacity
Advanced Features
Block floating point support
Enhanced support for non-linear functions (tanh, exp)
50% weight sparsity
Improved Power Efficiency
Per column power gating
Up to 2x Perf/W improvement
AMD XDNA 2 Architecture
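Block floating point, listed among the advanced features above, stores one shared exponent per block of values plus a small integer mantissa per value. The sketch below illustrates the general idea only. The function names, the 8-bit mantissa width, and the rule for choosing the shared exponent are assumptions for illustration; the slide does not specify the exact block format used by XDNA 2.

```python
import math


def bfp_quantize(block, mantissa_bits=8):
    """Quantize floats to (shared_exponent, signed integer mantissas).

    Hypothetical helper: block size and mantissa width are illustrative,
    not the XDNA 2 hardware format.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0, [0] * len(block)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1.
    _, shared_exp = math.frexp(max_abs)
    # Scale so the largest magnitude maps near the top of the mantissa range.
    scale = 2.0 ** (mantissa_bits - 1 - shared_exp)
    limit = 2 ** (mantissa_bits - 1) - 1
    # Round to nearest and clamp to the symmetric signed range.
    mantissas = [max(-limit, min(limit, round(x * scale))) for x in block]
    return shared_exp, mantissas


def bfp_dequantize(shared_exp, mantissas, mantissa_bits=8):
    """Reconstruct approximate floats from a shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - (mantissa_bits - 1))
    return [m * scale for m in mantissas]


if __name__ == "__main__":
    exp, man = bfp_quantize([1.0, -0.5, 0.25, 0.0])
    print(exp, man, bfp_dequantize(exp, man))
```

Because every value in a block shares one exponent, the multiply-accumulate hardware can operate on integer mantissas, which is the usual motivation for the format. Values much smaller than the block maximum lose precision.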
19.
• CPU
• Up to 2 x 8C16T – 1MB L2/Core, 32MB L3 CCDs
• 512b datapath for FP, optimized for high frequency
• Datapath – 32B/cycle port each
• GPU
• 1 WGP (2 CU) RDNA 2
• Datapath – 2 x 32B/cycle ports
• Accelerators / uControllers
• Video Encode/Decode
• Audio Co-processor
• Display Controller
• System Management, Security
• IO
• 128b DDR5 5600 MT/s
• 28L PCIe® Gen5
• 5 USB ports
• 3 USB3.2 Type-C
• 1 USB3.2 Gen2 Type-A
• 1 USB2
• 4 Simultaneous display streams
• I2C, SPI/eSPI, GPIO
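The IO figures above translate into theoretical peak bandwidths with simple arithmetic. The sketch below is a rough estimate, not a measured result. The bus width, transfer rate, and lane count come from the slide; the PCIe Gen5 per-lane rate (32 GT/s) and 128b/130b line encoding are standard PCIe 5.0 parameters that the slide does not state, and the function names are hypothetical.

```python
def peak_dram_bandwidth_gbs(bus_width_bits, transfer_rate_mts):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # bytes/transfer * mega-transfers/s gives MB/s; divide by 1000 for GB/s.
    return bytes_per_transfer * transfer_rate_mts / 1000.0


def peak_pcie_bandwidth_gbs(lanes, gt_per_s=32.0, encoding=128 / 130):
    """Theoretical peak PCIe bandwidth per direction in GB/s.

    Defaults assume PCIe Gen5 signaling (32 GT/s per lane, 128b/130b
    encoding); protocol overhead above the line encoding is ignored.
    """
    return lanes * gt_per_s * encoding / 8


if __name__ == "__main__":
    # From the slide: 128b DDR5 at 5600 MT/s, 28 lanes of PCIe Gen5.
    print(f"DDR5: {peak_dram_bandwidth_gbs(128, 5600):.1f} GB/s")
    print(f"PCIe: {peak_pcie_bandwidth_gbs(28):.1f} GB/s per direction")
```

With the slide's numbers this gives about 89.6 GB/s of peak DDR5 bandwidth and roughly 110 GB/s per direction across all 28 PCIe lanes. Sustained throughput in practice is lower than these peaks.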
[Block diagram: AMD “Granite Ridge” SOC]
CPU: two groups of 8 CPU cores, each with 32MB L3 Cache, SMU, and Die-to-die Infinity Fabric
Interconnect: Infinity Fabric with two Die-to-die Infinity Fabric links
Memory: 2 x64 DDR5
Graphics: RDNA 2 (1 WGP, 2 Compute Units)
Multimedia Engines: VCN, ACP
Display Controller: 4 Display Support
I/O: PCIe 5.0 (28L); USB 3.2, USB 2.0, USB-C®; FCH; SPI/eSPI; GPIO
Other blocks: Clocking, System Management Unit
20.
▪ "Zen 5" :
▪ Yet another on-cadence major performance increase
▪ Balanced cross-core 1T/2T instruction and data throughput
▪ AVX512 with 512bit FP data-paths for throughput and AI uplift
▪ Efficient, performant, configurable solutions which scale:
▪ Variants: Peak performance (“Zen 5” and “Zen 5c”)
▪ Configurable FP and cache hierarchy
▪ Multiple processes across the product line
▪ “Strix Point”, “Granite Ridge”
▪ Commanding Performance and Gaming Leadership with Granite Ridge
▪ Continuing our support of the AM5 infrastructure
▪ With increased compute and efficiency across the entire chip, Strix Point delivers a no-compromise AI PC solution
▪ Continuing support in the FP8 infrastructure
▪ AMD continues to drive Leadership Performance and Efficiency
Summary: AMD Delivers Again!
22.
Endnotes
GD-122: “Zen” is a codename for AMD architecture and is not a product name.
R5K-003: Testing by AMD performance labs as of 09/01/2020. IPC evaluated with a selection of 25 workloads running at a locked 4GHz frequency on 8-core "Zen 2" Ryzen 7
3800XT and "Zen 3" Ryzen 7 5800X desktop processors configured with Windows® 10, NVIDIA GeForce RTX 2080 Ti (451.77), Samsung 860 Pro SSD, and 2x8GB DDR4-3600.
Results may vary.
EPYC-038: Based on AMD internal testing as of 09/19/2022, geomean performance improvement at the same fixed-frequency on a 4th Gen AMD EPYC 9554 CPU compared
to a 3rd Gen AMD EPYC 7763 CPU using a select set of workloads (33) including est. SPECrate®2017_int_base, est. SPECrate®2017_fp_base, and representative server
workloads. SPEC®, SPEC CPU®, and SPECrate® are registered trademarks of the Standard Performance Evaluation Corporation. See www.spec.org for more information.
GNR-03: Testing as of May 2024 by AMD Performance labs. "Zen 5" system configured with: Ryzen 9 9950X GIGABYTE X670E AORUS MASTER motherboard, Balanced,
DDR5-6000, Radeon RX 7900 XTX, VBS=ON, SAM=ON, KRAKENX63 vs. "Zen 4" system configured with: Ryzen 7 7700X, ASUS ROG Crosshair X670E motherboard,
Balanced, DDR5-6000, Radeon RX 7900 XTX, VBS=ON, SAM=ON, KRAKENX62 {FixedFrequency=4.0 GHz}. Applications tested include: Handbrake, League of
Legends, FarCry 6, Puget Adobe Premiere Pro, 3DMark Physics, Kraken, Blender, Cinebench (n-thread), Geekbench, Octane, Speedometer, and WebXPRT. System
manufacturers may vary configurations, yielding different results. GNR-03.
CTT-001 - AMD testing as of 05/30/2024. The detailed results show average ns/day (nanoseconds per day) for the 2P Intel Xeon 8592+ system and the AMD 5th Gen EPYC (pre-
production silicon) system running the namd-stmv20m test of the NAMD 2.15alpha2 benchmark. EPYC test run results followed by Xeon test run results in parenthesis. • namd-
stmv20m: EPYC Normalized to Xeon 3.085x, 3.049x, 3.059x for an average of ~3.06x the performance/~206% higher performance System configurations: AMD: 2 x 128-core
AMD 5th Gen EPYC on AMD reference platform; Memory: 1.5 TB RAM; BIOS: Pre-production; BIOS options: SMT=OFF, NPS=4, OS: RHEL 9.4 kernel 5.14.0-
427.16.1.el9_4.x86_64; Kernel options: amd_iommu=on iommu=pt mitigations=off; Runtime options: Clear caches, NUMA Balancing 0, randomize_va_space 0, THP ON, CPU
Governor - Performance, Disable C2 States Intel: 2 x 64-core Intel Xeon 8592+ production system; Memory 1.0 TB RAM; Hyperthreading=OFF, Profile=Maximum Performance;
OS: RHEL 9.4 kernel 5.14.0-427.16.1.el9_4.x86_64; Kernel options: processor.max_cstate=1 intel_idle.max_cstate=0 iommu=pt mitigations=off; Runtime options: Clear caches,
NUMA Balancing 0, randomize_va_space 0, THP ON, CPU Governor=Performance Results may vary based on factors including but not limited to production silicon, system
configurations, software versions and final BIOS version and settings.
CTT-002: AMD internal testing as of 5/31/2024 on Llama2-7B-CHAT-HF comparing 2P 5th Gen EPYC 128C “Turin” (pre-production) powered server to 2P 5th Gen Intel Xeon
8592+ powered server. All testing with weights quantized to INT4 and with latency under msec. System Configurations: 2P 5th Gen EPYC 128-C pre-production silicon
(128C/256T) on AMD reference system, BIOS: pre-production (Determinism=Power, NPS=1), Memory: 1.5TB, OS: Ubuntu® 22.04.3 LTS | 5.15.0-105-generic. 2P Xeon Platinum
8592+ (64C/128T) production system, (SNC=Disabled), Memory: 1TB; OS: Ubuntu 22.04.3 LTS | 5.15.0-94-generic Results may vary due to factors including but not limited to
production silicon, system configurations, software versions and BIOS settings.