The CPU cache is a small, very fast memory located on the CPU die itself. It holds copies of data and instructions that the CPU has recently used or will likely need soon, so they can be accessed quickly. This is necessary to ensure the comparatively slow RAM doesn’t bottleneck the CPU.
Modern CPUs typically implement CPU cache in 3 levels – L1, L2, and L3. These play an important part in determining CPU performance (especially for certain tasks like gaming).
So, let’s look at how CPU cache works, why it matters, and how much CPU cache you’ll need for your workloads.
What Does the CPU Cache Do
The programs that you run are first loaded into the RAM. The CPU fetches, decodes, and executes instructions from the main memory.
The ‘problem’ with this is that modern processors are extremely powerful (capable of executing billions of instructions per second).
For instance, the AMD Ryzen 9 3950X has a base clock speed of 3.5 GHz (3.5 billion cycles per second), and each of its 16 cores can execute several instructions per clock cycle.
However, fetching data from the RAM can take hundreds of cycles. A core waiting on that data may be stalled for much of that time, wasting cycles it could have spent on useful work.
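To put "hundreds of cycles" in perspective, a memory latency in nanoseconds converts to clock cycles by multiplying it by the clock speed in GHz, since a 3.5 GHz CPU completes 3.5 cycles every nanosecond. The sketch below uses the 3950X's 3.5 GHz base clock from above; the 80 ns RAM latency is an assumed, illustrative figure, not a measurement of any particular system, and `stall_cycles` is a hypothetical helper name.

```python
def stall_cycles(latency_ns, clock_ghz):
    """Convert a memory latency in nanoseconds into CPU clock cycles.

    A CPU running at clock_ghz GHz completes clock_ghz cycles per nanosecond.
    """
    return latency_ns * clock_ghz


# 3.5 GHz is the 3950X base clock; 80 ns is an assumed RAM latency.
print(stall_cycles(80, 3.5))  # 280.0
```

Even with this modest latency assumption, a single trip to RAM costs a few hundred cycles.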
If the CPU had to access data from the RAM every time, that would create a significant bottleneck and cripple system performance. This is where the CPU cache comes into play.
The CPU analyzes access patterns to predict what data and instructions it’ll likely need next. Then, it moves them from the RAM to the CPU cache before they’re actually needed (this is called prefetching).
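One of the simplest access patterns to predict is a constant stride, such as walking through an array one cache line at a time. The toy model below is a minimal sketch, assuming a basic stride detector: if the last two accesses were the same distance apart, it predicts the next address or addresses. Real hardware prefetchers are considerably more sophisticated, and the `StridePrefetcher` class and its `degree` parameter are hypothetical names used only for illustration.

```python
class StridePrefetcher:
    """Toy stride prefetcher: predicts upcoming addresses from a repeated stride."""

    def __init__(self, degree=1):
        self.degree = degree        # how many addresses ahead to predict
        self.last_addr = None
        self.last_stride = None

    def access(self, addr):
        """Record an access and return the addresses worth fetching ahead of time."""
        predictions = []
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Only predict once the same non-zero stride has been seen twice in a row.
            if stride != 0 and stride == self.last_stride:
                predictions = [addr + stride * i for i in range(1, self.degree + 1)]
            self.last_stride = stride
        self.last_addr = addr
        return predictions


p = StridePrefetcher(degree=2)
for addr in (0, 64, 128):
    print(addr, p.access(addr))
```

The first two accesses only establish the pattern; on the third, the model predicts addresses 192 and 256 so they could be loaded into the cache before the program asks for them.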
Depending on the level, accessing data from the CPU cache can be over a hundred times faster than doing so from the RAM. So, the time the CPU spends waiting on memory is significantly reduced.
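A standard textbook way to quantify this is the average memory access time (AMAT): every access pays the cache's hit time, and the fraction of accesses that miss also pays the penalty of going to the next level. The numbers below (a 4-cycle cache, a 300-cycle trip to RAM, and a 5% miss rate) are hypothetical and chosen only to show the shape of the calculation.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time (AMAT) for one cache level, in cycles.

    Every access pays hit_time; the fraction miss_rate also pays miss_penalty
    to fetch the data from the next level (here, RAM).
    """
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: 4-cycle cache, 300-cycle RAM, 5% of accesses miss.
no_cache = 300
with_cache = average_access_time(hit_time=4, miss_rate=0.05, miss_penalty=300)
print(no_cache, with_cache)
```

With these assumptions, the average access drops from 300 cycles to about 19, which is why a high hit rate matters so much.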
L1 vs L2 vs L3 Cache
Current CPUs implement 3 levels of CPU cache to maximize performance. This allows them to hit the sweet spot for cache size, latency, and hit rate.
- L1 – fastest but smallest, per core (128 KB – 2 MB total)
- L2 – medium latency and capacity, can be per core or shared (256 KB – 32 MB total)
- L3 – slowest but largest, shared (1 MB – 128 MB total)
You can get the exact numbers for your CPU online or using system profiling tools like CPU-Z and HWiNFO.
On my Ryzen 7 5700G, you can see that L1 is split into L1 Data and L1 Instruction caches. Each of the 8 cores has its own 32 KB of each, so the total L1 cache is 8 × 64 KB = 512 KB.
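The total for a per-core cache level is simply the per-core size multiplied by the core count. Here is that arithmetic for the 5700G's L1, using the figures above; `total_cache_kb` is a hypothetical helper name.

```python
def total_cache_kb(per_core_kb, cores):
    """Total capacity of a private (per-core) cache level, in KB."""
    return per_core_kb * cores


# Ryzen 7 5700G: 8 cores, each with 32 KB L1 data + 32 KB L1 instruction cache.
l1_per_core_kb = 32 + 32
print(total_cache_kb(l1_per_core_kb, cores=8))  # 512
```

This only applies to caches that are private to each core. A shared pool like L3 is reported as a single total.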
As the L1 cache is the smallest and fastest cache level, the CPU first checks whether the required data is in L1. If it is, the CPU immediately reads from or writes to L1. This is called a cache hit.
Sometimes, the required data won’t be in L1. This is called a cache miss. In this case, the CPU checks the next fastest cache level, i.e., L2.
The L2 cache is larger but slower compared to L1. It can be implemented per core, or as a shared pool. On the 5700G, each of the 8 cores has its own 512 KB of L2, which totals 4 MB.
If a cache miss occurs in L2, the CPU checks L3 next. This is the largest CPU cache level, but it also has the highest latency. For instance, the 5700G has a 16 MB L3 cache implemented as a shared pool.
If L3 misses as well, the data has to be fetched from the RAM. If it isn’t in the RAM either, the operating system must load it from the storage drive, which is far slower still.
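The lookup order above can be sketched as a simple walk through the levels. This is a simplified serial model: the per-level latencies are hypothetical round numbers, not figures for any specific CPU, and real caches also copy the fetched data into the faster levels on a miss, which this sketch leaves out. `CACHE_LEVELS`, `RAM_LATENCY`, and `lookup` are illustrative names.

```python
# Hypothetical access latencies in CPU cycles, for illustration only;
# real values differ between CPU models.
CACHE_LEVELS = [("L1", 4), ("L2", 12), ("L3", 40)]
RAM_LATENCY = 300


def lookup(addr, contents):
    """Walk L1 -> L2 -> L3 -> RAM; return where addr was found and the cycles spent.

    contents maps each cache level name to the set of addresses it currently holds.
    """
    cycles = 0
    for level, latency in CACHE_LEVELS:
        cycles += latency  # checking this level costs its latency
        if addr in contents[level]:
            return level, cycles  # cache hit: stop here
        # cache miss: fall through to the next, slower level
    return "RAM", cycles + RAM_LATENCY


contents = {"L1": {0x100}, "L2": {0x200}, "L3": {0x300}}
for addr in (0x100, 0x200, 0x300, 0x400):
    print(hex(addr), lookup(addr, contents))
```

In this model, an L1 hit costs 4 cycles, while an address that misses every cache level costs 4 + 12 + 40 + 300 = 356 cycles.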
CPU Cache Levels Up Close
Before moving on, let’s see what the CPU cache levels look like on an actual CPU die to understand things better.
If you take apart a CPU and sand the bottom layer of the CPU die, you can expose the actual CPU circuits.
For instance, the bottom layer of an i9-13900K CPU die looks something like this:
Rotate the picture anti-clockwise to make the closeup horizontal. Then, compare it to this die-shot interpretation. You’ll see exactly how the different cache levels are implemented.
By checking the data from system profiling tools, you’ll have an even clearer idea of the CPU cache distribution.
In the i9-13900K’s case, you can see how the L1 and L2 caches are distributed across the P-cores and E-cores.
How Much CPU Cache Do You Need
The CPU cache is clearly important for CPU performance. But what does that mean for the end-user? Are CPUs with higher cache always better?
It all depends on what you’ll use the CPU for.
There are many factors to consider when choosing a CPU – clock speed, core count, CPU generation, architecture, TDP, cache, and so on. All of these are interlinked and determine the CPU performance together.
So, generally, it’s hard to single out one element like the cache and attribute performance to it. But there are exceptions.
Take AMD’s X3D gaming CPUs, for instance. The Ryzen 7 5800X and 5800X3D are mostly similar. The main differences are that the 5800X3D has slightly lower clock speeds but triple the L3 cache (96 MB vs 32 MB on the 5800X).
The benchmarks for these processors show that performance differs according to the workload.
- In synthetic benchmarks and productivity tasks like video editing, the extra cache makes little to no difference. This is because the L3 hit rate is already very high for sequential data.
- In fact, the slightly lower frequency means the 5800X3D may even perform worse.
- But the 5800X3D shines in tasks like gaming, where the CPU frequently accesses data in less predictable patterns. The larger L3 holds more of that data, so fewer requests have to go all the way to the RAM.
- On average, the extra cache results in a 10-15% improvement in average FPS and a 20% or higher improvement in 1% lows. These are incredible results, considering the main difference between the two chips is the larger cache.
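The difference between these workloads can be illustrated with a toy cache simulation. This is a minimal sketch under loud assumptions: it models the cache as fully associative with least-recently-used (LRU) eviction and no prefetching (real L3 caches are set-associative and use other policies), assumes 64-byte cache lines, and uses uniformly random jumps inside a fixed working set as a stand-in for less predictable access patterns. The capacities (2,000 vs 6,000 lines) only mirror the 3× ratio of the two chips, not their actual sizes. All function names are hypothetical.

```python
import random
from collections import OrderedDict

LINE_SIZE = 64  # bytes per cache line (assumed; a common size on x86 CPUs)


def hit_rate(addresses, capacity_lines):
    """Hit rate of a toy fully associative LRU cache holding capacity_lines lines."""
    cache = OrderedDict()
    hits = 0
    for addr in addresses:
        line = addr // LINE_SIZE  # data is cached in whole lines
        if line in cache:
            hits += 1
            cache.move_to_end(line)  # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > capacity_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / len(addresses)


def sequential(n_elements, element_size=8):
    """Addresses of an array of 8-byte elements, read in order."""
    return [i * element_size for i in range(n_elements)]


def scattered(n_accesses, working_set_lines, seed=0):
    """Addresses that jump to random lines inside a fixed working set."""
    rng = random.Random(seed)
    return [rng.randrange(working_set_lines) * LINE_SIZE for _ in range(n_accesses)]


seq = sequential(80_000)
rnd = scattered(200_000, working_set_lines=8_000)
for capacity in (2_000, 6_000):  # a smaller cache and one 3x larger, in lines
    print(capacity, hit_rate(seq, capacity), round(hit_rate(rnd, capacity), 2))
```

In this model, the sequential hit rate stays at 0.875 for both capacities, because 7 of every 8 reads land in a line that was just fetched, so a bigger cache adds nothing. The scattered hit rate, on the other hand, rises substantially with the 3× larger cache, since more of the working set fits at once.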
To reiterate, there’s no set number for the best cache amount. It can have virtually no impact or make a massive difference depending on the workload. So, it just depends on what you’ll use the CPU for.
Most consumer CPUs have a standard amount of CPU cache intended to work for most people. Whatever CPU you’re planning to get, check the benchmarks online and see how it performs in tasks that you’ll mostly use it for.
If there are similar options with higher or lower cache, check the benchmarks for them too. Then, decide which one will better fit your use cases.