CPU is faster than memory.
their speeds have been diverging: CPU speed has been improving at a much faster rate, while memory speed has fallen significantly behind.
there are techniques to slightly speed up memory access (such as interleaving, which uses several parallel buses at the cost of considerable complexity on the circuit board), but these techniques only increase speed linearly, not geometrically, which is what would be needed to keep pace with CPU speeds.
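to make the linear-vs-geometric point concrete, a small sketch in C with made-up numbers: memory throughput that scales with the number of interleaved banks (one extra bank per generation) against CPU speed that doubles each generation.

    /* Illustrative sketch (hypothetical numbers): interleaving grows memory
     * throughput linearly with the number of banks, while CPU speed grows
     * geometrically across generations, so the gap keeps widening. */
    #include <stdio.h>

    int main(void) {
        for (int gen = 0; gen <= 5; gen++) {
            int banks = gen + 1;              /* one bank added per generation (linear) */
            double mem = 1.0 * banks;         /* relative memory throughput             */
            double cpu = 1.0;                 /* relative CPU speed: x2 per generation  */
            for (int i = 0; i < gen; i++) cpu *= 2.0;
            printf("gen %d: cpu %5.1fx  memory %4.1fx  gap %5.1fx\n",
                   gen, cpu, mem, cpu / mem);
        }
        return 0;
    }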
locality: a program spends a large proportion of its time in blocks of relatively small size.
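a minimal C sketch of locality: nearly all execution time is spent in one small loop over a small array, so a modest cache can capture most of the accesses.

    /* Minimal sketch of locality: almost all execution time is spent in one
     * small loop (temporal locality) touching consecutive array elements
     * (spatial locality), so a small cache serves most accesses. */
    #include <stdio.h>

    #define N 1024

    int main(void) {
        static int table[N];
        long sum = 0;

        /* the "hot" block: a few instructions and a small working set,
           executed many times over */
        for (int pass = 0; pass < 10000; pass++) {
            for (int i = 0; i < N; i++) {
                sum += table[i];   /* sequential accesses reuse cache lines */
            }
        }
        printf("%ld\n", sum);
        return 0;
    }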
knowing the size of the cache alone is not an accurate indicator of its performance.
performance also depends on how the cache is mapped, and so on.
of course, hit rate and miss rate depend on the specific program being run, so such measures must be determined in the context of specific benchmarks (i.e. concrete example programs).
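as a rough illustration (not one of the lecture's benchmarks), a tiny direct-mapped cache model in C; the line size, cache size and access patterns are assumptions, but it shows how the same cache gives very different hit rates for different programs.

    /* Sketch: a direct-mapped cache model demonstrating that hit rate depends
     * on the access pattern of the program, not just on the cache size.
     * All sizes and traces below are illustrative assumptions. */
    #include <stdio.h>

    #define LINES     64        /* number of cache lines (direct-mapped) */
    #define LINE_SIZE 16        /* bytes per line                        */

    static long tags[LINES];
    static int  valid[LINES];

    /* returns 1 on hit, 0 on miss */
    static int access_cache(long addr) {
        long block = addr / LINE_SIZE;
        int  index = (int)(block % LINES);
        long tag   = block / LINES;
        if (valid[index] && tags[index] == tag) return 1;
        valid[index] = 1;        /* miss: fill the line */
        tags[index]  = tag;
        return 0;
    }

    static double hit_rate_for_stride(int stride, int n) {
        int hits = 0;
        for (int i = 0; i < LINES; i++) valid[i] = 0;   /* start cold */
        for (int i = 0; i < n; i++)
            hits += access_cache((long)(i % 256) * stride);
        return (double)hits / n;
    }

    int main(void) {
        /* same cache, different "programs" (access patterns) */
        printf("sequential (stride 4):  hit rate %.2f\n", hit_rate_for_stride(4, 100000));
        printf("strided (stride 1024):  hit rate %.2f\n", hit_rate_for_stride(1024, 100000));
        return 0;
    }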
in this lecture, the focus is on L1 cache: L1 is tied closely to the CPU, while L2 sits closer to main memory (main store RAM).
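one way to tie L1, L2 and hit/miss rates together is the standard average memory access time (AMAT) formula for a two-level hierarchy; the cycle counts below are assumed for illustration, not figures from the lecture.

    /* Sketch with assumed latencies:
     *   AMAT = t_L1 + m_L1 * (t_L2 + m_L2 * t_mem)
     * where m_x is the miss rate of level x. */
    #include <stdio.h>

    int main(void) {
        double t_l1  = 1.0;    /* L1 hit time (cycles), close to the CPU */
        double t_l2  = 10.0;   /* L2 hit time (cycles)                   */
        double t_mem = 100.0;  /* main store (RAM) access time (cycles)  */
        double m_l2  = 0.5;    /* assumed L2 miss rate                   */

        double l1_miss_rates[] = { 0.01, 0.05, 0.10, 0.20 };
        for (int i = 0; i < 4; i++) {
            double m_l1 = l1_miss_rates[i];
            double amat = t_l1 + m_l1 * (t_l2 + m_l2 * t_mem);
            printf("L1 miss rate %.2f -> AMAT %.1f cycles\n", m_l1, amat);
        }
        return 0;
    }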