
Non-blocking cache #97

Closed
SoCScholar opened this issue May 12, 2024 · 6 comments

@SoCScholar

SoCScholar commented May 12, 2024

Hi Charles,

What "hit-under-X-misses" behavior do the L1 and L2 data caches of NaxRiscv have? As I understand it, a "hit-under-X-misses" cache allows X misses to be outstanding before blocking. For example, a "hit-under-2-misses" cache keeps running as long as at most 2 misses are still incomplete. Additional requests that hit are still served, but if a request would be a third miss, the cache blocks (stops accepting requests).

If 1 miss corresponds to one refill for the L2 cache, does the L2 cache wait until the refill happens? In other words, can we say the L2 data cache is hit-under-1-miss? Please correct me. Thank you so much.
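As a toy illustration of the rule described above (a sketch of the generic hit-under-X-misses policy, not NaxRiscv's actual implementation; all names and numbers are made up):

```python
# Toy model of a "hit-under-X-misses" cache: hits are served while up to
# X misses are outstanding; one more miss makes the cache block.

class HitUnderXMissCache:
    def __init__(self, max_outstanding_misses):
        self.max_misses = max_outstanding_misses
        self.outstanding = 0  # misses whose refill has not completed yet

    def access(self, hit):
        """Return True if the request is accepted, False if the cache blocks."""
        if hit:
            return True  # hits keep being served while misses are pending
        if self.outstanding < self.max_misses:
            self.outstanding += 1  # start tracking a new outstanding miss
            return True
        return False  # X misses already pending: block new requests

    def refill_done(self):
        self.outstanding -= 1  # one refill completed, free a miss slot

cache = HitUnderXMissCache(max_outstanding_misses=2)
print(cache.access(hit=False))  # True  (1st miss accepted)
print(cache.access(hit=False))  # True  (2nd miss accepted)
print(cache.access(hit=True))   # True  (hits still served)
print(cache.access(hit=False))  # False (3rd miss: cache blocks)
```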

@Dolu1990
Member

Hi,

By default, the NaxRiscv L1 data cache allows 2 inflight line refills and 2 inflight line writebacks.
Also by default, up to 16 loads and 16 stores can be inflight.
The cache will never block because there are too many misses / writebacks, but it may not be able to start the refill / writeback process immediately on a cache miss.

If more than 16 inflight loads or 16 inflight stores are requested, the LSU will block the CPU's decode stage.

@SoCScholar
Author

SoCScholar commented May 13, 2024

Can we say that increasing the capacity of the store buffer entries in NaxRiscv could potentially mitigate unnecessary cache line refills triggered by full stores?

I also wonder what the potential problems and side effects on out-of-order execution of a reduced load buffer might be.

@Dolu1990
Member

Can we say that increasing the capacity of the store buffer entries in NaxRiscv could potentially mitigate unnecessary cache line refills triggered by full stores?

No

I also wonder what the potential problems and side effects on out-of-order execution of a reduced load buffer might be.

This will just reduce performance by not allowing the CPU a large out-of-order execution window, especially on memory copies, where it won't be able to work on multiple cache lines.

@cklarhorst
Contributor

Just my two cents, and Dolu please correct me if I'm wrong.
The optimal sq/lq size also depends on 32- vs 64-bit NaxRiscv.
Assuming the default cache line size of 64 bytes:

For the 64-bit version, a fully utilized sqSize of 16 spans 2 cache lines: 8 bytes * 16 accesses = 128 bytes.
But for the 32-bit version, this only utilizes one cache line: 4 bytes * 16 accesses = 64 bytes.

So, in my opinion, for 32-bit one could increase sqSize or decrease the cache line size.
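The span arithmetic above can be double-checked with a few lines (assuming back-to-back full-width accesses, the best case; real access patterns may touch more lines):

```python
# How many 64-byte cache lines a fully utilized store queue covers.
CACHE_LINE_BYTES = 64

def lines_spanned(sq_entries, access_bytes):
    """Cache lines covered by sq_entries contiguous stores of access_bytes each."""
    return (sq_entries * access_bytes) // CACHE_LINE_BYTES

print(lines_spanned(16, 8))  # 64-bit, sqSize=16: 128 bytes -> 2 lines
print(lines_spanned(16, 4))  # 32-bit, sqSize=16:  64 bytes -> 1 line
print(lines_spanned(32, 4))  # 32-bit, sqSize=32: 128 bytes -> 2 lines
```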

Quick litex memtest benchmark (I'm not on the latest main / nax: ec3ee4d):
With lq/sq=16:

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache
  Write speed: 3.0MiB/s
   Read speed: 3.4MiB/s

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache --l2-bytes 0
  Write speed: 2.5MiB/s
   Read speed: 3.2MiB/s

With lq/sq=32:

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache
  Write speed: 3.5MiB/s
   Read speed: 3.4MiB/s

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache --l2-bytes 0
  Write speed: 3.3MiB/s
   Read speed: 3.3MiB/s

YMMV depending on dram timing, l2-cache, xlen etc.

@SoCScholar
Author

SoCScholar commented May 15, 2024

Are there 16 load buffer entries for the instruction cache, which help parallelism by re-ordering the instructions in the load buffers? Is it like this?

A larger load buffer may help the CPU prefetch and work on multiple cache lines simultaneously, potentially leading to more cache hits and better cache line utilization.

Actually, I didn't understand where these load and store buffers are located. Are there separate load and store queues for the data cache and the instruction cache?

I am also curious how the load and store buffers interact with the LSU (Load-Store Unit).


Please correct me if I am wrong. Thank you so much.

@Dolu1990
Member

Are there 16 load buffer entries for the instruction cache?

instruction cache ?

There is only one AGU in Nax.

where are these Load and store buffer located

In the LSU.

Also i am kind of curious how load and store buffers interact with LSU unit (Load-Store Unit)

The LSU is mostly made of those load/store buffers.
They provide a pool of loads/stores for the LSU to work on.

potentially leading to more cache hit

Not really more cache hits. Instead, it will allow the CPU to eventually start refilling cache lines earlier, hiding the cache miss penalty.
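A back-of-envelope sketch of that effect, with made-up latency numbers: overlapping several in-flight refills shrinks total stall time even though the hit rate is unchanged.

```python
import math

# Illustrative only: the 100-cycle refill latency is a made-up number,
# not a NaxRiscv figure.
MISS_LATENCY = 100  # cycles per cache-line refill

def stall_cycles(num_misses, max_overlapping):
    """Total refill latency when up to max_overlapping refills proceed in parallel."""
    waves = math.ceil(num_misses / max_overlapping)
    return waves * MISS_LATENCY

print(stall_cycles(8, 1))  # serialized refills: 800 cycles
print(stall_cycles(8, 4))  # 4 overlapping refills: 200 cycles
```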
