
Non-blocking cache #97

Closed
SoCScholar opened this issue May 12, 2024 · 6 comments

@SoCScholar

SoCScholar commented May 12, 2024

Hi Charles,

What "hit-under-X-misses" behavior do the L1 and L2 data caches of NaxRiscv have? As I understand it, a "hit-under-X-misses" cache allows X misses to be outstanding before blocking. For example, a "hit-under-2-misses" cache keeps running as long as at most 2 misses are still incomplete. Additional requests that hit are still served, but if a request would be a third miss, the cache blocks (stops accepting requests).

If 1 miss corresponds to one refill for the L2 cache, does the L2 cache wait until the refill happens? In other words, can we say the L2 data cache is hit-under-1-miss? Please correct me. Thank you so much.
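As a toy illustration of the rule described above (a sketch of the generic hit-under-X-misses policy, not NaxRiscv's actual implementation; all names and numbers are made up):

```python
# Toy model of a "hit-under-X-misses" cache: hits are served while up to
# X misses are outstanding; one more miss makes the cache block.

class HitUnderXMissCache:
    def __init__(self, max_outstanding_misses):
        self.max_misses = max_outstanding_misses
        self.outstanding = 0  # misses whose refill has not completed yet

    def access(self, hit):
        """Return True if the request is accepted, False if the cache blocks."""
        if hit:
            return True  # hits keep being served while misses are pending
        if self.outstanding < self.max_misses:
            self.outstanding += 1  # start tracking a new outstanding miss
            return True
        return False  # X misses already pending: block new requests

    def refill_done(self):
        self.outstanding -= 1  # one refill completed, free a miss slot

cache = HitUnderXMissCache(max_outstanding_misses=2)
print(cache.access(hit=False))  # True  (1st miss accepted)
print(cache.access(hit=False))  # True  (2nd miss accepted)
print(cache.access(hit=True))   # True  (hits still served)
print(cache.access(hit=False))  # False (3rd miss: cache blocks)
```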

@Dolu1990
Member

Hi,

By default, the NaxRiscv L1 data cache allows 2 inflight line refills and 2 inflight line writebacks.
Also by default, up to 16 loads and 16 stores can be inflight.
The cache will never block because there are too many misses / writebacks, but it may not be able to start the refill / writeback process immediately on a cache miss.

If more than 16 inflight loads or 16 inflight stores are requested, the LSU will block the CPU's decode stage.

@SoCScholar
Author

SoCScholar commented May 13, 2024

Can we say that increasing the capacity of the store buffer entries in NaxRiscv could potentially mitigate unnecessary cache line refills triggered by full stores?

I also wonder what the potential problems and side effects on out-of-order execution of a reduced load buffer might be.

@Dolu1990
Member

Can we say that increasing the capacity of the store buffer entries in NaxRiscv could potentially mitigate unnecessary cache line refills triggered by full stores?

No

I also wonder what the potential problems and side effects on out-of-order execution of a reduced load buffer might be.

This will just reduce performance by not allowing the CPU a large out-of-order execution window, especially on memory copies, where it won't be able to work on multiple cache lines.

@cklarhorst
Contributor

Just my two cents, and Dolu please correct me if I'm wrong.
The optimal sq/lq size also depends on 32- vs 64-bit NaxRiscv.
Assuming the default cache line size of 64 bytes:

For the 64-bit version, a fully utilized sqSize of 16 spans 2 cache lines: 8 bytes * 16 accesses = 128 bytes.
But for the 32-bit version, this only utilizes one cache line: 4 bytes * 16 accesses = 64 bytes.

So, in my opinion, for 32-bit one could increase sqSize or decrease the cache line size.
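The span arithmetic above can be double-checked with a few lines (assuming back-to-back full-width accesses, the best case; real access patterns may touch more lines):

```python
# How many 64-byte cache lines a fully utilized store queue covers.
CACHE_LINE_BYTES = 64

def lines_spanned(sq_entries, access_bytes):
    """Cache lines covered by sq_entries contiguous stores of access_bytes each."""
    return (sq_entries * access_bytes) // CACHE_LINE_BYTES

print(lines_spanned(16, 8))  # 64-bit, sqSize=16: 128 bytes -> 2 lines
print(lines_spanned(16, 4))  # 32-bit, sqSize=16:  64 bytes -> 1 line
print(lines_spanned(32, 4))  # 32-bit, sqSize=32: 128 bytes -> 2 lines
```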

Quick litex memtest benchmark (I'm not on the latest main / nax: ec3ee4d):
With lq/sq=16:

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache
  Write speed: 3.0MiB/s
   Read speed: 3.4MiB/s

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache --l2-bytes 0
  Write speed: 2.5MiB/s
   Read speed: 3.2MiB/s

With lq/sq=32:

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache
  Write speed: 3.5MiB/s
   Read speed: 3.4MiB/s

litex_sim  --with-sdram --cpu-type naxriscv --sdram-data-width 64 --update-repo no --no-netlist-cache --l2-bytes 0
  Write speed: 3.3MiB/s
   Read speed: 3.3MiB/s

YMMV depending on dram timing, l2-cache, xlen etc.

@SoCScholar
Author

SoCScholar commented May 15, 2024

Are there 16 load buffer entries for the instruction cache, which help parallelism by re-ordering the instructions in the load buffers? Is it like this?

A larger load buffer may help the CPU prefetch and work on multiple cache lines simultaneously, potentially leading to more cache hits and better cache line utilization.

Actually, I didn't understand where these load and store buffers are located. Are there separate load and store queues for the data cache and the instruction cache?

I am also curious how the load and store buffers interact with the LSU (Load-Store Unit).


Please correct me if I am wrong. Thank you so much.

@Dolu1990
Member

Are there 16 load buffer entries for the instruction cache?

instruction cache ?

There is only one AGU in Nax.

where are these Load and store buffer located

In the LSU.

Also i am kind of curious how load and store buffers interact with LSU unit (Load-Store Unit)

The LSU is mostly made of those load/store buffers.
They provide a pool of loads/stores for the LSU to work on.

potentially leading to more cache hit

Not really more cache hits. Instead, it will allow the CPU to eventually start refilling cache lines earlier, hiding the cache miss penalty.
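A back-of-envelope sketch of that effect, with made-up latency numbers: overlapping several in-flight refills shrinks total stall time even though the hit rate is unchanged.

```python
import math

# Illustrative only: the 100-cycle refill latency is a made-up number,
# not a NaxRiscv figure.
MISS_LATENCY = 100  # cycles per cache-line refill

def stall_cycles(num_misses, max_overlapping):
    """Total refill latency when up to max_overlapping refills proceed in parallel."""
    waves = math.ceil(num_misses / max_overlapping)
    return waves * MISS_LATENCY

print(stall_cycles(8, 1))  # serialized refills: 800 cycles
print(stall_cycles(8, 4))  # 4 overlapping refills: 200 cycles
```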
