From 8b812f981ef9f7706bd71d9dbe9afb92f28b327e Mon Sep 17 00:00:00 2001
From: Charles Papon
Date: Sat, 30 Dec 2023 16:39:58 +0100
Subject: [PATCH 1/2] miaou

---
 source/VexiiRiscv/Decode/index.rst            |  58 +++++
 source/VexiiRiscv/Execute/index.rst           | 153 +++++++++++
 source/VexiiRiscv/Fetch/index.rst             |  72 ++++++
 source/VexiiRiscv/Framework/index.rst         | 243 ++++++++++++++++++
 source/VexiiRiscv/Introduction/VexiiRiscv.rst |  18 ++
 source/index.rst                              |   4 +
 6 files changed, 548 insertions(+)
 create mode 100644 source/VexiiRiscv/Decode/index.rst
 create mode 100644 source/VexiiRiscv/Execute/index.rst
 create mode 100644 source/VexiiRiscv/Fetch/index.rst
 create mode 100644 source/VexiiRiscv/Framework/index.rst

diff --git a/source/VexiiRiscv/Decode/index.rst b/source/VexiiRiscv/Decode/index.rst
new file mode 100644
index 0000000..6996de0
--- /dev/null
+++ b/source/VexiiRiscv/Decode/index.rst
@@ -0,0 +1,58 @@
+Decode
+============
+
+A few plugins operate in the decode stage:
+
+- DecodePipelinePlugin
+- AlignerPlugin
+- DecoderPlugin
+- DispatchPlugin
+- DecodePredictionPlugin
+
+
+DecodePipelinePlugin
+-------------------------
+
+Provides the pipeline framework for all the decode-related hardware.
+It uses the spinal.lib.misc.pipeline API but implements multiple "lanes" in it.
+
+
+AlignerPlugin
+-------------------------
+
+Decodes the words from the fetch pipeline into aligned instructions in the decode pipeline.
+
+DecoderPlugin
+-------------------------
+
+Will:
+
+- Decode instructions
+- Generate illegal instruction exceptions
+- Generate "interrupt" instructions
+
+DecodePredictionPlugin
+-------------------------
+
+The purpose of this plugin is to ensure that no branch/jump prediction was made for non branch/jump instructions.
+In case this is detected, the plugin will flush the pipeline and set the fetch PC to redo everything, but this time with a "first prediction skip".
+
+DispatchPlugin
+-------------------------
+
+Will:
+
+- Collect instructions from the end of the decode pipeline
+- Try to dispatch them ASAP on the multiple "layers" available
+
+Here are a few explanations about execute lanes and layers:
+
+- An execute lane represents a path along which an instruction can be executed.
+- An execute lane can have one or many layers, which can be used to implement things such as early ALU / late ALU
+- Each layer has a static scheduling priority
+
+The DispatchPlugin doesn't require lanes or layers to be symmetric in any way.
+
+
+
+
diff --git a/source/VexiiRiscv/Execute/index.rst b/source/VexiiRiscv/Execute/index.rst
new file mode 100644
index 0000000..fd2145b
--- /dev/null
+++ b/source/VexiiRiscv/Execute/index.rst
@@ -0,0 +1,153 @@
+Execute
+============
+
+Many plugins operate in the execute stage. Some provide infrastructure:
+
+- ExecutePipelinePlugin
+- ExecuteLanePlugin
+- RegFilePlugin
+- SrcPlugin
+- RsUnsignedPlugin
+- IntFormatPlugin
+- WriteBackPlugin
+- LearnPlugin
+
+Some implement regular instructions:
+
+- IntAluPlugin
+- BarrelShifterPlugin
+- BranchPlugin
+- MulPlugin
+- DivPlugin
+- LsuCachelessPlugin
+
+Some implement CSR, privileges and special instructions:
+
+- CsrAccessPlugin
+- CsrRamPlugin
+- PrivilegedPlugin
+- PerformanceCounterPlugin
+- EnvPlugin
+
+
+ExecutePipelinePlugin
+-----------------------
+
+Provides the pipeline framework for all the execute-related hardware with the following specificities:
+
+- It is based on the spinal.lib.misc.pipeline API and can host multiple "lanes" in it.
+- For flow control, the lanes can only freeze the whole pipeline
+- The pipeline does not collapse bubbles (empty stages)
+
+
+ExecuteLanePlugin
+-----------------------
+
+Implements an execution lane in the ExecutePipelinePlugin.
+
+RegFilePlugin
+-----------------------
+
+Implements one register file, with the possibility to create new read / write ports on demand.
+
+SrcPlugin
+-----------------------
+
+Provides some early integer values which can mux between RS1/RS2 and multiple RISC-V instruction literal values.
+
+RsUnsignedPlugin
+-----------------------
+
+Used by mul/div in order to get an unsigned RS1/RS2 value early in the pipeline.
+
+IntFormatPlugin
+-----------------------
+
+Allows plugins to write integer values back to the register file through an optional sign extender.
+It uses WriteBackPlugin as value backend.
+
+WriteBackPlugin
+-----------------------
+
+Used by plugins to provide the RD value to write back to the register file.
+
+LearnPlugin
+----------------
+
+Collects all the jump/branch learning interfaces provided by other plugins and aggregates them into a single one, which is then used by the branch prediction plugins to learn.
+
+IntAluPlugin
+-----------------------
+
+Implements the arithmetic, binary and literal instructions (ADD, SUB, AND, OR, LUI, ...).
+
+BarrelShifterPlugin
+-----------------------
+
+Implements the shift instructions in a non-blocking way (no iterations). Fast but "heavy".
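+
+One common way to shift without iterating is a logarithmic chain of fixed shifts: stage i shifts by 2^i when bit i of the shift amount is set.
+The plain Scala software model below is only a sketch of that idea for a 32-bit left shift. BarrelShifterModel and shiftLeft are hypothetical names, and the structure is an assumption, not taken from the BarrelShifterPlugin source.
+
+.. code-block:: scala
+
+    // Software model only (assumption: 32-bit left shift, 5 mux stages).
+    object BarrelShifterModel {
+      def shiftLeft(value: Int, amount: Int): Int = {
+        var v = value
+        // In hardware the 5 stages would be unrolled into combinational muxes,
+        // so the result is available without iterating over several cycles.
+        for (i <- 0 until 5) {
+          if (((amount >> i) & 1) == 1) v = v << (1 << i)
+        }
+        v
+      }
+
+      def main(args: Array[String]): Unit = {
+        assert(shiftLeft(1, 5) == 32)
+        assert(shiftLeft(0x1234, 12) == (0x1234 << 12))
+      }
+    }
+
+In such a structure the cost is one full-width mux layer per shift-amount bit, which is one reason this style tends to be fast but comparatively large.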
+
+BranchPlugin
+-----------------------
+
+Will:
+
+- Implement branch/jump instructions
+- Correct the PC / history in case the branch prediction was wrong
+- Provide a learn interface to the LearnPlugin
+
+
+MulPlugin
+-----------------------
+
+- Implement multiplication using partial multiplications, then summing their results
+- Done over multiple stages
+- Can optionally extend the last stage by one cycle in order to buffer the MULH bits
+
+DivPlugin
+-----------------------
+
+- Implement the division/remainder instructions
+- 2 bits per cycle are solved.
+- When it starts, it scans the numerator for leading zero bits and can skip dividing them (in blocks of XLEN/4)
+
+LsuCachelessPlugin
+-----------------------
+
+- Implement load / store through a cacheless memory bus
+- Will fork the cmd as soon as the fork stage is valid (with no flush)
+- Handle backpressure by using a small FIFO on the response data
+
+CsrAccessPlugin
+-----------------------
+
+- Implement the CSR instructions
+- Provide an API for other plugins to specify its hardware mapping
+
+CsrRamPlugin
+-----------------------
+
+- Implement a shared on-chip RAM
+- Provide an API to statically allocate space in it
+- Provide an API to create read / write ports on it
+- Used by various plugins to store the CSR contents in an FPGA-efficient way
+
+PrivilegedPlugin
+-----------------------
+
+- Implement the RISCV privileged spec
+- Implement the trap buffer / FSM
+- Use the CsrRamPlugin to implement various CSR as MTVAL, MTVEC, MEPC, MSCRATCH, ...
+
+PerformanceCounterPlugin
+-----------------------
+
+- Implement the privileged performance counters in a very FPGA way
+- Use the CsrRamPlugin to store most of the counter bits
+- Use a dedicated 7-bit hardware register per counter
+- Once the MSB of that 7-bit register is set, an FSM will flush it into the CsrRamPlugin
+
+
+EnvPlugin
+------------------------
+
+- Implement a few instructions such as MRET, SRET, ECALL, EBREAK
diff --git a/source/VexiiRiscv/Fetch/index.rst b/source/VexiiRiscv/Fetch/index.rst
new file mode 100644
index 0000000..28fee86
--- /dev/null
+++ b/source/VexiiRiscv/Fetch/index.rst
@@ -0,0 +1,72 @@
+Fetch
+============
+
+A few plugins operate in the fetch stage:
+
+- FetchPipelinePlugin
+- PcPlugin
+- FetchCachelessPlugin
+- BtbPlugin
+- GSharePlugin
+- HistoryPlugin
+
+FetchPipelinePlugin
+-------------------------
+
+Provides the pipeline framework for all the fetch-related hardware. It uses the native spinal.lib.misc.pipeline API without any restriction.
+
+PcPlugin
+-------------------------
+
+Will:
+
+- Implement the fetch program counter register
+- Inject the program counter in the first fetch stage
+- Allow other plugins to create "jump" interfaces which can override the PC value
+
+Jump interfaces will impact the PC value injected in the fetch stage in a combinatorial manner to reduce latency.
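+
+As a rough illustration of the behaviour described above, the plain Scala model below selects the PC value injected in the fetch stage from the PC register and a list of jump requests. Jump, injectedPc and the "first valid jump wins" priority are assumptions made for this sketch, not the PcPlugin implementation.
+
+.. code-block:: scala
+
+    object PcModel {
+      // Hypothetical jump request coming from another plugin
+      case class Jump(valid: Boolean, target: Long)
+
+      // A valid jump overrides the PC register value in the same step
+      // (combinatorial in hardware); otherwise the register value is injected.
+      def injectedPc(pcReg: Long, jumps: Seq[Jump]): Long =
+        jumps.find(_.valid).map(_.target).getOrElse(pcReg)
+
+      def main(args: Array[String]): Unit = {
+        assert(injectedPc(0x80000000L, Seq(Jump(false, 0x100L))) == 0x80000000L)
+        assert(injectedPc(0x80000000L, Seq(Jump(false, 0x100L), Jump(true, 0x200L))) == 0x200L)
+      }
+    }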
+
+FetchCachelessPlugin
+-------------------------
+
+Will:
+
+- Generate a fetch memory bus
+- Connect that memory bus to the fetch pipeline with a response buffer
+- Allow out-of-order memory bus responses (for maximal compatibility)
+- Always generate aligned memory accesses
+
+
+BtbPlugin
+-------------------------
+
+Will:
+
+- Implement a branch target buffer in the fetch pipeline
+- Implement a return address stack buffer
+- Predict which slices of the fetched word are the last slice of a branch/jump
+- Predict the branch/jump target
+- Use the FetchConditionalPrediction plugin (GSharePlugin) to know if a branch should be taken
+- Apply the prediction (flush + PC update + history update)
+- Learn using the LearnPlugin interface
+- Implement "ways" (named chunks) which are statically assigned to groups of word slices, allowing multiple branches/jumps present in the same word to be predicted
+
+GSharePlugin
+-------------------------
+
+Will:
+
+- Implement a FetchConditionalPrediction (GShare flavor)
+- Learn using the LearnPlugin interface
+- Not apply the prediction via flush / PC change; another plugin will do that
+
+HistoryPlugin
+-------------------------
+
+Will:
+
+- Implement the branch history register
+- Inject the branch history in the first fetch stage
+- Allow other plugins to create interfaces to override the branch history value (on branch prediction / execution)
+
+Branch history interfaces will impact the branch history value injected in the fetch stage in a combinatorial manner to reduce latency.
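+
+To make the role of the history register more concrete, here is a plain Scala model of a global branch history and of the classic gshare indexing (PC bits XORed with the history). The 8-bit width, the names and the exact hash are assumptions for illustration; the HistoryPlugin and GSharePlugin may differ.
+
+.. code-block:: scala
+
+    object HistoryModel {
+      val historyWidth = 8 // assumed width, for illustration only
+      val mask = (1 << historyWidth) - 1
+
+      // Shift the newest branch outcome into the history register
+      def push(history: Int, taken: Boolean): Int =
+        ((history << 1) | (if (taken) 1 else 0)) & mask
+
+      // gshare-style table index: PC bits XOR global history
+      def gshareIndex(pc: Long, history: Int): Int =
+        ((pc >> 2) & mask).toInt ^ history
+
+      def main(args: Array[String]): Unit = {
+        val h = push(push(0, taken = true), taken = false)
+        assert(h == 2)
+        assert(gshareIndex(0x10L, h) == 6)
+      }
+    }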
diff --git a/source/VexiiRiscv/Framework/index.rst b/source/VexiiRiscv/Framework/index.rst
new file mode 100644
index 0000000..f849fcb
--- /dev/null
+++ b/source/VexiiRiscv/Framework/index.rst
@@ -0,0 +1,243 @@
+Framework
+============
+
+
+Dependencies
+------------------------------
+
+VexiiRiscv is based on a few tools / APIs:
+
+- Scala : Which will take care of the elaboration
+- SpinalHDL : Which provides a hardware description API
+- Plugin : Which are used to inject hardware in the CPU
+- Fiber : Which allows defining elaboration threads in the plugins
+- Retainer : Which allows blocking the execution of the elaboration threads waiting on it
+- Database : Which specifies a shared scope for all the plugins to share elaboration-time data
+- spinal.lib.misc.pipeline : Which allows pipelining things in a very dynamic manner.
+- spinal.lib.logic : Which provides Quine-McCluskey to generate logic decoders from the elaboration-time specifications
+
+
+Scala / SpinalHDL
+------------------------------
+
+This combination allows going way beyond what regular HDLs allow in terms of hardware description capabilities.
+You can find some documentation about SpinalHDL here:
+
+- https://spinalhdl.github.io/SpinalDoc-RTD/master/index.html
+
+Plugin
+-------------------------
+
+One main design aspect of VexiiRiscv is that all its hardware is defined inside plugins. When you want to instantiate a VexiiRiscv CPU, you "only" need to provide a list of plugins as parameters. So, plugins can be seen as both parameters and hardware definition from a VexiiRiscv perspective.
+
+So it is quite different from the regular HDL component/module paradigm. Here are the advantages of this approach:
+
+- The CPU can be extended without modifying its core source code, just add a new plugin in the parameters
+- You can swap a specific implementation for another just by swapping plugins in the parameter list (e.g. branch prediction, mul/div, ...)
+- It is decentralised by nature, you don't have a fat toplevel of doom; software interfaces between plugins can be used to negotiate things during elaboration time.
+
+The plugins can fork elaboration threads which cover 2 phases:
+
+- setup phase : where plugins can acquire elaboration locks on each other
+- build phase : where plugins can negotiate with each other and generate hardware
+
+Simple all-in-one example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Here is a simple example:
+
+.. code-block:: scala
+
+    import spinal.core._
+    import spinal.lib.misc.plugin._
+    import vexiiriscv._
+    import scala.collection.mutable.ArrayBuffer
+
+    // Define a new plugin kind
+    class FixedOutputPlugin extends FiberPlugin{
+      // Define a build phase elaboration thread
+      val logic = during build new Area{
+        val port = out UInt(8 bits)
+        port := 42
+      }
+    }
+
+    object Gen extends App{
+      // Generate the verilog
+      SpinalVerilog{
+        val plugins = ArrayBuffer[FiberPlugin]()
+        plugins += new FixedOutputPlugin()
+        VexiiRiscv(plugins)
+      }
+    }
+
+
+Will generate
+
+.. code-block:: verilog
+
+    module VexiiRiscv (
+      output wire [7:0] FixedOutputPlugin_logic_port
+    );
+
+      assign FixedOutputPlugin_logic_port = 8'h2a;
+
+    endmodule
+
+
+
+Negotiation example
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Here is an example where a plugin counts the number of hardware events coming from other plugins:
+
+.. code-block:: scala
+
+    import spinal.core._
+    import spinal.core.fiber.Retainer
+    import spinal.lib.misc.plugin._
+    import spinal.lib.CountOne
+    import vexiiriscv._
+    import scala.collection.mutable.ArrayBuffer
+
+    class EventCounterPlugin extends FiberPlugin{
+      val lock = Retainer() // Will allow other plugins to block the elaboration of the "logic" thread
+      val events = ArrayBuffer[Bool]() // Will allow other plugins to add event sources
+      val logic = during build new Area{
+        lock.await() // Active blocking
+        val counter = Reg(UInt(32 bits)) init(0)
+        counter := counter + CountOne(events)
+      }
+    }
+
+
+    // For the demo we want to be able to instantiate this plugin multiple times, so we add a prefix parameter
+    class EventSourcePlugin(prefix : String) extends FiberPlugin{
+      withPrefix(prefix)
+
+      // Create a thread starting from the setup phase (this allows running some code before the build phase, and so locking some other plugins' retainers)
+      val logic = during setup new Area{
+        val ecp = host[EventCounterPlugin] // Search for the single instance of EventCounterPlugin in the plugin pool
+        // Generate a lock to prevent the EventCounterPlugin elaboration until we release it.
+        // This will allow us to add our localEvent to the ecp.events list
+        val ecpLocker = ecp.lock()
+
+        // Wait for the build phase before generating any hardware
+        awaitBuild()
+
+        // Here the local event is an input of the VexiiRiscv toplevel (just for the demo)
+        val localEvent = in Bool()
+        ecp.events += localEvent
+
+        // As everything is done, we now allow the ecp to elaborate itself
+        ecpLocker.release()
+      }
+    }
+
+    object Gen extends App{
+      SpinalVerilog{
+        val plugins = ArrayBuffer[FiberPlugin]()
+        plugins += new EventCounterPlugin()
+        plugins += new EventSourcePlugin("lane0")
+        plugins += new EventSourcePlugin("lane1")
+        VexiiRiscv(plugins)
+      }
+    }
+
+.. code-block:: verilog
+
+    module VexiiRiscv (
+      input  wire          lane0_EventSourcePlugin_logic_localEvent,
+      input  wire          lane1_EventSourcePlugin_logic_localEvent,
+      input  wire          clk,
+      input  wire          reset
+    );
+
+      wire       [31:0]   _zz_EventCounterPlugin_logic_counter;
+      reg        [1:0]    _zz_EventCounterPlugin_logic_counter_1;
+      wire       [1:0]    _zz_EventCounterPlugin_logic_counter_2;
+      reg        [31:0]   EventCounterPlugin_logic_counter;
+
+      assign _zz_EventCounterPlugin_logic_counter = {30'd0, _zz_EventCounterPlugin_logic_counter_1};
+      assign _zz_EventCounterPlugin_logic_counter_2 = {lane1_EventSourcePlugin_logic_localEvent,lane0_EventSourcePlugin_logic_localEvent};
+      always @(*) begin
+        case(_zz_EventCounterPlugin_logic_counter_2)
+          2'b00 : _zz_EventCounterPlugin_logic_counter_1 = 2'b00;
+          2'b01 : _zz_EventCounterPlugin_logic_counter_1 = 2'b01;
+          2'b10 : _zz_EventCounterPlugin_logic_counter_1 = 2'b01;
+          default : _zz_EventCounterPlugin_logic_counter_1 = 2'b10;
+        endcase
+      end
+
+      always @(posedge clk or posedge reset) begin
+        if(reset) begin
+          EventCounterPlugin_logic_counter <= 32'h00000000;
+        end else begin
+          EventCounterPlugin_logic_counter <= (EventCounterPlugin_logic_counter + _zz_EventCounterPlugin_logic_counter);
+        end
+      end
+
+
+    endmodule
+
+
+Database
+--------------------
+
+Quite a few things behave like variables specific to each VexiiRiscv instance, for example XLEN, PC_WIDTH, INSTRUCTION_WIDTH, ...
+
+These are things that we would like to share between the plugins of a given VexiiRiscv instance with as little code as possible, to keep things slim. For that, a "database" was added.
+You can see it in the VexiiRiscv toplevel:
+
+.. code-block:: scala
+
+    class VexiiRiscv extends Component{
+      val database = new Database
+      val host = database on (new PluginHost)
+    }
+
+As a result, all the plugin threads run in the context of that database, which allows the following pattern:
+
+.. code-block:: scala
+
+    import spinal.core._
+    import spinal.lib.misc.plugin._
+    import spinal.lib.misc.database.Database.blocking
+    import vexiiriscv._
+    import scala.collection.mutable.ArrayBuffer
+
+    object Global extends AreaObject{
+      val VIRTUAL_WIDTH = blocking[Int] // If accessed before being set, it will actively block (until set by another thread)
+    }
+
+    class LoadStorePlugin extends FiberPlugin{
+      val logic = during build new Area{
+        val register = Reg(UInt(Global.VIRTUAL_WIDTH bits))
+      }
+    }
+
+    class MmuPlugin extends FiberPlugin{
+      val logic = during build new Area{
+        Global.VIRTUAL_WIDTH.set(39)
+      }
+    }
+
+    object Gen extends App{
+      SpinalVerilog{
+        val plugins = ArrayBuffer[FiberPlugin]()
+        plugins += new LoadStorePlugin()
+        plugins += new MmuPlugin()
+        VexiiRiscv(plugins)
+      }
+    }
+
+Pipeline API
+--------------------
+
+In short, the design uses a pipeline API in order to:
+
+- Allow moving things around with no pain (retiming)
+- Reduce boilerplate code
+
+More documentation about it is available at https://github.com/SpinalHDL/SpinalDoc-RTD/pull/226
+
diff --git a/source/VexiiRiscv/Introduction/VexiiRiscv.rst b/source/VexiiRiscv/Introduction/VexiiRiscv.rst
index 65fae0d..68b6251 100644
--- a/source/VexiiRiscv/Introduction/VexiiRiscv.rst
+++ b/source/VexiiRiscv/Introduction/VexiiRiscv.rst
@@ -3,3 +3,21 @@
 About VexiiRiscv
 ------------------------------
 
+VexiiRiscv is a from-scratch second iteration of VexRiscv, with the following goals:
+
+- RISCV 32/64 bits IMAFDC
+- Could start about as small as VexRiscv, but could scale further in performance
+- Optional late-alu
+- Optional multi issue
+- Optional multi threading
+- Providing a cleaner implementation, getting rid of the technical debt, especially the frontend
+- Proper branch prediction
+- ...
+
+On this date (29/12/2023) the status is:
+
+- rv 32/64 im supported
+- Can run baremetal benchmarks (2.24 Dhrystone/MHz, 4.62 CoreMark/MHz)
+- single/dual issue supported
+- late-alu supported
+- BTB/RAS/GShare branch prediction supported
diff --git a/source/index.rst b/source/index.rst
index fb67d2d..3329b52 100644
--- a/source/index.rst
+++ b/source/index.rst
@@ -8,3 +8,7 @@ Welcome to VexiiRiscv's documentation!
    :maxdepth: 1
 
    VexiiRiscv/Introduction/index
+   VexiiRiscv/Framework/index
+   VexiiRiscv/Fetch/index
+   VexiiRiscv/Decode/index
+   VexiiRiscv/Execute/index
From bfd1604586e104416c89fc57ecb72f1e0574b6cf Mon Sep 17 00:00:00 2001
From: Dolu1990
Date: Sun, 31 Dec 2023 11:15:05 +0100
Subject: [PATCH 2/2] fix

---
 source/VexiiRiscv/Execute/index.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/source/VexiiRiscv/Execute/index.rst b/source/VexiiRiscv/Execute/index.rst
index fd2145b..14e066c 100644
--- a/source/VexiiRiscv/Execute/index.rst
+++ b/source/VexiiRiscv/Execute/index.rst
@@ -139,7 +139,7 @@ PrivilegedPlugin
 - Use the CsrRamPlugin to implement various CSR as MTVAL, MTVEC, MEPC, MSCRATCH, ...
 
 PerformanceCounterPlugin
------------------------
+--------------------------------
 
 - Implement the privileged performance counters in a very FPGA way
 - Use the CsrRamPlugin to store most of the counter bits