How to add VPU to NaxRiscv? #10

Closed
franktaTian opened this issue Jul 4, 2022 · 29 comments

Comments

@franktaTian

I am trying to add a VPU to NaxRiscv.

@Dolu1990
Member

Dolu1990 commented Jul 4, 2022

Hi ^^

I guess the main challenge will be how the memory loads/stores are made and kept coherent with the L1 cache.

One thing which may interest you is the way in which the FPU support was added.

Do you have a design idea?

@franktaTian
Author

Like ARA (https://github.com/pulp-platform/ara), the VPU accesses memory independently. The RISC-V core is a "feeder": it fetches, decodes, and commits the instruction (feeding data to the VPU if needed), waits for the response, and retires it (getting data back if needed). In most cases, the VPU just takes the instruction and gives the "retire" response immediately.
The VPU does not keep coherent with the L1 cache.
I think the FPU framework in VexRiscv is fine: the FPU keeps the float register file and writes back to the integer register file if needed. But the FPU in Nax is too complex to rewrite. Can I rewrite the FPU part of VexRiscv and add it to Nax?


@franktaTian
Author

It is interesting.

@Dolu1990
Member

Dolu1990 commented Jul 8, 2022

Hi ^^

Sorry for the delay.

it accesses the memory independently
I think that is a good approach.

For the FPU, I think the complexity of the NaxRiscv one comes from the fact that it handles out-of-order execution. So if I understand correctly, you aren't really looking at handling out-of-order execution for VPU execution / VPU loads & stores? Also, what about register renaming and dependency tracking?
I don't have much of an opinion on the subject, as I don't know much about VPUs.

@franktaTian
Author

franktaTian commented Jul 12, 2022

Hi
As in the case of ARA, the VPU is an independent processing unit with its own register file, scoreboard, and scheduler. The interface on the core side is just to decode/recognize vector instructions, feed them to the VPU, and wait for retirement. Yes, when the core recognizes a vector instruction, it just stalls to wait for the response. The ARA VPU has its own FIFO to receive instructions, and in most cases it retires an instruction as soon as it gets it (plus the scalar register data, if needed). There is just one exception: moving a VRF register (in the VPU) to a scalar register.
So, first of all, I want to find a way to let NaxRiscv decode vector instructions, feed them to the VPU along with the RS values (if needed), stall to wait for the retire response, and then retire.
Can you give a "framework"? Thanks so much. The FPU code is too complex.
But as just mentioned, I think the FPU structure and code in Vex could be modified to add the VPU.
I think the first step is to let the main pipeline run in order when it finds a vector instruction, just as for FENCE.
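A minimal behavioral sketch of the "feeder" scheme described above, in plain Scala with no SpinalHDL dependency. Assumptions: the names `FeedCmd`, `VpuModel`, `needsScalarResult` and `execute` are hypothetical; the VPU is modeled as a FIFO that accepts the instruction word plus its scalar operands, and the core only stalls when the instruction must return a scalar value (for example a vector-to-scalar move). It is not cycle accurate.

```scala
import scala.collection.mutable

// Hypothetical command: the raw instruction word plus the scalar operands read by the core.
final case class FeedCmd(instruction: Long, rs1: Long, rs2: Long)

// Behavioral model of an ARA-style VPU fed by the scalar core.
// needsScalarResult: does this instruction send a value back to the core?
// execute: stands in for the VPU datapath and returns the (possibly unused) scalar result.
final class VpuModel(needsScalarResult: Long => Boolean, execute: FeedCmd => Long) {
  private val fifo = mutable.Queue.empty[FeedCmd]

  // Called when the core dispatches a vector instruction.
  // Returns None when the core may retire right away,
  // or Some(rd) after the core stalled until the VPU produced the scalar result.
  def feed(cmd: FeedCmd): Option[Long] = {
    fifo.enqueue(cmd)
    if (needsScalarResult(cmd.instruction)) {
      // Stall: every older queued instruction executes first, then this one (the last in the FIFO).
      var result = 0L
      while (fifo.nonEmpty) result = execute(fifo.dequeue())
      Some(result)
    } else None
  }

  // Number of instructions accepted but not yet executed by the VPU.
  def pending: Int = fifo.size
}
```

In this model most instructions return `None` immediately, which is the "retire as soon as it gets the instruction" case; only a scalar-returning instruction forces the core to wait.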

@Dolu1990
Member

I think the first step is to let the main pipeline run in order when it finds a vector instruction, just as for FENCE.

Hmm, and sometimes I guess you have some integer values going from/to the integer register file, right?
One thing is that instructions without "registered" dependencies will always find their way to the execution unit in order (older instructions first).
So you could reuse the shared EU0.
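The point that dependency-free instructions reach the execution unit oldest-first can be illustrated with a toy issue queue that, each cycle, selects the oldest ready entry. This is a simplified sketch with hypothetical names (`IqEntry`, `OldestFirstIssue`), not NaxRiscv's actual issue-queue logic; it only shows that when every entry is ready, oldest-first selection reproduces program order.

```scala
// Hypothetical issue-queue entry: program-order age (smaller = older) and static readiness.
final case class IqEntry(age: Int, name: String, ready: Boolean)

object OldestFirstIssue {
  // Pick the oldest entry whose operands are ready (at most one per cycle).
  def selectOne(queue: Seq[IqEntry]): Option[IqEntry] = {
    val ready = queue.filter(_.ready)
    if (ready.isEmpty) None else Some(ready.minBy(_.age))
  }

  // Repeatedly select until no ready entry remains and return the issue order.
  // Readiness never changes in this toy model, so non-ready entries are never issued.
  def issueOrder(queue: Seq[IqEntry]): Seq[String] = {
    var remaining = queue
    val order = Seq.newBuilder[String]
    var next = selectOne(remaining)
    while (next.isDefined) {
      val picked = next.get
      order += picked.name
      remaining = remaining.filterNot(_.age == picked.age)
      next = selectOne(remaining)
    }
    order.result()
  }
}
```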

@franktaTian
Author

OK, I'll try it. Thanks.

@Dolu1990
Member

The main tricky thing will be the interaction between the VPU and the integer register file.

Do you know what kind of interactions it does? For instance, in the FPU we have load/store addresses, float <> int conversions, and casts.

@franktaTian
Author

The VPU reads/writes memory independently and has its own register file (for example 32 registers × 1024 bits). It shares the FP opcodes for loads/stores and needs an integer register as the base address. There are also four instructions to move integer/float registers to/from vector registers.
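For the four vector/scalar moves mentioned above, the core side needs to know which ones return a scalar value. A minimal plain-Scala sketch, assuming the RVV 1.0 encodings (OP-V opcode 1010111, funct6 010000, vm=1): `vmv.x.s` (OPMVV, vs1=0) and `vfmv.f.s` (OPFVV, vs1=0) write `rd` in the integer/float register file, so the core must wait for the VPU, while `vmv.s.x` (OPMVX, vs2=0) and `vfmv.s.f` (OPFVF, vs2=0) only send `rs1` to the VPU. The object and helper names are hypothetical, and other scalar-writing instructions (e.g. `vsetvli`, `vcpop.m`) are deliberately not covered.

```scala
object RvvScalarMoves {
  val OpV = 0x57L // major opcode OP-V (1010111)

  // Extract bits hi..lo of a 32-bit instruction held in a Long.
  private def bits(inst: Long, hi: Int, lo: Int): Long =
    (inst >>> lo) & ((1L << (hi - lo + 1)) - 1)

  sealed trait Move
  case object VmvXS  extends Move // x[rd]  = vs2[0]
  case object VmvSX  extends Move // vd[0]  = x[rs1]
  case object VfmvFS extends Move // f[rd]  = vs2[0]
  case object VfmvSF extends Move // vd[0]  = f[rs1]

  // Recognize only the four scalar moves; everything else returns None.
  def classify(inst: Long): Option[Move] = {
    val opcode = bits(inst, 6, 0)
    val funct3 = bits(inst, 14, 12)
    val vs1    = bits(inst, 19, 15)
    val vs2    = bits(inst, 24, 20)
    val vm     = bits(inst, 25, 25)
    val funct6 = bits(inst, 31, 26)
    if (opcode != OpV || funct6 != 0x10L || vm != 1L) None
    else (funct3, vs1, vs2) match {
      case (2L, 0L, _) => Some(VmvXS)  // OPMVV, vs1 = 0
      case (6L, _, 0L) => Some(VmvSX)  // OPMVX, vs2 = 0
      case (1L, 0L, _) => Some(VfmvFS) // OPFVV, vs1 = 0
      case (5L, _, 0L) => Some(VfmvSF) // OPFVF, vs2 = 0
      case _           => None
    }
  }

  // True when the scalar core has to wait for the VPU to return a register value.
  def writesScalar(inst: Long): Boolean = classify(inst) match {
    case Some(VmvXS) | Some(VfmvFS) => true
    case _                          => false
  }

  // Assemble an OP-V instruction word from its fields (helper for testing).
  def encode(funct6: Int, vm: Int, vs2: Int, vs1: Int, funct3: Int, rd: Int): Long =
    (funct6.toLong << 26) | (vm.toLong << 25) | (vs2.toLong << 20) |
      (vs1.toLong << 15) | (funct3.toLong << 12) | (rd.toLong << 7) | OpV
}
```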

@franktaTian
Author

franktaTian commented Jul 21, 2022

What the interface needs is {instruction, rs1, rs2}, and the VPU will return rd if needed.
I added a simple interface in SimdAddPlugin as:
case class VpuCmd() extends Bundle{
  val OPCODE = UInt(32 bits)
  val rs1 = UInt(32 bits)
  val rs2 = UInt(32 bits)
}

case class VpuRsp() extends Bundle{
  val rd = UInt(32 bits)
  val error = Bool()
}

case class VpuBus() extends Bundle with IMasterSlave {
  val cmd = Stream(VpuCmd())
  val rsp = Stream(VpuRsp())

  override def asMaster() = {
    master(cmd)
    slave(rsp)
  }
}
And I tried to drive this interface as:
vpu.cmd.valid := True // ToDo
vpu.cmd.rs1 := rs1
vpu.cmd.rs2 := rs2
val instruction = Frontend.MICRO_OP().asUInt
vpu.cmd.OPCODE := instruction
vpu.rsp.ready := True
wb.payload := rd.asBits
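In SpinalHDL, `Stream` is a valid/ready handshake: a transfer happens on a cycle where both `valid` and `ready` are high, and the usual rule is that a master holding `valid` high without `ready` keeps its payload stable. With `vpu.cmd.valid := True // ToDo` a command is offered on every cycle, not only when a vector instruction is in the stage. The plain-Scala model below (hypothetical names `Cycle` and `StreamModel`, no SpinalHDL dependency) captures the two properties worth checking in a testbench trace: which payloads actually transfer, and whether a stalled master held its payload.

```scala
// One clock cycle of a valid/ready channel as seen on the wires.
final case class Cycle(valid: Boolean, ready: Boolean, payload: Int)

object StreamModel {
  // A transfer ("fire") happens only on cycles where valid and ready are both high.
  def fire(c: Cycle): Boolean = c.valid && c.ready

  // Payloads actually transferred over a trace of cycles, in order.
  def transferred(trace: Seq[Cycle]): Seq[Int] = trace.filter(fire).map(_.payload)

  // Hold rule: if valid was high but ready was low, the next cycle must keep
  // valid high and present the same payload.
  def respectsHold(trace: Seq[Cycle]): Boolean =
    trace.sliding(2).forall {
      case Seq(a, b) if a.valid && !a.ready => b.valid && b.payload == a.payload
      case _                                => true
    }
}
```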

@franktaTian
Author

But when I build it, I get an elaboration error:

[info] **********************************************************************************************
[info] [Warning] Elaboration failed (3 errors).
[info] Spinal will restart with scala trace to help you to find the problem.
[info] **********************************************************************************************
[info] [Progress] at 2.362 : Elaborate components
[info] [Progress] at 2.989 : Checks and transforms
[error] Exception in thread "main" spinal.core.SpinalExit:
[error] Error detected in phase PhaseCheck_noLatchNoOverride
[error] ********************************************************************************
[error] ********************************************************************************
[error] NO DRIVER ON (toplevel/??? : Bits[32 bits]), defined at
[error] naxriscv.Frontend$$anonfun$17.apply(Parameters.scala:95)
[error] naxriscv.Frontend$$anonfun$17.apply(Parameters.scala:95)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5$$anon$1$$anon$3.<init>(SimdAddPlugin.scala:99)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5$$anon$1.<init>(SimdAddPlugin.scala:80)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5.apply(SimdAddPlugin.scala:77)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5.apply(SimdAddPlugin.scala:77)
[error] naxriscv.utilities.Plugin$$anon$1$$anonfun$late$1$$anonfun$5.apply(Framework.scala:54)
[error] naxriscv.utilities.Framework.rework(Framework.scala:77)
[error] naxriscv.utilities.Plugin$$anon$1$$anonfun$late$1.apply(Framework.scala:52)
[error] spinal.sim.JvmThread.run(SimManager.scala:51)
[error] ********************************************************************************
[error] ********************************************************************************
[error] NO DRIVER ON (toplevel/??? : Bits[32 bits]), defined at
[error] naxriscv.Frontend$$anonfun$17.apply(Parameters.scala:95)
[error] naxriscv.Frontend$$anonfun$17.apply(Parameters.scala:95)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5$$anon$1$$anon$3.<init>(SimdAddPlugin.scala:99)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5$$anon$1.<init>(SimdAddPlugin.scala:80)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5.apply(SimdAddPlugin.scala:77)
[error] naxriscv.execute.SimdAddPlugin$$anonfun$5.apply(SimdAddPlugin.scala:77)
[error] naxriscv.utilities.Plugin$$anon$1$$anonfun$late$1$$anonfun$5.apply(Framework.scala:54)
[error] naxriscv.utilities.Framework.rework(Framework.scala:77)
[error] naxriscv.utilities.Plugin$$anon$1$$anonfun$late$1.apply(Framework.scala:52)
[error] spinal.sim.JvmThread.run(SimManager.scala:51)
[error] ********************************************************************************
[error] ********************************************************************************
[error] Design's errors are listed above.
[error] SpinalHDL compiler exit stack :
[error] at spinal.core.SpinalExit$.apply(Misc.scala:424)
[error] at spinal.core.SpinalError$.apply(Misc.scala:479)
[error] at spinal.core.internals.PhaseContext.checkPendingErrors(Phase.scala:175)
[error] at spinal.core.internals.PhaseContext.doPhase(Phase.scala:191)
[error] at spinal.core.internals.SpinalVerilogBoot$$anonfun$singleShot$2$$anonfun$apply$135.apply(Phase.scala:2684)
[error] at spinal.core.internals.SpinalVerilogBoot$$anonfun$singleShot$2$$anonfun$apply$135.apply(Phase.scala:2682)
[error] at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[error] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
[error] at spinal.core.internals.SpinalVerilogBoot$$anonfun$singleShot$2.apply(Phase.scala:2682)
[error] at spinal.core.internals.SpinalVerilogBoot$$anonfun$singleShot$2.apply(Phase.scala:2618)
[error] at spinal.core.ScopeProperty$.sandbox(ScopeProperty.scala:69)
[error] at spinal.core.internals.SpinalVerilogBoot$.singleShot(Phase.scala:2618)
[error] at spinal.core.internals.SpinalVerilogBoot$.apply(Phase.scala:2613)
[error] at spinal.core.Spinal$.apply(Spinal.scala:388)
[error] at spinal.core.SpinalConfig.generateVerilog(Spinal.scala:170)
[error] at naxriscv.execute.SimdAddNaxGen$.delayedEndpoint$naxriscv$execute$SimdAddNaxGen$1(SimdAddPlugin.scala:128)
[error] at naxriscv.execute.SimdAddNaxGen$delayedInit$body.apply(SimdAddPlugin.scala:109)
[error] at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
[error] at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
[error] at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] at scala.App$$anonfun$main$1.apply(App.scala:76)
[error] at scala.collection.immutable.List.foreach(List.scala:392)
[error] at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:35)
[error] at scala.App$class.main(App.scala:76)
[error] at naxriscv.execute.SimdAddNaxGen$.main(SimdAddPlugin.scala:109)
[error] at naxriscv.execute.SimdAddNaxGen.main(SimdAddPlugin.scala)
[error] Nonzero exit code returned from runner: 1
[error] (Compile / runMain) Nonzero exit code returned from runner: 1
[error] Total time: 11 s, completed Jul 21, 2022 10:33:46 AM

@franktaTian
Author

Is there any simple way to get the instruction?
Thanks

@Dolu1990
Member

So, one place where you could easily get the instruction stream in order is the FrontendPlugin dispatch stage.

That's just before things are pushed into the issue queue for out-of-order execution. But this will be problematic for VPU instructions which interact with the integer/float register files.

Could you show the code of your plugin?

@franktaTian
Author

package naxriscv.execute
import naxriscv.Global._
import spinal.core._
import spinal.lib._
import naxriscv._
import naxriscv.riscv._
import naxriscv.riscv.IntRegFile
import naxriscv.interfaces.{RD, RS1, RS2, RfResource}
import naxriscv.utilities.Plugin
import spinal.lib.pipeline.Stageable

//This plugin example will add a new instruction named SIMD_ADD which does the following :
//
//RD : Regfile Destination, RS : Regfile Source
//RD( 7 downto  0) = RS1( 7 downto  0) + RS2( 7 downto  0)
//RD(15 downto  8) = RS1(15 downto  8) + RS2(15 downto  8)
//RD(23 downto 16) = RS1(23 downto 16) + RS2(23 downto 16)
//RD(31 downto 24) = RS1(31 downto 24) + RS2(31 downto 24)
//
//Instruction encoding :
//0000000----------000-----0001011   <- Custom0 func3=0 func7=0
//       |RS2||RS1|   |RD |
//
//Note : RS1, RS2, RD positions follow the RISC-V spec and are common for all instructions of the ISA

case class VpuCmd() extends Bundle{
  val OPCODE = UInt(32 bits)
  val rs1 = UInt(Global.XLEN.get bits)
  val rs2 = UInt(Global.XLEN.get bits)
}

case class VpuRsp() extends Bundle{
  val rd = UInt(Global.XLEN.get bits)
  val error = Bool()
}

case class VpuBus() extends Bundle with IMasterSlave {
  val cmd = Stream(VpuCmd())
  val rsp = Stream(VpuRsp())

  override def asMaster() = {
    master(cmd)
    slave(rsp)
  }
}

object SimdAddPlugin{
  //Define the instruction type and encoding that we will use
  val ADD4 = IntRegFile.TypeR(M"0000000----------000-----0001011")
  // val VADD4 = VecRegFile.TypeNone(M"0010010----------010-----1010111")
}

//ExecutionUnitElementSimple is a base class which will be coupled to the pipeline provided by an ExecutionUnitBase with
//the same euId. It provides quite a few utilities to ease the implementation of custom instructions.
//Here we will implement a plugin which provides SIMD add on the register file.
//staticLatency=true specifies that our plugin will never halt the pipeline, allowing the issue queue to statically
//wake up instructions which depend on its result.
class SimdAddPlugin(val euId : String) extends ExecutionUnitElementSimple(euId, staticLatency = true) {
  //We will assume our plugin is fully combinatorial
  override def euWritebackAt = 0

  //The setup code is used by plugins to specify things to each other before it is too late.
  //create early runs this block during the early elaboration phase.
  override val setup = create early new Setup{
    //Let's assume we only support RV32 for now
    assert(Global.XLEN.get == 32)

    //Specify to the ExecutionUnitBase that the current plugin will implement the ADD4 instruction

    // add(SimdAddPlugin.ADD4)
    add(Rvv.VAADD_VV)

    // add(SimdAddPlugin.VADD4)
  }
  val vpu = create early master(VpuBus())

  override val logic = create late new Logic{

    val process = new ExecuteArea(stageId = 0) {

      //Get the RISC-V RS1/RS2 values from the register file
      val rs1 = stage(eu(IntRegFile, RS1)).asUInt
      val rs2 = stage(eu(IntRegFile, RS2)).asUInt

      //Do some computation
      val rd = UInt(32 bits)
      rd := rs1

      // rd( 7 downto  0) := rs1( 7 downto  0) + rs2( 7 downto  0)
      // rd(15 downto  8) := rs1(15 downto  8) + rs2(15 downto  8)
      // rd(23 downto 16) := rs1(23 downto 16) + rs2(23 downto 16)
      // rd(31 downto 24) := rs1(31 downto 24) + rs2(31 downto 24)

      //Drive the VPU command, and provide the computation value for the writeback
      vpu.cmd.valid := True // ToDo
      vpu.cmd.rs1 := rs1
      vpu.cmd.rs2 := rs2

      // val instruction = Frontend.MICRO_OP().asUInt
      // vpu.cmd.OPCODE := instruction
      vpu.cmd.OPCODE := rs1
      vpu.rsp.ready := True
      wb.payload := rd.asBits
    }
  }
}

object SimdAddNaxGen extends App{
  import naxriscv.compatibility._
  import naxriscv.utilities._

  def plugins = {
    val l = Config.plugins(
      withRdTime = false,
      aluCount = 2,
      decodeCount = 2
    )
    l += new SimdAddPlugin("ALU0")
    l += new SimdAddPlugin("ALU1")
    l
  }

  val spinalConfig = SpinalConfig(inlineRom = true)
  spinalConfig.addTransformationPhase(new MemReadDuringWriteHazardPhase)
  spinalConfig.addTransformationPhase(new MultiPortWritesSymplifier)

  val report = spinalConfig.generateVerilog(new NaxRiscv(plugins))
  report.toplevel.framework.getService[DocPlugin].genC()
}
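For checking the SIMD_ADD waveform against expected values, a plain-Scala reference model of the four independent 8-bit lanes described in the plugin's header comment (bits 7..0, 15..8, 23..16 and 31..24) may help. `SimdAdd4Model` is a hypothetical testbench helper, not part of NaxRiscv.

```scala
object SimdAdd4Model {
  // Lane-wise 8-bit add: each byte of rs1 is added to the same byte of rs2,
  // and the carry out of a lane is dropped instead of rippling into the next lane.
  def add4(rs1: Long, rs2: Long): Long =
    (0 until 4).foldLeft(0L) { (acc, lane) =>
      val shift = 8 * lane
      val sum = (((rs1 >>> shift) & 0xFF) + ((rs2 >>> shift) & 0xFF)) & 0xFF
      acc | (sum << shift)
    }
}
```

For example, `add4(0x01FF0203L, 0x01010101L)` gives `0x02000304L`: lane 2 computes 0xFF + 0x01 and wraps to 0x00 without disturbing lane 3.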

@franktaTian
Author

I just modified SimdAddPlugin.scala and added one RVV instruction.

@franktaTian
Author

And I defined the RVV instruction as:
package naxriscv.riscv
import spinal.core._

object Rvv {
  import VecRegFile._
  def VAADD_VV = TypeV(M"001001-----------010-----1010111")
}
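SpinalHDL's `M"..."` masked literals compare only the bits that are not `-`. A plain-Scala sketch of that matching rule (hypothetical `MaskedMatch`, not SpinalHDL's implementation) can be used to sanity-check a decoding key such as `VAADD_VV` or `ADD4` against hand-assembled instruction words. The test words below are `vaadd.vv v1, v2, v3` (0x2621A0D7), `vadd.vv v1, v2, v3` (0x022180D7) and the custom-0 `ADD4 x3, x1, x2` (0x0020818B), all assembled by hand with vm=1 where applicable.

```scala
object MaskedMatch {
  // Turn a masked-literal string ('0', '1', '-' = don't care, MSB first)
  // into a (mask, value) pair: mask has a 1 for every bit that must match.
  def compile(pattern: String): (Long, Long) = {
    require(pattern.forall(c => c == '0' || c == '1' || c == '-'), s"bad pattern: $pattern")
    pattern.reverse.zipWithIndex.foldLeft((0L, 0L)) { case ((mask, value), (c, bit)) =>
      c match {
        case '-' => (mask, value)
        case '1' => (mask | (1L << bit), value | (1L << bit))
        case _   => (mask | (1L << bit), value)
      }
    }
  }

  // An instruction matches when all of its cared-about bits equal the pattern.
  def matches(pattern: String, inst: Long): Boolean = {
    val (mask, value) = compile(pattern)
    (inst & mask) == value
  }
}
```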

@franktaTian
Author

franktaTian commented Jul 22, 2022

object VecRegFile extends RegfileSpec with AreaObject {
  override def sizeArch = 32
  override def width = Global.XLEN
  override def x0AlwaysZero = true
  override def getName() = "vector"
  def TypeNone(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = Nil
  )
  def TypeV(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List(IntRegFile -> RS1, IntRegFile -> RD)
  )
}

[Please don't mind the definition of TypeV; I'm just trying to add some register dependency.]

@franktaTian
Author

franktaTian commented Jul 22, 2022

I wrote a simple test program based on the SimdAdd test program; it runs correctly and the waveform looks correct.
But when I change to this:
val instruction = Frontend.MICRO_OP().asUInt
vpu.cmd.OPCODE := instruction
//vpu.cmd.OPCODE := rs1

The elaboration error appears.

@franktaTian
Author

It's just a simple experiment. What I want is to get the RVV instruction and send it to an "external" module along with the resolved register values.

@franktaTian
Author

Can I use ExecuteInitDemo.scala as a template?

@Dolu1990
Member

Ahhh, what is ExecuteInitDemo?
The fact that:
// val instruction = Frontend.MICRO_OP().asUInt
// vpu.cmd.OPCODE := instruction

produces an error is because Frontend.MICRO_OP() creates a new instance of the data type; it doesn't access the one in the pipeline.

Does Frontend.MICRO_OP.asUInt work? Otherwise, stage(Frontend.MICRO_OP).asUInt should be good.

Also, be aware that ExecutionUnitElementSimple is mostly meant for integer register file instructions:
https://github.com/SpinalHDL/NaxRiscv/blob/main/src/main/scala/naxriscv/execute/ExecutionUnitElementSimple.scala#L41

For a raw execution unit element plugin, you can see :
https://github.com/SpinalHDL/NaxRiscv/blob/main/src/main/scala/naxriscv/execute/fpu/FpuFloatExecute.scala
Note that it doesn't implement the wakeup of dependencies. That is done by:
https://github.com/SpinalHDL/NaxRiscv/blob/main/src/main/scala/naxriscv/execute/fpu/FpuWriteback.scala
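To picture why `Frontend.MICRO_OP()` leads to the "NO DRIVER" error while `stage(Frontend.MICRO_OP)` does not, here is a toy plain-Scala model. The names `Key`, `Signal` and `StageModel` are hypothetical, and this is not how SpinalHDL's pipeline API is implemented: calling the key builds a fresh signal that nothing drives, whereas indexing the stage returns the one signal the frontend already drives.

```scala
import scala.collection.mutable

// Toy "signal": records the value driving it, if any.
final class Signal(val width: Int) { var driver: Option[Long] = None }

// Stageable-like key: apply() builds a brand-new, undriven signal of the right width.
final class Key(val width: Int) { def apply(): Signal = new Signal(width) }

// A stage owns exactly one signal per key; every lookup returns that same signal.
final class StageModel {
  private val signals = mutable.Map.empty[Key, Signal]
  def apply(key: Key): Signal = signals.getOrElseUpdate(key, key())
}
```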

@franktaTian
Author

franktaTian commented Jul 23, 2022

Ahhh, what is ExecuteInitDemo?

Oh, what I meant is "ExecuteUnitDemo.scala".
Can I use it as a template?

@Dolu1990
Member

Hmm, not really. I mean, it is kind of an outdated example of how to create a full execution unit. Is that what you want?
Otherwise, you can compose EU0 via a plugin, using https://github.com/SpinalHDL/NaxRiscv/blob/main/src/main/scala/naxriscv/execute/fpu/FpuFloatExecute.scala as a reference.

@Dolu1990
Member

Just to be sure, you didn't add a RegFilePlugin with VecRegFile, right? (It should not be added.)

@franktaTian
Author

OK, thanks. I didn't add a RegFilePlugin with VecRegFile. The VecRegFile defined in RegFile.scala is just used to resolve the integer or float register dependencies.
object VecRegFile extends RegfileSpec with AreaObject {
  override def sizeArch = 32
  override def width = Global.XLEN
  override def x0AlwaysZero = true
  override def getName() = "vector"
  def TypeNone(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = Nil
  )
  def TypeV(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List( IntRegFile -> RS1, IntRegFile -> RD ) :+ VPU
  )
  def TypeVGPRD(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List( IntRegFile -> RD) :+ VPU
  )
  def TypeVGPRS2(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List( IntRegFile -> RS1, IntRegFile -> RS2) :+ VPU
  )
  def TypeVGPRS1(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List( IntRegFile -> RS1) :+ VPU
  )
  def TypeVFPRS1(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List( FloatRegFile -> RS1 ) :+ VPU
  )
  def TypeVFPRD(key : MaskedLiteral) = SingleDecoding(
    key = key,
    resources = List( FloatRegFile -> RD ) :+ VPU
  )
}

@Dolu1990
Member

Ok ^^

Also, in which context are you doing this project?

  • Fun / Work / Study?

@franktaTian
Author

franktaTian commented Jul 26, 2022

For study, for now. Thanks.

I tried ARA, which is a VPU for Ariane. But Ariane/CVA6 is an in-order open-source core, and its performance is low.
There is no open-source OoO RISC-V core except NaxRiscv and BOOM. I tried Chisel, but it is too complicated for me. Still, I think the RoCC interface is interesting.
So I am trying to learn NaxRiscv. The plugin/service framework is interesting, but I am not familiar with Scala.

@Dolu1990
Member

Dolu1990 commented Aug 2, 2022

But Ariane/CVA6 is an in-order open-source core, and its performance is low.

Out of curiosity, can you define "performance is low"?

So I am trying to learn NaxRiscv. The plugin/service framework is interesting, but I am not familiar with Scala.

Also, I have to say, adding the VPU is really not a simple thing ^^

There are a lot of parts in NaxRiscv working together to get OoO up and running, and they are currently set up so that instructions wake each other up through their dependencies / ROB ids.

I tried ARA, which is a VPU for Ariane. But Ariane/CVA6 is an in-order open-source core, and its performance is low.

What performance are you targeting / hoping for?
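To give an idea of the dependency / ROB-id wakeup mentioned above, here is a deliberately simplified plain-Scala model: each waiting instruction holds the set of ROB ids it depends on, and a completing instruction broadcasts its ROB id to clear them. The names `Waiting` and `WakeupModel` are hypothetical, and NaxRiscv's real wakeup logic is hardware that is not reproduced here. Presumably, a VPU instruction that writes an integer or float register would have to take part in such a broadcast so that its consumers wake up.

```scala
import scala.collection.mutable

// Hypothetical issue-queue entry waiting on the ROB ids of its producers.
final case class Waiting(robId: Int, waitsOn: Set[Int])

final class WakeupModel(initial: Seq[Waiting]) {
  // robId -> ROB ids still awaited
  private val pending = mutable.Map.empty[Int, Set[Int]] ++= initial.map(w => w.robId -> w.waitsOn)

  // A completing instruction broadcasts its ROB id; consumers clear that dependency.
  // Returns the ROB ids that became ready because of this broadcast.
  def complete(robId: Int): Set[Int] = {
    val woken = mutable.Set.empty[Int]
    for ((id, deps) <- pending.toSeq if deps.contains(robId)) {
      val left = deps - robId
      pending(id) = left
      if (left.isEmpty) woken += id
    }
    woken.toSet
  }

  // Entries with no outstanding dependency.
  def ready: Set[Int] = pending.collect { case (id, deps) if deps.isEmpty => id }.toSet
}
```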
