[WIP] Various new features: Enhanced JIT Compiler targeted to multiple devices, ONNX Support, et al. #154

Merged
58 commits merged on Jun 11, 2024

@hikettei (Owner) commented May 25, 2024

Refactors

  • Reduce the codebase to fewer than 10,000 lines.
  • Remove cl-waffe2-simd while keeping AVX/Neon/SLEEF intrinsic support.
  • Remove JITCPUTensor and add backends/aten in its place.

AbstractNodes

  • Changes to base-impl
    • Unified and simplified arithmetic operations using !view.
      • cl-waffe2 now uses broadcast + arithmetic ops in place of the removed ScalarAdd/ScalarSub/ScalarMul/ScalarDiv/InverseTensorNode.
    • Rename: !inv -> !reciprocal
    • Fix typo: !leaky-relu
  • Add a keepdims option to !max, !min, etc.
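
As a rough illustration of the broadcast + arithmetic idea above (not the cl-waffe2 implementation), the Python sketch below treats a scalar as a view whose strides are all 0, so one elementwise routine covers both the matrix-matrix and the matrix-scalar case. The names View, contiguous, broadcast_scalar, and elementwise are hypothetical and exist only for this example.

```python
import operator
from itertools import product


class View:
    """A strided view over a flat buffer (hypothetical stand-in for a view)."""

    def __init__(self, data, shape, strides):
        self.data = data
        self.shape = tuple(shape)
        self.strides = tuple(strides)

    def at(self, index):
        # Map a multi-dimensional index to a position in the flat buffer.
        return self.data[sum(i * s for i, s in zip(index, self.strides))]


def contiguous(data, shape):
    """Wrap a flat list as a row-major view."""
    strides, step = [], 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return View(list(data), shape, reversed(strides))


def broadcast_scalar(value, shape):
    """One stored element; every stride is 0, so all indices read the same slot."""
    return View([value], shape, (0,) * len(shape))


def elementwise(op, a, b):
    """Apply a binary op over two views of the same (broadcast) shape."""
    if a.shape != b.shape:
        raise ValueError("shapes must match after broadcasting")
    indices = product(*(range(d) for d in a.shape))
    return contiguous([op(a.at(i), b.at(i)) for i in indices], a.shape)


matrix = contiguous([1, 2, 3, 4, 5, 6], (2, 3))
ten = broadcast_scalar(10, (2, 3))
print(elementwise(operator.add, matrix, ten).data)  # [11, 12, 13, 14, 15, 16]
```

With this layout a dedicated scalar-add node is unnecessary: the scalar case is the ordinary add applied to a stride-0 operand, and a reciprocal can be written as a division with a broadcast 1.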

Command Line Tool

  • Switch the testing tool from FiveAM to Rove.
  • Make all tests hardware-independent.
  • Add a command line tool at ./roswell/waffe2.ros
  • ONNX-to-C interpreter mode (waffe2 codegen xxx)
    • It generates a minimal C interpreter (calling CUDA/Metal functions) to keep dependencies to a minimum.

ShapeTracker

  • [Enhancement] Use [ ] to express a ScalarTensor, instead of using [scal] and out-scalar-p=t.

Frontend

  • Implement an ONNX-to-cl-waffe2-IR mode.

Backend

  • Prepare for implementing AbstractTensor backends

    • Fast Conv2d kernel (reduced memory latency, Winograd)
    • Prepare for GPU support, including Triton, Metal, CUDA, etc.
  • Remove cl-waffe2-simd

  • Remove obsolete SIMD dependencies

  • Merge the aten-runtime branch.
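
For readers unfamiliar with the Winograd approach mentioned for the fast Conv2d kernel, here is a minimal 1D sketch of the standard F(2,3) minimal-filtering algorithm in Python. It is the textbook form, not the kernel from this PR, and the function names are made up for this example. It computes two outputs of a 3-tap filter with 4 data-filter multiplications instead of 6.

```python
def direct_f23(d, g):
    """Reference: two outputs of a 3-tap correlation over 4 inputs (6 multiplies)."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Winograd F(2,3): the same two outputs with 4 data-filter multiplies."""
    # Filter transform; in a real kernel this is computed once per filter.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]

    # Input transform followed by the 4 elementwise multiplies.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3

    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


signal = [1.0, 2.0, 3.0, 4.0]
taps = [1.0, 0.0, -1.0]
print(direct_f23(signal, taps), winograd_f23(signal, taps))
```

A 2D Conv2d variant nests this transform over rows and columns of each tile; the saving comes from reusing the transformed filter across many input tiles.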

APIs

  • Implement AbstractListTensor

    • It basically behaves as a (Shape) tensor.
    • When called with forward, however, it wraps the λ function in a loop.
    • When called with !concatenate, it creates a (LazyXXX ~ Shape) tensor, where LazyXXX = (length x).
    • This enables implementing a KV cache for transformer models.
  • Control flow

    • IfNode: IfNode ~ EndIfNode
    • LoopNode: LoopNode ~ EndLoopNode
    • POV: Is wf2IR Turing-complete?
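
One way to picture the paired IfNode ~ EndIfNode and LoopNode ~ EndLoopNode markers is as brackets in a flat instruction list. The Python sketch below is only an assumption about how such an IR could be interpreted, not the wf2IR design; the tuple encoding and the functions match_pairs and run are hypothetical. It shows that a conditional forward jump plus a backward jump over mutable state is enough to express general while-loops, which is the usual starting point for the Turing-completeness question.

```python
CLOSERS = {"IfNode": "EndIfNode", "LoopNode": "EndLoopNode"}


def match_pairs(program):
    """Map each opening marker index to its closing index, and vice versa."""
    stack, jump = [], {}
    for pc, (kind, _) in enumerate(program):
        if kind in CLOSERS:
            stack.append(pc)
        elif kind in CLOSERS.values():
            if not stack or CLOSERS[program[stack[-1]][0]] != kind:
                raise ValueError(f"unmatched {kind} at position {pc}")
            start = stack.pop()
            jump[start], jump[pc] = pc, start
    if stack:
        raise ValueError(f"unclosed {program[stack[-1]][0]}")
    return jump


def run(program, env):
    """Interpret a flat list of (kind, function) pairs against a mutable env."""
    jump = match_pairs(program)
    pc = 0
    while pc < len(program):
        kind, fn = program[pc]
        if kind == "Op":
            fn(env)
        elif kind in CLOSERS:
            if not fn(env):
                pc = jump[pc]        # condition false: skip to the closing marker
        elif kind == "EndLoopNode":
            pc = jump[pc] - 1        # jump back so the LoopNode condition is re-tested
        # EndIfNode needs no action: execution simply falls through.
        pc += 1
    return env


program = [
    ("LoopNode", lambda e: e["i"] < 5),
    ("Op", lambda e: e.update(i=e["i"] + 1)),
    ("Op", lambda e: e.update(total=e["total"] + e["i"])),
    ("IfNode", lambda e: e["i"] % 2 == 0),
    ("Op", lambda e: e.update(evens=e["evens"] + 1)),
    ("EndIfNode", None),
    ("EndLoopNode", None),
]
print(run(program, {"i": 0, "total": 0, "evens": 0}))
```

Keeping the markers in a flat list (rather than nesting subgraphs) means the interpreter only needs a precomputed jump table, at the cost of validating the bracket pairing up front.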

hikettei and others added 24 commits May 24, 2024 13:36
…ithmetic operations across Scalar and matrix
[Refactor] Added Command Line Tool, Hardware-independent unittest, unified arithmetic operations, updated workflow, et al.
@hikettei changed the title from "[WIP] Refactor" to "[WIP] Refactoring" on May 25, 2024
@hikettei changed the title from "[WIP] Refactoring" to "[WIP] Various new features: Enhanced JIT Compiler targeted to multiple devices, and ONNX Support, et al." on Jun 4, 2024
@hikettei changed the title from "[WIP] Various new features: Enhanced JIT Compiler targeted to multiple devices, and ONNX Support, et al." to "[WIP] Various new features: Enhanced JIT Compiler targeted to multiple devices, ONNX Support, et al." on Jun 4, 2024
@hikettei (Owner, Author) commented

I will merge these changes for now, since I am unlikely to get much development time for a while T_T. (Plus, the IR specification I formulated is so poor that it is difficult to add new features. I need to take enough time to redevelop the entire backend from scratch. But is it worth it?...)

@hikettei merged commit b9ec982 into master on Jun 11, 2024
4 checks passed