ncnn-jni: add RESNET inference using NCNN (#185)
zhouwg authored May 21, 2024
1 parent 0a095d0 commit e1c6cf4
Showing 23 changed files with 2,655 additions and 511 deletions.
30 changes: 17 additions & 13 deletions README.md
@@ -2,26 +2,25 @@

KanTV("Kan", aka Chinese PinYin "Kan" or Chinese HanZi "看" or English "watch/listen") , an open source project focus on study and practise state-of-the-art AI technology in <b>real scenario</b>(such as online-TV playback and online-TV transcription(real-time subtitle) and online-TV language translation and online-TV video&audio recording works at the same time) on **Android phone/device**, derived from original ![ijkplayer](https://github.com/zhouwg/kantv/tree/kantv-initial) , with much enhancements and new features:

- Watch online TV and local media via the customized [FFmpeg 6.1](https://github.com/zhouwg/FFmpeg); the source code of the customized FFmpeg 6.1 can be found in <a href="https://github.com/zhouwg/kantv/tree/master/external/ffmpeg-6.1"> external/ffmpeg </a> in accordance with <a href="https://ffmpeg.org/legal.html">FFmpeg's license</a>
- Watch online TV and local media via my customized [FFmpeg 6.1](https://github.com/zhouwg/FFmpeg); the source code of my customized FFmpeg 6.1 can be found in <a href="https://github.com/zhouwg/kantv/tree/master/external/ffmpeg-6.1"> external/ffmpeg </a> in accordance with <a href="https://ffmpeg.org/legal.html">FFmpeg's license</a>

- Record online TV to automatically generate videos (useful for short video creators to generate short video materials but pls respect IPR of original content creator/provider); record online TV's video / audio content for gather video / audio data which might be required of/useful for AI R&D activity

- ASR (Automatic Speech Recognition, a subfield of AI) study via the great <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a>

- LLM (Large Language Model, a subfield of AI) study via the great <a href="https://github.com/ggerganov/llama.cpp"> llama.cpp </a>; run/experience LLMs (such as llama-2-7b, baichuan2-7b, qwen1_5-1_8b, gemma-2b) on a Xiaomi 14 using llama.cpp

- Real-time English subtitles for English online TV (aka OTT TV) via the excellent <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a> (<a href="https://github.com/zhouwg/kantv/issues/64">PoC finished on Xiaomi 14</a>; a Xiaomi 14 or another powerful Android phone is HIGHLY recommended for the real-time subtitle feature, otherwise unexpected behavior may occur)
- AI subtitles (real-time English subtitles for English online TV (aka OTT TV) via the excellent <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a>) (<a href="https://github.com/zhouwg/kantv/issues/64">PoC finished on Xiaomi 14</a>; a Xiaomi 14 or another powerful Android phone is HIGHLY recommended for the real-time subtitle feature, otherwise unexpected behavior may occur)

- 2D graphic performance

- Set up a customized playlist and then use this software to watch the content of the customized playlist for R&D activity


- UI refactor (closer to a real commercial Android application; only English is currently supported as the UI language)

- Well-maintained "workbench" for ASR(Automatic Speech Recognition) researchers who was interested in practise state-of-the-art AI tech(like [whisper.cpp](https://github.com/ggerganov/whisper.cpp)) in real scenario on mobile device(focus on Android currently)
- Well-maintained "workbench" for ASR(Automatic Speech Recognition) researchers/developers who was interested in practise state-of-the-art AI tech(such as [whisper.cpp](https://github.com/ggerganov/whisper.cpp)) in real scenario on Android phone/device

- Well-maintained "workbench" for LLM(Large Language Model) researchers/developers who was interested in practise state-of-the-art AI tech(such as [llama.cpp](https://github.com/ggerganov/llama.cpp)) in real scenario on Android phone/device, or Run/experience LLM model(such as llama-2-7b, baichuan2-7b, qwen1_5-1_8b, gemma-2b) on Android phone/device using the magic llama.cpp

- Well-maintained "workbench" for <a href="https://github.com/ggerganov/ggml">GGML</a> beginners to study and practise GGML inference framework on Android phone/device

- Well-maintained "workbench" for LLM(Large Language Model) researchers who was interested in practise state-of-the-art AI tech(like [llama.cpp](https://github.com/ggerganov/llama.cpp)) in real scenario on mobile device(focus on Android currently)
- Well-maintained "workbench" for <a href="https://github.com/Tencent/ncnn">NCNN</a> beginners to study and practise NCNN inference framework on Android phone/device

- Android <b>turn-key project</b> for AI researchers (who might not be familiar with <b>regular Android software development</b>), developers, and beginners focused on edge/device-side AI learning and R&D activity; some AI R&D activities (AI algorithm validation, AI model validation, performance benchmarks in the ASR, LLM, TTS, NLP, CV... fields) can be done very easily with the Android Studio IDE + a powerful Android phone

@@ -186,7 +185,7 @@ cd kantv
```
- Modify <a href="https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L14">ggml/CMakeLists.txt</a> and <a href="https://github.com/zhouwg/kantv/blob/master/core/ncnn/CMakeLists.txt#L12">ncnn/CMakeLists.txt</a> accordingly if target Android device is Xiaomi 14 or Qualcomm Snapdragon 8 Gen 3 SoC based Android phone
- Modify <a href="https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L14">ggml/CMakeLists.txt</a> and <a href="https://github.com/zhouwg/kantv/blob/master/core/ncnn/CMakeLists.txt#L9">ncnn/CMakeLists.txt</a> accordingly if target Android device is Xiaomi 14 or Qualcomm Snapdragon 8 Gen 3 SoC based Android phone
- Modify <a href="https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L15">ggml/CMakeLists.txt</a> and <a href="https://github.com/zhouwg/kantv/blob/master/core/ncnn/CMakeLists.txt#L10">ncnn/CMakeLists.txt</a> accordingly if target Android phone is <b>NOT</b> Qualcomm SoC based Android phone
@@ -234,15 +233,20 @@ https://github.com/zhouwg/kantv/assets/6889919/2fabcb24-c00b-4289-a06e-05b98ecd2
Here is a screenshot to demonstrate LLM inference by running the magic <a href="https://github.com/ggerganov/llama.cpp"> llama.cpp </a> on a Xiaomi 14 device - <b>fully offline, on-device</b>.
![1782274840](https://github.com/zhouwg/kantv/assets/6889919/8b9228f8-e3f4-4b11-b70f-c4e0c7fadacb)
![196896722](https://github.com/zhouwg/kantv/assets/6889919/d190c039-6254-4713-83ce-557c0fda4c83)
----
Here is a screenshot to demonstrate ASR inference by running the excellent <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a> on a Xiaomi 14 device - <b>fully offline, on-device</b>.
![667358015](https://github.com/zhouwg/kantv/assets/6889919/94a175ca-97c0-41ef-b9a1-1dc4b7168516)
![1954672029](https://github.com/zhouwg/kantv/assets/6889919/2a4471d3-a39b-4f6a-8f06-118d3b0dd320)
----
Here are some screenshots to demonstrate CV inference by running the excellent <a href="https://github.com/Tencent/ncnn"> ncnn </a> on a Xiaomi 14 device - <b>fully offline, on-device</b>.
![2015869763](https://github.com/zhouwg/kantv/assets/6889919/9b4c8325-8f7c-4bea-9cae-ee4627f9d199)
![988568755](https://github.com/zhouwg/kantv/assets/6889919/49c7e22e-0e4c-4ece-b690-04d59ac1382f)
<details>
<summary>some other screenshots</summary>
2 changes: 1 addition & 1 deletion VERSION.txt
@@ -1 +1 @@
kantv 1.3.7 (based on customized FFmpeg 6.1, whisper.cpp, llama.cpp, bark.cpp, stablediffusion.cpp)
kantv 1.3.7, based on customized FFmpeg 6.1, whisper.cpp(ASR), llama.cpp(LLM), bark.cpp(TTS), stablediffusion.cpp(Text2Image), ncnn, opencv-android
@@ -265,11 +265,23 @@ public class CDEUtils {
public static final int BENCHMARK_MULMAT = 1;
public static final int BENCHMARK_QNN_GGML_OP = 2;
public static final int BENCHMARK_QNN_AUTO_UT = 3;
public static final int BENCHMARK_ASR = 4;

//inference using GGML
public static final int BENCHMARK_ASR = 4; //whisper.cpp
public static final int BENCHMARK_LLM = 5;
public static final int BENCHMARK_TEXT2IMAGE = 6;
public static final int BENCHMARK_CV_MNIST = 7;
public static final int BENCHMARK_TTS = 8;
public static final int BENCHMARK_GGML_MAX = 8;

//inference using NCNN
public static final int BENCHMARK_CV_RESNET = 9;
public static final int BENCHMARK_CV_SQUEEZENET = 10;
public static final int BENCHAMRK_ASR_NCNN = 11;
public static final int BENCHAMRK_TTS_NCNN = 12;

public static final int NCNN_BACKEND_CPU = 0;
public static final int NCNN_BACKEND_GPU = 1; // must differ from CPU; matches getNCNNBackendDesc() and the loadModel() javadoc


//keep sync with ggml-qnn.h
@@ -3921,13 +3933,13 @@ public static String getBenchmarkDesc(int benchmarkIndex) {
return "GGML matrix multiply";

case BENCHMARK_ASR:
return "GGML ASR";
return "ASR inference using GGML";

case BENCHMARK_LLM:
return "GGML LLAMA";
return "LLM inference using GGML";

case BENCHMARK_TEXT2IMAGE:
return "GGML stable diffusion";
return "Text2Image inference using GGML";

case BENCHMARK_QNN_GGML_OP:
return "GGML QNN OP UT"; //UT for PoC-S49: implementation of GGML OPs using QNN API
@@ -3936,10 +3948,22 @@ public static String getBenchmarkDesc(int benchmarkIndex) {
return "GGML QNN OP UT automation"; //automation UT for PoC-S49: implementation of GGML OPs using QNN API

case BENCHMARK_CV_MNIST:
return "GGML mnist";
return "MNIST inference using GGML";

case BENCHMARK_TTS:
return "GGML TTS";
return "TTS inference using GGML";

case BENCHMARK_CV_RESNET:
return "RESNET inference using NCNN";

case BENCHMARK_CV_SQUEEZENET:
return "SQUEEZENET inference using NCNN";

case BENCHAMRK_ASR_NCNN:
return "ASR inference using NCNN";

case BENCHAMRK_TTS_NCNN:
return "TTS inference using NCNN";
}

return "unknown";
@@ -3971,6 +3995,17 @@ public static String getBackendDesc(int n_backend_type) {
}
}

public static String getNCNNBackendDesc(int n_backend_type) {
switch (n_backend_type) {
case 0:
return "CPU";
case 1:
return "GPU";
default:
return "unknown";
}
}


public static String getGGMLModeString(int ggmlModeType) {
switch (ggmlModeType) {
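The new constants and helpers are plain public static members of CDEUtils, so benchmark/UI code can resolve a human-readable description for the NCNN cases the same way it already does for the GGML ones. Below is a minimal sketch; the wrapper class and logging are hypothetical, only the CDEUtils constants and methods shown in this diff are assumed, and the import of CDEUtils itself is omitted since its package is not visible here.

```java
import android.util.Log;

// Hypothetical demo class: only the CDEUtils constants/helpers from this diff are assumed to exist.
public class NCNNBenchmarkInfo {
    private static final String TAG = "KANTV";

    public static void logSelectedBenchmark() {
        int benchmarkType = CDEUtils.BENCHMARK_CV_RESNET; // RESNET inference via NCNN, added in this commit
        int backendType   = CDEUtils.NCNN_BACKEND_CPU;    // 0: CPU, 1: GPU

        String what  = CDEUtils.getBenchmarkDesc(benchmarkType);  // "RESNET inference using NCNN"
        String where = CDEUtils.getNCNNBackendDesc(backendType);  // "CPU"

        Log.d(TAG, "running " + what + " on NCNN backend: " + where);
    }
}
```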
13 changes: 12 additions & 1 deletion cdeosplayer/cdeosplayer-lib/src/main/java/org/ncnn/ncnnjava.java
@@ -16,15 +16,26 @@
package org.ncnn;

import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.view.Surface;

public class ncnnjava
{
public native boolean loadModel(AssetManager mgr, int modelid, int cpugpu);
/**
* @param mgr          the Android AssetManager used to load the model files bundled in assets
* @param netid        id of the NCNN network to load
* @param modelid      id of the model to use within that network
* @param backend_type 0: NCNN_BACKEND_CPU, 1: NCNN_BACKEND_GPU
* @return             true if the model was loaded successfully, false otherwise
*/
public native boolean loadModel(AssetManager mgr, int netid, int modelid, int backend_type);
public native boolean openCamera(int facing);
public native boolean closeCamera();
public native boolean setOutputWindow(Surface surface);

public native String detectResNet(Bitmap bitmap, boolean use_gpu);
public native String detectSqueezeNet(Bitmap bitmap, boolean use_gpu);

static {
System.loadLibrary("ncnn-jni");
}
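For orientation, here is a minimal sketch of how the new JNI surface might be driven from application code. The ResNetDemo wrapper and the concrete netid/modelid/backend values are hypothetical (the real ids are defined on the native side and in callers elsewhere in this commit); only the org.ncnn.ncnnjava signatures shown above are taken from the diff.

```java
import android.content.res.AssetManager;
import android.graphics.Bitmap;

import org.ncnn.ncnnjava;

// Hypothetical caller of the JNI wrapper added in this commit.
public class ResNetDemo {
    // Hypothetical ids/backend values, for illustration only.
    private static final int NET_ID_RESNET = 0;
    private static final int MODEL_ID_DEFAULT = 0;
    private static final int BACKEND_CPU = 0; // 1 would select the GPU backend

    public static String classify(AssetManager assets, Bitmap photo) {
        ncnnjava ncnn = new ncnnjava(); // loading the class runs System.loadLibrary("ncnn-jni")

        // the RESNET model files (e.g. resnet.bin) are shipped in assets by this commit
        if (!ncnn.loadModel(assets, NET_ID_RESNET, MODEL_ID_DEFAULT, BACKEND_CPU)) {
            return "failed to load RESNET model from assets";
        }

        // detectResNet runs inference on the bitmap and returns the result as a string
        return ncnn.detectResNet(photo, /* use_gpu = */ false);
    }
}
```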
Binary file added cdeosplayer/kantv/src/main/assets/resnet.bin
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
