ncnn-jni: add RESNET inference using NCNN (#185)
zhouwg authored May 21, 2024
1 parent 0a095d0 commit e1c6cf4
Showing 23 changed files with 2,655 additions and 511 deletions.
30 changes: 17 additions & 13 deletions README.md
@@ -2,26 +2,25 @@

KanTV("Kan", aka Chinese PinYin "Kan" or Chinese HanZi "看" or English "watch/listen") , an open source project focus on study and practise state-of-the-art AI technology in <b>real scenario</b>(such as online-TV playback and online-TV transcription(real-time subtitle) and online-TV language translation and online-TV video&audio recording works at the same time) on **Android phone/device**, derived from original ![ijkplayer](https://github.com/zhouwg/kantv/tree/kantv-initial) , with much enhancements and new features:

- Watch online TV and local media via the customized [FFmpeg 6.1](https://github.com/zhouwg/FFmpeg); the source code of the customized FFmpeg 6.1 can be found in <a href="https://github.com/zhouwg/kantv/tree/master/external/ffmpeg-6.1"> external/ffmpeg </a> in accordance with <a href="https://ffmpeg.org/legal.html">FFmpeg's license</a>
- Watch online TV and local media via my customized [FFmpeg 6.1](https://github.com/zhouwg/FFmpeg); the source code of my customized FFmpeg 6.1 can be found in <a href="https://github.com/zhouwg/kantv/tree/master/external/ffmpeg-6.1"> external/ffmpeg </a> in accordance with <a href="https://ffmpeg.org/legal.html">FFmpeg's license</a>

- Record online TV to automatically generate videos (useful for short video creators to generate short video materials but pls respect IPR of original content creator/provider); record online TV's video / audio content for gather video / audio data which might be required of/useful for AI R&D activity

- ASR (Automatic Speech Recognition, a subfield of AI) study via the great <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a>

- LLM (Large Language Model, a subfield of AI) study via the great <a href="https://github.com/ggerganov/llama.cpp"> llama.cpp </a>; run/experience LLMs (such as llama-2-7b, baichuan2-7b, qwen1_5-1_8b, gemma-2b) on a Xiaomi 14 using llama.cpp

- Real-time English subtitles for English online TV (aka OTT TV) via the excellent <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a> (<a href="https://github.com/zhouwg/kantv/issues/64">PoC finished on Xiaomi 14</a>; a Xiaomi 14 or another powerful Android phone is HIGHLY recommended for the real-time subtitle feature, otherwise unexpected behavior may occur)
- AI subtitles (real-time English subtitles for English online TV (aka OTT TV) via the excellent <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a>) (<a href="https://github.com/zhouwg/kantv/issues/64">PoC finished on Xiaomi 14</a>; a Xiaomi 14 or another powerful Android phone is HIGHLY recommended for the real-time subtitle feature, otherwise unexpected behavior may occur)

- 2D graphic performance

- Set up a customized playlist and then use this software to watch the content of the customized playlist for R&D activity


- UI refactor (closer to a real commercial Android application; only English is currently supported as the UI language)

- Well-maintained "workbench" for ASR(Automatic Speech Recognition) researchers who was interested in practise state-of-the-art AI tech(like [whisper.cpp](https://github.com/ggerganov/whisper.cpp)) in real scenario on mobile device(focus on Android currently)
- Well-maintained "workbench" for ASR(Automatic Speech Recognition) researchers/developers who was interested in practise state-of-the-art AI tech(such as [whisper.cpp](https://github.com/ggerganov/whisper.cpp)) in real scenario on Android phone/device

- Well-maintained "workbench" for LLM(Large Language Model) researchers/developers who was interested in practise state-of-the-art AI tech(such as [llama.cpp](https://github.com/ggerganov/llama.cpp)) in real scenario on Android phone/device, or Run/experience LLM model(such as llama-2-7b, baichuan2-7b, qwen1_5-1_8b, gemma-2b) on Android phone/device using the magic llama.cpp

- Well-maintained "workbench" for <a href="https://github.com/ggerganov/ggml">GGML</a> beginners to study and practise GGML inference framework on Android phone/device

- Well-maintained "workbench" for LLM(Large Language Model) researchers who was interested in practise state-of-the-art AI tech(like [llama.cpp](https://github.com/ggerganov/llama.cpp)) in real scenario on mobile device(focus on Android currently)
- Well-maintained "workbench" for <a href="https://github.com/Tencent/ncnn">NCNN</a> beginners to study and practise NCNN inference framework on Android phone/device

- Android <b>turn-key project</b> for AI researchers (who might not be familiar with <b>regular Android software development</b>), developers, and beginners focused on edge/device-side AI learning and R&D activity; some AI R&D activities (AI algorithm validation, AI model validation, performance benchmarks in the ASR, LLM, TTS, NLP, CV... fields) can be done very easily with the Android Studio IDE + a powerful Android phone

@@ -186,7 +185,7 @@ cd kantv
```
- Modify <a href="https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L14">ggml/CMakeLists.txt</a> and <a href="https://github.com/zhouwg/kantv/blob/master/core/ncnn/CMakeLists.txt#L12">ncnn/CMakeLists.txt</a> accordingly if target Android device is Xiaomi 14 or Qualcomm Snapdragon 8 Gen 3 SoC based Android phone
- Modify <a href="https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L14">ggml/CMakeLists.txt</a> and <a href="https://github.com/zhouwg/kantv/blob/master/core/ncnn/CMakeLists.txt#L9">ncnn/CMakeLists.txt</a> accordingly if target Android device is Xiaomi 14 or Qualcomm Snapdragon 8 Gen 3 SoC based Android phone
- Modify <a href="https://github.com/zhouwg/kantv/blob/master/core/ggml/CMakeLists.txt#L15">ggml/CMakeLists.txt</a> and <a href="https://github.com/zhouwg/kantv/blob/master/core/ncnn/CMakeLists.txt#L10">ncnn/CMakeLists.txt</a> accordingly if target Android phone is <b>NOT</b> Qualcomm SoC based Android phone
@@ -234,15 +233,20 @@ https://github.com/zhouwg/kantv/assets/6889919/2fabcb24-c00b-4289-a06e-05b98ecd2
Here is a screenshot to demonstrate LLM inference by running the magic <a href="https://github.com/ggerganov/llama.cpp"> llama.cpp </a> on a Xiaomi 14 device - <b>fully offline, on-device</b>.
![1782274840](https://github.com/zhouwg/kantv/assets/6889919/8b9228f8-e3f4-4b11-b70f-c4e0c7fadacb)
![196896722](https://github.com/zhouwg/kantv/assets/6889919/d190c039-6254-4713-83ce-557c0fda4c83)
----
Here is a screenshot to demonstrate ASR inference by running the excellent <a href="https://github.com/ggerganov/whisper.cpp"> whisper.cpp </a> on a Xiaomi 14 device - <b>fully offline, on-device</b>.
![667358015](https://github.com/zhouwg/kantv/assets/6889919/94a175ca-97c0-41ef-b9a1-1dc4b7168516)
![1954672029](https://github.com/zhouwg/kantv/assets/6889919/2a4471d3-a39b-4f6a-8f06-118d3b0dd320)
----
Here are some screenshots to demonstrate CV inference by running the excellent <a href="https://github.com/Tencent/ncnn"> ncnn </a> on a Xiaomi 14 device - <b>fully offline, on-device</b>.
![2015869763](https://github.com/zhouwg/kantv/assets/6889919/9b4c8325-8f7c-4bea-9cae-ee4627f9d199)
![988568755](https://github.com/zhouwg/kantv/assets/6889919/49c7e22e-0e4c-4ece-b690-04d59ac1382f)
<details>
<summary>some other screenshots</summary>
2 changes: 1 addition & 1 deletion VERSION.txt
@@ -1 +1 @@
kantv 1.3.7 (based on customized FFmpeg 6.1, whisper.cpp, llama.cpp, bark.cpp, stablediffusion.cpp)
kantv 1.3.7, based on customized FFmpeg 6.1, whisper.cpp(ASR), llama.cpp(LLM), bark.cpp(TTS), stablediffusion.cpp(Text2Image), ncnn, opencv-android
@@ -265,11 +265,23 @@ public class CDEUtils {
public static final int BENCHMARK_MULMAT = 1;
public static final int BENCHMARK_QNN_GGML_OP = 2;
public static final int BENCHMARK_QNN_AUTO_UT = 3;
public static final int BENCHMARK_ASR = 4;

//inference using GGML
public static final int BENCHMARK_ASR = 4; //whisper.cpp
public static final int BENCHMARK_LLM = 5;
public static final int BENCHMARK_TEXT2IMAGE = 6;
public static final int BENCHMARK_CV_MNIST = 7;
public static final int BENCHMARK_TTS = 8;
public static final int BENCHMARK_GGML_MAX = 8;

//inference using NCNN
public static final int BENCHMARK_CV_RESNET = 9;
public static final int BENCHMARK_CV_SQUEEZENET = 10;
public static final int BENCHAMRK_ASR_NCNN = 11;
public static final int BENCHAMRK_TTS_NCNN = 12;

public static final int NCNN_BACKEND_CPU = 0;
public static final int NCNN_BACKEND_GPU = 1; // must differ from CPU; matches getNCNNBackendDesc() and the loadModel() javadoc


//keep sync with ggml-qnn.h
@@ -3921,13 +3933,13 @@ public static String getBenchmarkDesc(int benchmarkIndex) {
return "GGML matrix multiply";

case BENCHMARK_ASR:
return "GGML ASR";
return "ASR inference using GGML";

case BENCHMARK_LLM:
return "GGML LLAMA";
return "LLM inference using GGML";

case BENCHMARK_TEXT2IMAGE:
return "GGML stable diffusion";
return "Text2Image inference using GGML";

case BENCHMARK_QNN_GGML_OP:
return "GGML QNN OP UT"; //UT for PoC-S49: implementation of GGML OPs using QNN API
@@ -3936,10 +3948,22 @@ public static String getBenchmarkDesc(int benchmarkIndex) {
return "GGML QNN OP UT automation"; //automation UT for PoC-S49: implementation of GGML OPs using QNN API

case BENCHMARK_CV_MNIST:
return "GGML mnist";
return "MNIST inference using GGML";

case BENCHMARK_TTS:
return "GGML TTS";
return "TTS inference using GGML";

case BENCHMARK_CV_RESNET:
return "RESNET inference using NCNN";

case BENCHMARK_CV_SQUEEZENET:
return "SQUEEZENET inference using NCNN";

case BENCHAMRK_ASR_NCNN:
return "ASR inference using NCNN";

case BENCHAMRK_TTS_NCNN:
return "TTS inference using NCNN";
}

return "unknown";
@@ -3971,6 +3995,17 @@ public static String getBackendDesc(int n_backend_type) {
}
}

public static String getNCNNBackendDesc(int n_backend_type) {
switch (n_backend_type) {
case 0:
return "CPU";
case 1:
return "GPU";
default:
return "unknown";
}
}


public static String getGGMLModeString(int ggmlModeType) {
switch (ggmlModeType) {
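The new constants and helpers are plain public static members of CDEUtils, so benchmark/UI code can resolve a human-readable description for the NCNN cases the same way it already does for the GGML ones. Below is a minimal sketch; the wrapper class and logging are hypothetical, only the CDEUtils constants and methods shown in this diff are assumed, and the import of CDEUtils itself is omitted since its package is not visible here.

```java
import android.util.Log;

// Hypothetical demo class: only the CDEUtils constants/helpers from this diff are assumed to exist.
public class NCNNBenchmarkInfo {
    private static final String TAG = "KANTV";

    public static void logSelectedBenchmark() {
        int benchmarkType = CDEUtils.BENCHMARK_CV_RESNET; // RESNET inference via NCNN, added in this commit
        int backendType   = CDEUtils.NCNN_BACKEND_CPU;    // 0: CPU, 1: GPU

        String what  = CDEUtils.getBenchmarkDesc(benchmarkType);  // "RESNET inference using NCNN"
        String where = CDEUtils.getNCNNBackendDesc(backendType);  // "CPU"

        Log.d(TAG, "running " + what + " on NCNN backend: " + where);
    }
}
```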
13 changes: 12 additions & 1 deletion cdeosplayer/cdeosplayer-lib/src/main/java/org/ncnn/ncnnjava.java
@@ -16,15 +16,26 @@
package org.ncnn;

import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.view.Surface;

public class ncnnjava
{
public native boolean loadModel(AssetManager mgr, int modelid, int cpugpu);
/**
* @param mgr          the Android AssetManager used to load the model files bundled in assets
* @param netid        id of the NCNN network to load
* @param modelid      id of the model to use within that network
* @param backend_type 0: NCNN_BACKEND_CPU, 1: NCNN_BACKEND_GPU
* @return             true if the model was loaded successfully, false otherwise
*/
public native boolean loadModel(AssetManager mgr, int netid, int modelid, int backend_type);
public native boolean openCamera(int facing);
public native boolean closeCamera();
public native boolean setOutputWindow(Surface surface);

public native String detectResNet(Bitmap bitmap, boolean use_gpu);
public native String detectSqueezeNet(Bitmap bitmap, boolean use_gpu);

static {
System.loadLibrary("ncnn-jni");
}
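For orientation, here is a minimal sketch of how the new JNI surface might be driven from application code. The ResNetDemo wrapper and the concrete netid/modelid/backend values are hypothetical (the real ids are defined on the native side and in callers elsewhere in this commit); only the org.ncnn.ncnnjava signatures shown above are taken from the diff.

```java
import android.content.res.AssetManager;
import android.graphics.Bitmap;

import org.ncnn.ncnnjava;

// Hypothetical caller of the JNI wrapper added in this commit.
public class ResNetDemo {
    // Hypothetical ids/backend values, for illustration only.
    private static final int NET_ID_RESNET = 0;
    private static final int MODEL_ID_DEFAULT = 0;
    private static final int BACKEND_CPU = 0; // 1 would select the GPU backend

    public static String classify(AssetManager assets, Bitmap photo) {
        ncnnjava ncnn = new ncnnjava(); // loading the class runs System.loadLibrary("ncnn-jni")

        // the RESNET model files (e.g. resnet.bin) are shipped in assets by this commit
        if (!ncnn.loadModel(assets, NET_ID_RESNET, MODEL_ID_DEFAULT, BACKEND_CPU)) {
            return "failed to load RESNET model from assets";
        }

        // detectResNet runs inference on the bitmap and returns the result as a string
        return ncnn.detectResNet(photo, /* use_gpu = */ false);
    }
}
```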
Binary file added cdeosplayer/kantv/src/main/assets/resnet.bin
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
