
christian sermon (cantonese speech-to-text transcription)


michaelchanwahyan/sermon-app


sermon-app

A Collection of Christian Sermons From Different Sources

sermon-app is a collection of Christian sermons (mostly in Cantonese) in text or audio format. Text derived from audio sources is generated from the raw recordings with a speech-to-text engine such as Azure or Whisper (Whisper performs considerably better than Azure).

Immediately available sermon books

The books compiled from the LaTeX sources generated by this repo are listed below.

sermon book series | file path in this repo | link
宣道傳意 講道講章 Alliance Communications Ministry | ./pdf/sermon_ACSMHK.pdf | link
漢語聖經協會 講道講章 Chinese Bible International | ./pdf/sermon_CBI.pdf | link
中國神學研究院 道講章 China Graduate School of Theology | ./pdf/sermon_CGST.pdf | link
崇基神學院 崇拜講章 Div. Schl of Chung Chi College | ./pdf/sermon_DSCCC_2009-present.pdf | link
流堂 崇拜講章 Flow Church | ./pdf/sermon_FLWC_2021-present.pdf | link
宣道會錦繡堂 崇拜講章 Christian Missionary Alliance Fairview Church | ./pdf/sermon_FVC_2017-present.pdf | link
港九培靈研經會講章 Hong Kong Bible Conference | ./pdf/sermon_HKBC_1928-2007.pdf | link
港九培靈研經會講章 Hong Kong Bible Conference | ./pdf/sermon_HKBC_2008-present.pdf | link
JohnsonNg Youtube Channel | ./pdf/sermon_JNG_2012-18.pdf | link
JohnsonNg Youtube Channel | ./pdf/sermon_JNG_2019-20.pdf | link
JohnsonNg Youtube Channel | ./pdf/sermon_JNG_2021-22.pdf | link
JohnsonNg Youtube Channel | ./pdf/sermon_JNG_2023-24.pdf | link
播道會港福堂 崇拜講章 EFCC Kong Fok Church | ./pdf/sermon_KFC_2020-present.pdf | link
The Porch, Dallas, TX 75251 | ./pdf/sermon_PORCH_2014-present.pdf | link
沙田浸信會 Shatin Baptist Church | ./pdf/sermon_STBC_2020-present.pdf | link
葡萄藤教會 The Vine Church | ./pdf/sermon_VINE_2020-present.pdf | link
環球聖經公會 講道講章 Worldwide Bible Society | ./pdf/sermon_WWBS.pdf | link
播道會恩福堂 崇拜講章 Yan Fook Church & Youth | ./pdf/sermon_YFCX_2020-2023.pdf | link
播道會恩福堂 崇拜講章 Yan Fook Church & Youth | ./pdf/sermon_YFCX_2024-2027.pdf | link
中華宣道會友愛堂信培部 Yau Oi School | ./pdf/sermon_YOS.pdf | link

Statistics overview of this project

sermon source | transcript total count  | recent development activity
ACSMHK        |  8.6% (  948 / 10976)   | 21.7% ( 564 / 2602)
CBI           |  0.2% (   27 / 10976)   |  1.6% (  42 / 2602)
CGST          |  2.0% (  218 / 10976)   |  0.9% (  24 / 2602)
DSCCC         |  6.5% (  713 / 10976)   |  1.1% (  30 / 2602)
FLWC          |  1.9% (  212 / 10976)   |  3.3% (  85 / 2602)
FVC           | 11.2% ( 1231 / 10976)   |  4.2% ( 110 / 2602)
HKBC          | 14.2% ( 1559 / 10976)   |  0.1% (   3 / 2602)
JNG           | 27.3% ( 2997 / 10976)   |  9.3% ( 243 / 2602)
KFC           |  7.4% (  816 / 10976)   |  9.1% ( 238 / 2602)
PORCH         |  4.5% (  493 / 10976)   |  3.3% (  85 / 2602)
STBC          |  2.1% (  234 / 10976)   |  6.9% ( 180 / 2602)
VINE          |  2.6% (  284 / 10976)   | 23.5% ( 611 / 2602)
WWBS          |  0.6% (   68 / 10976)   |  1.3% (  34 / 2602)
YFCX          | 10.4% ( 1139 / 10976)   | 13.2% ( 344 / 2602)
YOS           |  0.3% (   37 / 10976)   |  0.3% (   9 / 2602)
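
Each cell above is a per-source count divided by the column total. The sketch below recomputes the "transcript total count" column from the counts in the table; the function name `share_cell` is made up for illustration and is not taken from the repo's own scripts.

```python
def share_cell(count: int, total: int) -> str:
    """Format one table cell as 'P% ( count / total)' with one decimal place."""
    percent = 100.0 * count / total
    return f"{percent:.1f}% ( {count} / {total})"


# Transcript counts copied from the table above.
transcripts = {
    "ACSMHK": 948, "CBI": 27, "CGST": 218, "DSCCC": 713, "FLWC": 212,
    "FVC": 1231, "HKBC": 1559, "JNG": 2997, "KFC": 816, "PORCH": 493,
    "STBC": 234, "VINE": 284, "WWBS": 68, "YFCX": 1139, "YOS": 37,
}

# The column total is simply the sum of all per-source counts.
total = sum(transcripts.values())

for source, count in transcripts.items():
    print(f"{source:<7}{share_cell(count, total)}")
```

The per-source counts add up to the 10976 denominator shown in every row, so the percentages in that column sum to roughly 100%.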

Steps to compile the books from scratch (painful!)

Pre-requisites

This work packages many Python packages into a single Docker image named "datalab".

You will need to work with the following essential tools:

  • Python3 (already in datalab)
  • Jupyter (already in datalab)
  • Docker (install it on your host, see Usage-1)
  • LaTeX (install it on your host, see Usage-6)

If you are unfamiliar with these basics, please go to the "Immediately available sermon books" section.

Features

  • automation-ready: new sermons can be discovered from designated YouTube channels
  • compilation sorted by preacher, book, etc.
  • open to additional channel sources
  • powered by Docker, Jupyter, and Spark

For sermon-app, the author currently focuses on compiling Cantonese sermons so that these valuable resources can be re-archived, redistributed, re-presented, and used as a reference in the future.

The author uses this project to

  • fetch Cantonese Christian sermons from various accessible YouTube channels and retrieve the audio files;
  • transcribe the audio with the Azure speech recognition engine (from audio speech file to raw text file);
  • generate from the transcribed sermon text a PDF compilation, sorted by preacher, Bible book and chapter, sermon title, and time.
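
The sorting in the last step can be pictured as a multi-level sort key. The following is a minimal sketch, not the repo's actual code: the `Sermon` fields, the shortened `BOOK_ORDER` list, and the sample records are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical, shortened canonical order; a real list would cover all Bible books.
BOOK_ORDER = ["Genesis", "Psalms", "Matthew", "Romans"]


@dataclass
class Sermon:
    preacher: str
    book: str     # Bible book name
    chapter: int
    title: str
    date: str     # ISO 'YYYY-MM-DD', so string order equals chronological order


def sort_key(s: Sermon):
    """Order by preacher, then canonical book position, chapter, title, and date."""
    return (s.preacher, BOOK_ORDER.index(s.book), s.chapter, s.title, s.date)


# Placeholder records for illustration only.
sermons = [
    Sermon("Preacher B", "Romans", 8, "Title C", "2021-03-07"),
    Sermon("Preacher A", "Psalms", 127, "Title B", "2020-05-10"),
    Sermon("Preacher A", "Genesis", 1, "Title A", "2022-01-02"),
]

ordered = sorted(sermons, key=sort_key)
for s in ordered:
    print(s.preacher, s.book, s.chapter, s.title, s.date)
```

Using the book's position in a canonical list rather than its name keeps Genesis ahead of Psalms even where alphabetical order would disagree.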

As it is written in Psalm 127:1 (NIV):

Unless the Lord builds the house, the builders labor in vain. Unless the Lord watches over the city, the guards stand watch in vain.

The work you see here is truly a blessing from G-d.

Usage

1. Get the [datalab] engine (by host)

1.1 Install Docker

Refer to the Docker installation guide.

1.2 Get the datalab container image

docker pull michaelchanwahyan/datalab

2. Start JupyterLab through the [datalab] container (by host)

You will probably need a Docker volume so that your local copy of the app is available inside the container:

docker run -p 9999:9999 \
           -v /your/path/to/app:/app \
           --name=ds_workspace \
           michaelchanwahyan/datalab:latest \
           /usr/bin/bash /startup.sh

Upon successful execution, opening localhost:9999 in your browser should bring you to a JupyterLab interface.

(If prompted for a password, try 'dsteam'. According to the startup script of the datalab platform, it is probably already set through the startup option "--ServerApp.token='dsteam'".)
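
If the browser shows nothing, it can help to check whether anything is listening on the published port at all. This is a small sketch using only the Python standard library; the helper name `is_port_open` is made up here, and port 9999 is taken from the `docker run` command above.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 9999, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False


if __name__ == "__main__":
    print("JupyterLab port reachable:", is_port_open())
```

A `False` result usually means the container is not running or the `-p 9999:9999` mapping is missing; a `True` result only shows that the port accepts connections, not that JupyterLab itself is healthy.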

3. Generate the sermon book table of contents (toc) (by container)

a) the index file (a toc-like csv file)

The code for the JohnsonNg Youtube Channel's sermon content lives under /app/projects/JNG/. In /app/projects/JNG, run the notebook generate_index.ipynb from the JupyterLab instance you launched.

Please note that the scripts may require human interaction, so do not run generate_index.ipynb blindly. Pay attention to the inline comments in the source file.

b) download the sermon audio according to index file

The inline descriptions in generate_index.ipynb explain how to use yt-dlp to extract the raw audio from YouTube.
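
As a rough illustration of that step, the sketch below turns an index CSV into yt-dlp command lines. It is not the notebook's actual code: the `video_id` column name, the sample rows, and the `audio/` output folder are assumptions, while `--extract-audio`, `--audio-format`, and `--output` are standard yt-dlp options.

```python
import csv
import io


def audio_commands(index_csv_text: str, out_dir: str = "audio") -> list:
    """Build one yt-dlp command per row of the index CSV (hypothetical 'video_id' column)."""
    reader = csv.DictReader(io.StringIO(index_csv_text))
    commands = []
    for row in reader:
        url = f"https://www.youtube.com/watch?v={row['video_id']}"
        commands.append([
            "yt-dlp",
            "--extract-audio",                          # keep only the audio track
            "--audio-format", "wav",                    # convert with ffmpeg after download
            "--output", f"{out_dir}/%(id)s.%(ext)s",    # name files by video id
            url,
        ])
    return commands


# Placeholder index content; real ids would come from the generated index file.
SAMPLE_INDEX = (
    "video_id,title\n"
    "VIDEO_ID_1,Hypothetical title 1\n"
    "VIDEO_ID_2,Hypothetical title 2\n"
)

cmds = audio_commands(SAMPLE_INDEX)
for cmd in cmds:
    print(" ".join(cmd))
```

Each command list can be passed to `subprocess.run(cmd, check=True)` once yt-dlp and ffmpeg are installed; the sketch only prints the commands so it can be inspected without downloading anything.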

4. Convert from audio to text (speech-to-text part , by container)

In /app/projects/JNG, the core step is to run the notebook generate_content.ipynb (or its Python counterpart).

The Azure speech service is required; the Azure subscription info is omitted from this repo.

A pair of cv_runby_*.py files can be found in the same directory. They run concurrently: cv_runby_container.py performs the speech-to-text and cv_runby_host.py concatenates the resulting text on the fly.
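
The on-the-fly concatenation can be imagined as one process repeatedly appending transcript pieces that another process has finished writing. The sketch below is only a guess at that pattern, not the content of cv_runby_host.py; the file layout (`parts/0001.txt`, `combined.txt`) and the function name are hypothetical.

```python
import tempfile
from pathlib import Path


def append_new_transcripts(src_dir: Path, combined: Path, seen: set) -> list:
    """Append every not-yet-seen .txt file in src_dir to combined, in name order."""
    added = []
    for txt in sorted(src_dir.glob("*.txt")):
        if txt.name in seen:
            continue
        with combined.open("a", encoding="utf-8") as out:
            out.write(txt.read_text(encoding="utf-8").strip() + "\n")
        seen.add(txt.name)
        added.append(txt.name)
    return added


# Demo in a temporary folder: two pieces arrive one after the other.
with tempfile.TemporaryDirectory() as tmp:
    parts = Path(tmp) / "parts"
    parts.mkdir()
    combined = Path(tmp) / "combined.txt"
    seen = set()

    (parts / "0001.txt").write_text("first segment\n", encoding="utf-8")
    first_pass = append_new_transcripts(parts, combined, seen)

    (parts / "0002.txt").write_text("second segment\n", encoding="utf-8")
    second_pass = append_new_transcripts(parts, combined, seen)

    result = combined.read_text(encoding="utf-8")

print(first_pass, second_pass)
print(result, end="")
```

In a real setup the function would be called in a polling loop while the transcription process is still producing files; the `seen` set is what keeps pieces from being appended twice.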

As of 2023, with the OpenAI/whisper model available, speech-to-text has become more effective.

Thanks also to ggerganov/whisper.cpp, a C++ port with Apple Silicon integration, whisper now runs very fast.

The whisper model size currently used is medium. ggerganov's ggml-medium.bin model file, together with other sizes, can be found on ggerganov's Hugging Face page.
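
For orientation, a whisper.cpp run typically has two parts: converting the audio to 16 kHz mono WAV with ffmpeg, then calling the whisper.cpp binary with the model file. The sketch below only assembles those command lines. The binary path `./main` (newer builds name it `whisper-cli`), the model path, and the language code `zh` are assumptions, not settings confirmed by this repo.

```python
def to_wav_command(src: str, dst: str) -> list:
    """ffmpeg command that converts src to 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-i", src, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", dst]


def whisper_command(wav: str,
                    model: str = "models/ggml-medium.bin",  # assumed model location
                    language: str = "zh",                   # assumed language code
                    binary: str = "./main") -> list:        # assumed binary name
    """whisper.cpp command that transcribes wav and writes a plain-text output file."""
    return [binary, "-m", model, "-l", language, "-f", wav, "-otxt"]


convert = to_wav_command("audio/VIDEO_ID_1.m4a", "audio/VIDEO_ID_1.wav")
transcribe = whisper_command("audio/VIDEO_ID_1.wav")
print(" ".join(convert))
print(" ".join(transcribe))
```

Both lists can be handed to `subprocess.run(..., check=True)`. Which language code gives the best Cantonese output depends on the model version, so treat the default here as a placeholder to experiment with.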

5. Compile the sermon texts into a single book source (by host/container)

In /app/projects/JNG, run the Python script generate_sermonbook.py to generate the LaTeX source file under the build/ folder.
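
Turning transcript text into LaTeX source mostly means escaping LaTeX's special characters and wrapping each sermon in sectioning commands. The sketch below shows one way to do that; it is not taken from generate_sermonbook.py, and the function names and the `\section` / `\textit` layout are assumptions.

```python
# Characters that have a special meaning in LaTeX and their escaped forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_",
    "{": r"\{", "}": r"\}",
    "~": r"\textasciitilde{}", "^": r"\textasciicircum{}",
}


def escape_latex(text: str) -> str:
    """Escape character by character so already-escaped output is never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)


def sermon_to_latex(title: str, preacher: str, body: str) -> str:
    """Render one sermon as a section with the preacher's name and the body text."""
    return "\n".join([
        rf"\section{{{escape_latex(title)}}}",
        rf"\textit{{{escape_latex(preacher)}}}",
        "",
        escape_latex(body),
        "",
    ])


# Placeholder sermon for illustration only.
print(sermon_to_latex("Faith & Works", "Preacher A", "100% of the text goes here."))
```

Escaping one character at a time avoids the classic bug where a backslash inserted for `&` gets escaped again by a later replacement pass.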

6. Compile the sermon texts into a single book pdf (by host, where LaTeX is required)

(For LaTeX installation, see the LaTeX project's page.)

In /app/build/, run the build script build.sh with an input argument as shown below:

./build.sh JNG # this is to compile JohnsonNg Youtube Channel sermon content
./build.sh HKBC # this is to compile Hong Kong Bible Conference sermon content

The core LaTeX engine required is XeLaTeX.
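
What build.sh does internally is not shown here. As a rough idea of a XeLaTeX build, the sketch below assembles the engine invocation and repeats it so that a table of contents can resolve; the file name `sermon_JNG.tex` and the two-pass default are assumptions.

```python
import subprocess


def build_commands(tex_file: str, out_dir: str = ".", passes: int = 2) -> list:
    """Return the xelatex command repeated `passes` times (a second pass fills in the TOC)."""
    cmd = [
        "xelatex",
        "-interaction=nonstopmode",          # do not stop and wait for input on errors
        f"-output-directory={out_dir}",
        tex_file,
    ]
    return [list(cmd) for _ in range(passes)]


def run_build(tex_file: str, out_dir: str = ".") -> None:
    """Run every pass in order; raises CalledProcessError if xelatex fails."""
    for cmd in build_commands(tex_file, out_dir):
        subprocess.run(cmd, check=True)


# Hypothetical source file name; only the commands are printed, nothing is executed.
for cmd in build_commands("sermon_JNG.tex"):
    print(" ".join(cmd))
```

Calling `run_build("sermon_JNG.tex")` would require XeLaTeX on the host, along with fonts that cover the Chinese text.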

Editor

Contact person: Michael, via michaelchan_wahyan@yahoo.com.hk
