feat: Add Speech to Text feature (#25)
xmnlab committed Mar 13, 2024
1 parent af3c6d1 commit 122dc7c
Showing 17 changed files with 1,904 additions and 1,240 deletions.
53 changes: 39 additions & 14 deletions README.md
@@ -16,7 +16,7 @@ order to have everything well installed, create a conda/mamba environment and
install `artbox` there.

```bash
- $ mamba create --name artbox "python>=3.8.1,<3.12" pygobject pip
+ $ mamba create --name artbox "python>=3.8.1,<3.12" "pygobject==3.48.1" pip
$ conda activate artbox
$ pip install artbox
```
@@ -31,17 +31,17 @@ $ mkdir /tmp/artbox

### Convert text to audio

- By default, the `artbox voice` uses
+ By default, the `artbox speech` uses
[`edge-tts`](https://pypi.org/project/edge-tts/) engine, but if you can also
specify [`gtts`](https://github.com/pndurette/gTTS) with the flag
`--engine gtts`.

```bash
$ echo "Are you ready to join Link and Zelda in fighting off this unprecedented threat to Hyrule?" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech from-text \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--engine edge-tts
```

@@ -50,10 +50,10 @@ If you need to generate the audio for different language, you can use the flag

```bash
$ echo "Bom dia, mundo!" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech from-text \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--lang pt
```

@@ -62,10 +62,10 @@ locale for that language, for example:

```bash
$ echo "Are you ready to join Link and Zelda in fighting off this unprecedented threat to Hyrule?" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech from-text \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--engine edge-tts \
--lang en-IN
```
@@ -75,17 +75,42 @@ and `--pitch`, for example:

```bash
$ echo "Do you want some coffee?" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech from-text \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--engine edge-tts \
--lang en \
--rate +10% \
--volume -10% \
--pitch -5Hz
```

+ ### Convert audio to text
+
+ ArtBox uses `speechrecognition` to convert from audio to text. Currently, ArtBox
+ just support the `google` engine.
+
+ For this example, let's first create our audio:
+
+ ```bash
+ $ echo "Are you ready to join Link and Zelda in fighting off this unprecedented threat to Hyrule?" > /tmp/artbox/text.md
+ $ artbox speech from-text \
+ --title artbox \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
+ --engine edge-tts
+ ```
+
+ Now we can convert it back to text:
+
+ ```bash
+ $ artbox speech to-text \
+ --input-path /tmp/artbox/speech.mp3 \
+ --output-path /tmp/artbox/text-from-speech.md \
+ --lang en
+ ```
+
### Download a youtube video

If you want to download videos from the youtube, you can use the following
4 changes: 2 additions & 2 deletions docs/changelog.md
@@ -58,7 +58,7 @@

### Features

- - Add engine options for Voice class. ([#6](https://github.com/ggpedia/artbox/issues/6)) ([d4381f7](https://github.com/ggpedia/artbox/commit/d4381f781a98ffb51fb103d671c5a9115bb3f6d1))
+ - Add engine options for Speech class. ([#6](https://github.com/ggpedia/artbox/issues/6)) ([d4381f7](https://github.com/ggpedia/artbox/commit/d4381f781a98ffb51fb103d671c5a9115bb3f6d1))

# [0.2.0](https://github.com/ggpedia/artbox/compare/0.1.0...0.2.0) (2023-08-29)

@@ -69,4 +69,4 @@

### Features

- - Add the flag `--lang` for the voice command ([#2](https://github.com/ggpedia/artbox/issues/2)) ([cb937e9](https://github.com/ggpedia/artbox/commit/cb937e9e7a9de5a19b3dc4dc8d34f6daf4ba6304))
+ - Add the flag `--lang` for the speech command ([#2](https://github.com/ggpedia/artbox/issues/2)) ([cb937e9](https://github.com/ggpedia/artbox/commit/cb937e9e7a9de5a19b3dc4dc8d34f6daf4ba6304))
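
For scripts that still call the pre-commit CLI spelling, the renames visible in the README.md hunks of this commit (`voice` to `speech`, `text-to-speech` to `from-text`, `--text-path` to `--input-path`) can be applied mechanically. The following is a minimal Python sketch and is not part of the commit: `migrate_argv` and the two mapping tables are hypothetical names introduced here, and the mapping assumes the README.md spelling (the docs/index.md hunks in this same commit still show `text-to-speech` as the subcommand).

```python
from typing import Dict, List, Tuple

# Old-to-new spellings copied from the README.md hunks of this commit.
# Assumption: README.md reflects the actual CLI after the change.
SUBCOMMAND_RENAMES: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("voice", "text-to-speech"): ("speech", "from-text"),
}
FLAG_RENAMES: Dict[str, str] = {
    "--text-path": "--input-path",
}


def migrate_argv(argv: List[str]) -> List[str]:
    """Return a copy of an old-style artbox argv using the new spelling."""
    out = list(argv)  # work on a copy so the caller's list is not mutated
    # The group and subcommand are expected right after the program name.
    if len(out) >= 3 and (out[1], out[2]) in SUBCOMMAND_RENAMES:
        out[1], out[2] = SUBCOMMAND_RENAMES[(out[1], out[2])]
    # Rename flags wherever they appear; all other tokens pass through.
    return [FLAG_RENAMES.get(token, token) for token in out]


if __name__ == "__main__":
    old = [
        "artbox", "voice", "text-to-speech",
        "--title", "artbox",
        "--text-path", "/tmp/artbox/text.md",
        "--output-path", "/tmp/artbox/voice.mp3",
    ]
    print(" ".join(migrate_argv(old)))
```

Only space-separated flags are handled; a token such as `--text-path=/tmp/artbox/text.md` would pass through unchanged, and output file names are left as the caller wrote them.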
26 changes: 13 additions & 13 deletions docs/index.md
@@ -31,17 +31,17 @@ $ mkdir /tmp/artbox

### Convert text to audio

- By default, the `artbox voice` uses
+ By default, the `artbox speech` uses
[`edge-tts`](https://pypi.org/project/edge-tts/) engine, but if you can also
specify [`gtts`](https://github.com/pndurette/gTTS) with the flag
`--engine gtts`.

```bash
$ echo "Are you ready to join Link and Zelda in fighting off this unprecedented threat to Hyrule?" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech text-to-speech \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--engine edge-tts
```

@@ -50,10 +50,10 @@ If you need to generate the audio for different language, you can use the flag

```bash
$ echo "Bom dia, mundo!" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech text-to-speech \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--lang pt
```

@@ -62,10 +62,10 @@ locale for that language, for example:

```bash
$ echo "Are you ready to join Link and Zelda in fighting off this unprecedented threat to Hyrule?" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech text-to-speech \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--engine edge-tts \
--lang en-IN
```
@@ -75,10 +75,10 @@ and `--pitch`, for example:

```bash
$ echo "Do you want some coffee?" > /tmp/artbox/text.md
- $ artbox voice text-to-speech \
+ $ artbox speech text-to-speech \
--title artbox \
- --text-path /tmp/artbox/text.md \
- --output-path /tmp/artbox/voice.mp3 \
+ --input-path /tmp/artbox/text.md \
+ --output-path /tmp/artbox/speech.mp3 \
--engine edge-tts \
--lang en \
--rate +10% \