gTTS 2.0.0 #108

Merged: 96 commits, merged Apr 30, 2018
Commits
373d639
Use Web scraping to get list of tts languages
pndurette Feb 17, 2018
c135e76
Clean up gtts_cli, add language check, single letter arguments
pndurette Feb 17, 2018
6797a0e
Rewrote gtts-cli as module, using click, setuptools
pndurette Feb 24, 2018
d5f9120
Added method to get lang. codes list. Style
pndurette Feb 24, 2018
69854ab
Added Chinese comma, ellipsis; gTTSError; cleanup
pndurette Feb 24, 2018
67565f3
Improve help strings, add page encoding, readability
pndurette Feb 24, 2018
6df9827
Linting and cleaning
pndurette Feb 24, 2018
a44ae93
Tests into a submodule
pndurette Feb 27, 2018
1586068
Added tokenization punctuation; fix cases where text parts start or c…
pndurette Mar 1, 2018
59100ed
Added extra English language tags
pndurette Mar 1, 2018
8f6eee2
Split tts tests in Web requests and utils, added tests for cases of n…
pndurette Mar 1, 2018
461154b
Fixing lang tests, linting
pndurette Mar 1, 2018
7728c22
License year++
pndurette Mar 1, 2018
7aac2ab
Test fix
pndurette Mar 1, 2018
ee59cd6
Changed print()s for a logger, better exceptions, cleanup
pndurette Mar 8, 2018
43a9ae5
Merge branch 'master' into cli-py-2-3
scivision Mar 8, 2018
8eec3d3
add pytest, AppVeyor (Windows self test)
scivision Mar 8, 2018
6ada058
test loc
scivision Mar 8, 2018
2910c8c
remove race conditions, remove EOL Python 3.3, dedupe code
scivision Mar 8, 2018
434ed6e
improve test efficiency
scivision Mar 8, 2018
dc8efdb
Better exception handling for request/write
pndurette Mar 8, 2018
ea07da0
Added logging, exception cleanup
pndurette Mar 9, 2018
416c4f2
Better test cleanup
pndurette Mar 9, 2018
d96ac39
Warn if no text to speak after input processing
pndurette Mar 9, 2018
76903bb
Test fix (attempting to close a non-file handle)
pndurette Mar 9, 2018
27acfa9
Fixed gTTSError logic, log.warn() deprecated
pndurette Mar 9, 2018
d2abc65
TEST_LANGS env. var. controls which lang. tests to run
pndurette Mar 9, 2018
b2abaae
Merge pull request #99 from scivision/cli-py-2-3
pndurette Mar 9, 2018
4cc4a01
Merge branch 'scivision-cli-py-2-3' into feature/cli-py-2-3
pndurette Mar 9, 2018
cea037f
Using with tempfile.SpooledTemporaryFile in test
pndurette Mar 9, 2018
557c796
Version bump 2.0.0 🎉
pndurette Mar 11, 2018
9cd1394
Added test descriptions
pndurette Mar 11, 2018
6ceb2fd
Added logger support, lang. validation as callback, help/comments/struc…
pndurette Mar 11, 2018
561227b
Include README.rst
pndurette Mar 12, 2018
b0504e5
Exception skeleton for i/o. Help str. fix
pndurette Mar 12, 2018
4a2e642
CLI tests
pndurette Mar 12, 2018
c8aa089
Use 'AssertRegex' from 'six' for Py < 3.1 compat
pndurette Mar 12, 2018
008fc04
Replace newlines with ' ' instead of '' which would glue words togeth…
pndurette Mar 18, 2018
6ab2226
CLI input tests.
pndurette Mar 18, 2018
a33a19d
py27 unicode file opening fix, better exception handling for i/o, res…
pndurette Mar 18, 2018
b8502eb
CLI output tests
pndurette Mar 18, 2018
3f77e92
Cleanup _len() method
pndurette Mar 18, 2018
0d9356f
Don't split on number decimals
pndurette Mar 18, 2018
87a8602
Text nothing to send to API assert
pndurette Mar 18, 2018
9ebf5da
Comment and test cleanup
pndurette Mar 18, 2018
7cc577c
Building .appveyor.yml
pndurette Mar 19, 2018
a68cb2e
Building .appveyor.yml (cont.)
pndurette Mar 19, 2018
c9cc96d
Building .appveyor.yml (cont.)
pndurette Mar 19, 2018
6cde1e5
Building .appveyor.yml (cont.)
pndurette Mar 19, 2018
a076640
Building .appveyor.yml (cont.)
pndurette Mar 19, 2018
f7e9c6b
Building .appveyor.yml (cont.)
pndurette Mar 19, 2018
5d38a35
Force utf-8 for --file, test cleanup/making Win happy, cleanup .appveyor
pndurette Mar 20, 2018
05ce3bf
Tests tts: write_to_fp(), save(), gTTSError logic
pndurette Mar 21, 2018
8cdfd11
Don't catch exceptions we can't handle
pndurette Mar 21, 2018
49ea667
Omit tests/ and logger from coverage, cleanup
pndurette Mar 21, 2018
f8fc00a
AppVeyor/Windows doesn't like files being deleted during tests
pndurette Mar 21, 2018
93f04c6
try/except for token, better exception user experience
pndurette Mar 22, 2018
23e26bd
Spun off string methods and rewrote tokenizer:
pndurette Mar 26, 2018
2024765
Added test for False max_size in strings.tokenize(), coverage
pndurette Mar 26, 2018
14b0dd3
Add coveralls
pndurette Mar 26, 2018
9f02180
Coverage run in .travis.yml
pndurette Mar 26, 2018
9b8796f
Added Coveralls to .travis.yml
pndurette Mar 26, 2018
7ecfc89
Add Sphinx docs
pndurette Mar 27, 2018
beb3690
_tokenize et al. refactor, tone marks test, docs
pndurette Mar 27, 2018
e36bc12
Polished comments
pndurette Mar 28, 2018
0703a95
Docs added for cli, logging, module (wip)
pndurette Mar 28, 2018
3739c0a
Pre-processor to re-form end-of-line hyphenated words, comments
pndurette Mar 30, 2018
72ecd35
Documentation
pndurette Mar 30, 2018
b7350ed
Refactor tokenizer:
pndurette Apr 10, 2018
a55f4b9
Took logger out of gTTS class, try/except for pre-processors and toke…
pndurette Apr 10, 2018
4ea5728
Defined __all__
pndurette Apr 10, 2018
050377d
Badge update, TODOs
pndurette Apr 10, 2018
05d2d97
Added legacy tokenizer case (like gTTS 1.x)
pndurette Apr 11, 2018
68d44be
Added exception raising in tokenizer.core, removed from tokenizing me…
pndurette Apr 11, 2018
46edba8
Refactor gtts.lang from class to module.methods; log tweaks
pndurette Apr 12, 2018
7e9409d
Missing 'tone_marks' tokenizer case
pndurette Apr 14, 2018
cb48f03
Documented gtts.utils; tests, using _len(delim) in case delim is unic…
pndurette Apr 14, 2018
dd05601
gtts.tokenizer tests skeleton
pndurette Apr 14, 2018
438db0e
Unicode string in gtts.tests.test_utils for Python 2.x.
pndurette Apr 14, 2018
aed7e02
Added docstrings and __repr__() to RegexBuilder, PreProcessorRegex, P…
pndurette Apr 15, 2018
8954a66
Doc. draft for Tokenizer, doc. simplification of other gtts.tokenizer…
pndurette Apr 15, 2018
04e291f
Doc. drafting
pndurette Apr 16, 2018
2ff0c0c
Added 'fr-fr' extra lang, doc. update
pndurette Apr 16, 2018
db81b36
Docs work
pndurette Apr 16, 2018
b4fe4e6
gTTS.write_to_fp() except change to handle non-bytes objects
pndurette Apr 17, 2018
cca8d74
New Readme draft
pndurette Apr 17, 2018
7339dda
Use proper metavar in validation error msg
pndurette Apr 29, 2018
7434f7b
Updated Docs & Co.
pndurette Apr 29, 2018
d0a716d
Tests for gtts.tokenizer module
pndurette Apr 29, 2018
75e2f37
Clean tokens when text < 100 chars
pndurette Apr 29, 2018
e8df423
New Changelog & Changes for 2.0.0
pndurette Apr 30, 2018
61dc2fa
ReadTheDocs conf.
pndurette Apr 30, 2018
6d87da9
Release on Python 3.6
pndurette Apr 30, 2018
ddfc4cd
Changelog Typos & Release PR
pndurette Apr 30, 2018
afcb8bb
Simplified tokenizer testing
pndurette Apr 30, 2018
dacbe04
Test Fixes
pndurette Apr 30, 2018
Linting and cleaning
pndurette committed Feb 24, 2018
commit 6df9827d3e0190523b022aa523f0544886c8c172
gtts/cli.py (70 changes: 50 additions & 20 deletions)
@@ -1,77 +1,107 @@
 # -*- coding: utf-8 -*-
 from gtts import gTTS, Languages, __version__
-import sys, os, codecs, click
+import click

 # Click settings
 CONTEXT_SETTINGS = {
     'help_option_names': ['-h', '--help']
 }


 def validate_text(ctx, param, value):
-    """Validation callback for the 'text' argument.
-    Ensures 'text' (arg) and 'file' (opt) are mutually exclusive
+    """Validation callback for the <text> argument.
+    Ensures <text> (arg) and <file> (opt) are mutually exclusive
     """
     if not value and 'file' not in ctx.params:
         # No <text> and no <file>
-        raise click.BadParameter("TEXT or -f/--file FILENAME required")
+        raise click.BadParameter(
+            "TEXT or -f/--file FILENAME required")
     if value and 'file' in ctx.params:
         # Both <text> and <file>
-        raise click.BadParameter("TEXT and -f/--file FILENAME can't be used together")
+        raise click.BadParameter(
+            "TEXT and -f/--file FILENAME can't be used together")
     return value


 def print_languages(ctx, param, value):
     """Prints sorted pretty-printed list of supported languages"""
     if not value or ctx.resilient_parsing:
         return
     langs = Languages().get()
-    click.echo(' '+'\n '.join(sorted("{}: {}".format(k, langs[k]) for k in langs)))
+    langs_str_list = sorted("{}: {}".format(k, langs[k]) for k in langs)
+    click.echo(' ' + '\n '.join(langs_str_list))
     ctx.exit()


 @click.command(context_settings=CONTEXT_SETTINGS)
 @click.argument('text', required=False, callback=validate_text)
-@click.option('-f', '--file',
+@click.option(
+    '-f',
+    '--file',
     type=click.File(),
     help="Input is contents of FILENAME instead of TEXT (use '-' for stdin).")
-@click.option('-o', '--output',
+@click.option(
+    '-o',
+    '--output',
     type=click.File(mode='wb'),
     help="Write to FILENAME instead of stdout.")
-@click.option('-s', '--slow', default=False, is_flag=True,
+@click.option(
+    '-s',
+    '--slow',
+    default=False,
+    is_flag=True,
     help="Read more slowly.")
-@click.option('-l', '--lang', default='en', show_default=True,
+@click.option(
+    '-l',
+    '--lang',
+    default='en',
+    show_default=True,
     help="IETF language tag. Language to speak in. List documented tags with -a/--all.")
-@click.option('--nocheck', default=False, is_flag=True,
+@click.option(
+    '--nocheck',
+    default=False,
+    is_flag=True,
     help="Disable strict IETF language tag checking. Allow undocumented tags.")
-@click.option('--all', default=False, is_flag=True, callback=print_languages,
-              expose_value=False, is_eager=True,
+@click.option(
+    '--all',
+    default=False,
+    is_flag=True,
+    callback=print_languages,
+    expose_value=False,
+    is_eager=True,
     help="Print all documented available IETF language tags and exit.")
-@click.option('--debug', default=False, is_flag=True, help="Show debug information.")
+@click.option(
+    '--debug',
+    default=False,
+    is_flag=True,
+    help="Show debug information.")
 @click.version_option(version=__version__)
 def tts_cli(text, file, output, slow, lang, nocheck, debug):
     """Reads TEXT to MP3 format using Google Translate's Text-to-Speech API.
     (use '-' as TEXT or as -f/--file FILENAME for stdin)
     """

     # Language check
     # (We can't do callback validation on <lang> because we
     # have to check against <nocheck> which might not exist
     # in the Click context at the time <lang> is used)
-    check = not nocheck # Readability
+    check = not nocheck  # Readability
     if check:
-        langs_list = Languages().get_list()
-        if lang not in langs_list:
+        if lang not in Languages().get():
             msg = "Use --all to list languages, or add --nocheck to disable language check."
             raise click.BadParameter(msg, param_hint="lang '{}'".format(lang))

     # stdin for <text> (auto for <file>)
     if text is '-':
         text = click.get_text_stream('stdin').read()

     # stdout (when no <output>)
     if not output:
         output = click.get_binary_stream('stdout')

     # <file> input
-    if file: text = file.read()
+    if file:
+        text = file.read()

     # TTS
     tts = gTTS(text=text, lang=lang, slow=slow, lang_check=check, debug=debug)
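The `validate_text` callback and the body of `tts_cli` above require exactly one of TEXT or `-f/--file`, and they treat a TEXT of `-` as stdin. The helper `resolve_text()` below is a hypothetical, framework-free sketch of those rules. It takes already-parsed values, so it runs without Click or gTTS. It compares with `== '-'` rather than the diff's `is '-'`. `is` tests object identity, and whether two equal strings are the same object depends on CPython caching details. CPython 3.8+ also emits a `SyntaxWarning` for `is` with a literal.

```python
import io
import sys
from typing import Optional, TextIO


def resolve_text(text: Optional[str],
                 file: Optional[TextIO] = None,
                 stdin: Optional[TextIO] = None) -> str:
    """Hypothetical helper (not part of gTTS) mirroring the CLI's input rules.

    Exactly one of `text` or `file` must be given; a `text` of '-' means
    "read the text from stdin".
    """
    if not text and file is None:
        raise ValueError("TEXT or -f/--file FILENAME required")
    if text and file is not None:
        raise ValueError("TEXT and -f/--file FILENAME can't be used together")
    if file is not None:
        return file.read()
    # Compare by value: `is` tests object identity, which is not
    # guaranteed for two equal strings.
    if text == "-":
        return (stdin or sys.stdin).read()
    return text


if __name__ == "__main__":
    print(resolve_text("hello"))
    print(resolve_text(None, file=io.StringIO("from a file")))
    print(resolve_text("-", stdin=io.StringIO("from stdin")))
```

In the actual CLI these checks are split between the `validate_text` callback and `tts_cli` itself. Per the code comment "auto for <file>", `click.File()` already handles `-` for `--file`.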
gtts/lang.py (26 changes: 16 additions & 10 deletions)
@@ -1,21 +1,24 @@
 # -*- coding: utf-8 -*-
-import requests, re
+import requests
+import re
 from bs4 import BeautifulSoup

 """Google Translate loads a JavaScript Array of 'languages
 codes' that can be read. We intersect with all the
 languages Google Translate provides.
 """


 class LanguagesFetchError(Exception):
     pass


 class Languages:
     """Supported languages by Google's Text to Speech API"""

     URL_BASE = 'http://translate.google.com'
     JS_FILE = 'desktop_module_main.js'

     """Special undocumented language codes observed
     to provide different dialects or accents
     """
@@ -28,17 +31,17 @@ class Languages:
         'en-us': 'English (US)',
         'en-ca': 'English (Canada)',
         'en-uk': 'English (UK)',
-        'en-gb': 'English (UK)',
+        'en-gb': 'English (UK)',
         'en-au': 'English (Australia)',
         # French
         'fr-ca': 'French (Canada)',
         # Portuguese
         'pt-br': 'Portuguese (Brazil)',
         'pt-pt': 'Portuguese (Portugal)',
         # Spanish
-        'es-es' : 'Spanish (Spain)',
-        'es-us' : 'Spanish (United States)'
-    }
+        'es-es': 'Spanish (Spain)',
+        'es-us': 'Spanish (United States)'
+    }

     def __init__(self):
         self.langs = dict()
@@ -50,7 +53,7 @@ def get(self):

     def get_list(self):
         langs_dict = self.get()
-        langs_list = list(langs_dict.keys())
+        langs_list = list(langs_dict.keys())
         return langs_list

     def _fetch_langs(self):
@@ -84,10 +87,13 @@ def _fetch_langs(self):
             Out: {'af': 'Afrikaans', [...]}
             """
             langs_html = soup.find('select', {'id': 'gt-sl'}).findAll('option')
-            langs = {l['value']:l.text for l in langs_html if l['value'] in tts_langs}
+            langs = {
+                l['value']: l.text for l in langs_html if l['value'] in tts_langs}
             return langs
         except Exception as e:
-            raise LanguagesFetchError("Unable to get language list: {}".format(str(e)))
+            raise LanguagesFetchError(
+                "Unable to get language list: {}".format(str(e)))


 if __name__ == "__main__":
     pass
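`_fetch_langs()` reads the `<option>` elements of Google Translate's `<select id="gt-sl">` and keeps only the codes that also appear in the TTS language list. Separately, the class keeps a dict of undocumented dialect tags such as `en-us` and `pt-br`. The sketch below reproduces that intersection, assuming a few things. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs offline. `SAMPLE_HTML` and the `tts_langs` set are made-up data. The diff does not show how the extra tags are merged, so `langs.update(extra_langs)` is an assumption.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect {value: label} for the <option>s of one <select id="...">."""

    def __init__(self, select_id):
        super().__init__()
        self.select_id = select_id
        self.in_select = False
        self.current_value = None
        self.options = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("id") == self.select_id:
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.current_value = attrs.get("value")
            if self.current_value is not None:
                self.options[self.current_value] = ""

    def handle_endtag(self, tag):
        if tag in ("option", "select"):
            self.current_value = None
        if tag == "select":
            self.in_select = False

    def handle_data(self, data):
        # Text between <option ...> and </option> is the human-readable name.
        if self.current_value is not None:
            self.options[self.current_value] += data.strip()


def tts_languages(html, tts_langs, extra_langs):
    """Keep the page's languages that are also TTS-capable, then add extra tags."""
    parser = SelectOptionParser("gt-sl")
    parser.feed(html)
    langs = {code: name for code, name in parser.options.items() if code in tts_langs}
    langs.update(extra_langs)  # assumption: extra dialect tags are simply merged in
    return langs


# Made-up sample markup; the real page's structure may differ or have changed.
SAMPLE_HTML = """
<select id="gt-sl">
  <option value="auto">Detect language</option>
  <option value="af">Afrikaans</option>
  <option value="fr">French</option>
  <option value="xx">Not a TTS language</option>
</select>
"""

if __name__ == "__main__":
    print(tts_languages(SAMPLE_HTML, {"af", "fr"}, {"fr-ca": "French (Canada)"}))
    # {'af': 'Afrikaans', 'fr': 'French', 'fr-ca': 'French (Canada)'}
```

Scraping the page this way is fragile because any markup change breaks it. That is why the result is wrapped in `LanguagesFetchError` in the diff above.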
gtts/tts.py (72 changes: 45 additions & 27 deletions)
@@ -1,28 +1,39 @@
 # -*- coding: utf-8 -*-
-from . import Languages, LanguagesFetchError
+from . import Languages, LanguagesFetchError
 from six.moves import urllib
 from requests.packages.urllib3.exceptions import InsecureRequestWarning
 from gtts_token.gtts_token import Token
-import re, requests, warnings
+import re
+import requests
+import warnings


 class gTTSError(Exception):
     pass


 class Speed:
     """Google TTS API read speeds"""
-    # The API supports two speeds.
+
+    # The API supports two speeds.
     # (speed <= 0.3: slow; speed > 0.3: normal; default: 1)
     SLOW = 0.3
     NORMAL = 1


 class gTTS:
     """gTTS (Google Text to Speech): an interface to Google's Text to Speech API"""

     GOOGLE_TTS_URL = "https://translate.google.com/translate_tts"
-    MAX_CHARS = 100 # Max characters the Google TTS API takes at a time
-
-    def __init__(self, text, lang = 'en', slow = False, lang_check = False, debug = False):
+    MAX_CHARS = 100  # Max characters the Google TTS API takes at a time
+
+    def __init__(
+            self,
+            text,
+            lang='en',
+            slow=False,
+            lang_check=False,
+            debug=False):
         self.debug = debug

         # Language
@@ -53,14 +64,14 @@ def __init__(self, text, lang = 'en', slow = False, lang_check = False, debug =
         if self._len(text) <= self.MAX_CHARS:
             text_parts = [text]
         else:
-            text_parts = self._tokenize(text, self.MAX_CHARS)
+            text_parts = self._tokenize(text, self.MAX_CHARS)

         # Clean
         def strip(x): return x.replace('\n', '').strip()
         text_parts = [strip(x) for x in text_parts]
         text_parts = [x for x in text_parts if len(x) > 0]
         self.text_parts = text_parts

         # Google Translate token
         self.token = Token()

@@ -72,25 +83,28 @@ def save(self, savefile):
     def write_to_fp(self, fp):
         """Do the Web request and save to a file-like object"""
         for idx, part in enumerate(self.text_parts):
-            payload = { 'ie' : 'UTF-8',
-                        'q' : part,
-                        'tl' : self.lang,
-                        'ttsspeed' : self.speed,
-                        'total' : len(self.text_parts),
-                        'idx' : idx,
-                        'client' : 'tw-ob',
-                        'textlen' : self._len(part),
-                        'tk' : self.token.calculate_token(part)}
+            payload = {'ie': 'UTF-8',
+                       'q': part,
+                       'tl': self.lang,
+                       'ttsspeed': self.speed,
+                       'total': len(self.text_parts),
+                       'idx': idx,
+                       'client': 'tw-ob',
+                       'textlen': self._len(part),
+                       'tk': self.token.calculate_token(part)}
             headers = {
-                "Referer" : "http://translate.google.com/",
+                "Referer": "http://translate.google.com/",
                 "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
             }
-            if self.debug: print(payload)
+            if self.debug:
+                print(payload)
             try:
                 # Disable requests' ssl verify to accomodate certain proxies and firewalls
-                # Filter out urllib3's insecure warnings. We can live without ssl verify here
+                # Filter out urllib3's insecure warnings. We can live without
+                # ssl verify here
                 with warnings.catch_warnings():
-                    warnings.filterwarnings("ignore", category=InsecureRequestWarning)
+                    warnings.filterwarnings(
+                        "ignore", category=InsecureRequestWarning)
                     r = requests.get(self.GOOGLE_TTS_URL,
                                      params=payload,
                                      headers=headers,
@@ -99,7 +113,9 @@ def write_to_fp(self, fp):
                 if self.debug:
                     print("Headers: {}".format(r.request.headers))
                     print("Request url: {}".format(r.request.url))
-                    print("Response: {}, Redirects: {}".format(r.status_code, r.history))
+                    print(
+                        "Response: {}, Redirects: {}".format(
+                            r.status_code, r.history))
                 r.raise_for_status()
                 for chunk in r.iter_content(chunk_size=1024):
                     fp.write(chunk)
@@ -121,7 +137,7 @@ def _len(self, text):

     def _tokenize(self, text, max_size):
         """Tokenizer on basic punctuation"""

         punc = "¡!()[]¿?.,…‥،;:—。,、:?!\n"
         punc_list = [re.escape(c) for c in punc]
         pattern = '|'.join(punc_list)
@@ -134,13 +150,15 @@ def _tokenize(self, text, max_size):

     def _minimize(self, thestring, delim, max_size):
         """Recursive function that splits <thestring> in chunks
-        of maximum <max_size> chars delimited by <delim>. Returns list."""
+        of maximum <max_size> chars delimited by <delim>. Returns list."""

         if self._len(thestring) > max_size:
             idx = thestring.rfind(delim, 0, max_size)
-            return [thestring[:idx]] + self._minimize(thestring[idx:], delim, max_size)
+            return [thestring[:idx]] + \
+                self._minimize(thestring[idx:], delim, max_size)
         else:
             return [thestring]


 if __name__ == "__main__":
     pass
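`gTTS.__init__` sends text of at most `MAX_CHARS` (100) characters as a single part. Longer text goes through `_tokenize()`, which splits on punctuation and uses `_minimize()` to cut over-long pieces at the last `delim` before `max_size`. As written, `_minimize()` has two edge cases. If no delimiter falls in the first `max_size` characters, `rfind()` returns `-1`, and the first chunk becomes everything except the last character, so it can still exceed `max_size`. Because `thestring[idx:]` keeps the delimiter at the start of the next piece, the only delimiter in that window can also sit at index 0. The method then calls itself on the same string until Python raises `RecursionError`.

The sketch below is a hypothetical standalone variant, not the gTTS 2.0.0 code, with the following behavior:

* It skips the delimiter and falls back to a hard cut.
* It omits the punctuation pass.
* It replaces newlines with a space rather than an empty string, which is the change commit 008fc04 describes.

```python
MAX_CHARS = 100  # per-request limit, as in gTTS.MAX_CHARS


def clean(text):
    """Replace newlines with spaces (so words are not glued together) and trim."""
    return text.replace("\n", " ").strip()


def minimize(text, delim, max_size):
    """Split `text` into chunks of at most `max_size` characters.

    Hypothetical iterative variant of gTTS's recursive `_minimize()`:
    cut at the last `delim` that fits, drop the delimiter itself, and
    fall back to a hard cut when no delimiter fits.
    """
    if max_size < 1 or not delim:
        raise ValueError("need max_size >= 1 and a non-empty delim")
    chunks = []
    while len(text) > max_size:
        # Last delimiter that leaves a first chunk of at most max_size chars.
        idx = text.rfind(delim, 0, max_size + 1)
        if idx == -1:
            chunks.append(text[:max_size])  # no delimiter fits: hard cut
            text = text[max_size:]
        else:
            chunks.append(text[:idx])
            text = text[idx + len(delim):]  # skip the delimiter itself
    chunks.append(text)
    # Drop whitespace-only pieces (e.g. from consecutive delimiters).
    return [c.strip() for c in chunks if c.strip()]


def split_for_tts(text, max_chars=MAX_CHARS):
    """Return the text parts a client would send, one request per part."""
    text = clean(text)
    if len(text) <= max_chars:
        return [text] if text else []
    return minimize(text, " ", max_chars)


if __name__ == "__main__":
    print([len(p) for p in split_for_tts("word " * 50)])  # [99, 99, 49]
    print(split_for_tts("hello\nworld"))  # ['hello world']
```

The loop always removes at least one character per iteration, so it terminates even when the input has no delimiter at all.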