gTTS 2.0.0 #108

Merged: pndurette merged 96 commits into master from feature/cli-py-2-3 on Apr 30, 2018

pndurette commented Apr 29, 2018

Hello!
I am super happy to announce the release of gTTS 2.0.0! This is pretty much a rewrite, with a ton of cool new features and fixes that I've been working on for the past few months. See below!
Also see new docs at http://gtts.readthedocs.io/

Upgrading to 2.0.0

Most users will have nothing to change!
See Deprecations and Removals below.

Features

  • The gtts module
    • New logger ("gtts") replaces all occurrences of print()
    • Languages list is now obtained automatically (gtts.lang) (Supporting Languages #91, Filipino Language Support #94, when use "zh-yue" occured HTTPError: 404 Client Error #106)
    • Added a curated list of language sub-tags that have been observed to provide different dialects or accents (e.g. "en-gb", "fr-ca")
    • New gTTS() parameter lang_check to disable language checking.
    • gTTS() now delegates the text tokenizing to the API request methods (i.e. write_to_fp(), save()), allowing gTTS instances to be modified/reused
    • Rewrote tokenizing and added pre-processing (see below)
    • New gTTS() parameters pre_processor_funcs and tokenizer_func to configure pre-processing and tokenizing (or use a 3rd party tokenizer)
    • Error handling:
      • Added new exception gTTSError raised on API request errors. It attempts to guess what went wrong based on known information and observed behaviour ('NoneType' object has no attribute 'group' #60, when use "zh-yue" occured HTTPError: 404 Client Error #106)
      • gTTS.write_to_fp() and gTTS.save() also raise gTTSError on gtts_token error
      • gTTS.write_to_fp() raises TypeError when fp is not a file-like object or one that doesn't take bytes
      • gTTS() raises ValueError on unsupported languages (when lang_check is True)
      • More fine-grained error handling throughout (e.g. request failed vs. request successful with a bad response)
  • Tokenizer (and new pre-processors):
    • Rewrote and greatly expanded tokenizer (gtts.tokenizer)
    • Smarter token 'cleaning' that will remove tokens that only contain characters that can't be spoken (i.e. punctuation and whitespace)
    • Decoupled token minimizing from tokenizing, making the latter usable in other contexts
    • New flexible speech-centric text pre-processing
    • New flexible full-featured regex-based tokenizer (gtts.tokenizer.core.Tokenizer)
    • New RegexBuilder, PreProcessorRegex and PreProcessorSub classes to make writing regex-powered text pre-processors and tokenizer cases easier
    • Pre-processors:
      • Re-form words cut by end-of-line hyphens
      • Remove periods after a (customizable) list of known abbreviations (e.g. "jr", "sr", "dr") that can be spoken the same without a period
      • Perform speech corrections by doing word-for-word replacements from a (customizable) list of tuples
    • Tokenizing:
      • Keep punctuation that modifies the inflection of speech (e.g. "?", "!")
      • Don't split in the middle of numbers (e.g. "10.5", "20,000,000") (Punctuation is wrong with longer text #101)
      • Don't split on "dotted" abbreviations and acronyms (e.g. "U.S.A")
      • Added Chinese comma ("，") and ellipsis ("…") to the list of punctuation to tokenize on (Missing punctuation #86)
  • The gtts-cli command-line tool
    • Rewrote cli as first-class citizen module (gtts.cli), powered by Click
    • Windows support using setuptools' entry_points
    • Better support for Unicode I/O in Python 2
    • All arguments are now pre-validated
    • New --nocheck flag to skip language pre-checking
    • New --all flag to list all available languages
    • Either the --file option or the <text> argument can be set to "-" to read from stdin
    • The --debug flag uses logging and doesn't pollute stdout anymore
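
The pre-processing and tokenizing rules listed above can be illustrated with a small standard-library sketch. This is not the code in gtts.tokenizer: the function names, the abbreviation list and the regular expressions below are assumptions made for illustration, and the real module covers more cases (for example word substitutions and token minimizing).

```python
import re

# Hypothetical abbreviation list, for illustration only.
ABBREVIATIONS = ["dr", "jr", "sr"]


def rejoin_hyphenated(text):
    """Re-form words cut by an end-of-line hyphen ("exam-\\nple" -> "example")."""
    return re.sub(r"-\n", "", text)


def strip_abbreviation_periods(text):
    """Drop the period after known abbreviations ("Dr." -> "Dr")."""
    pattern = r"\b(" + "|".join(ABBREVIATIONS) + r")\."
    return re.sub(pattern, r"\1", text, flags=re.IGNORECASE)


SPLIT_PATTERN = re.compile(
    # Split after "?" or "!" but keep the mark, since it changes inflection.
    r"(?<=[?!])\s*"
    # Split on a period or comma only when whitespace or the end of the text
    # follows, so "10.5" and "20,000,000" stay whole; the lookbehind skips
    # dotted abbreviations such as "U.S.A".
    r"|(?<!\.[A-Za-z])[.,](?:\s+|$)"
    # Split on the ellipsis and the Chinese comma.
    r"|[…，]\s*"
)


def tokenize(text):
    """Pre-process, split, then drop tokens with nothing speakable in them."""
    for pre_process in (rejoin_hyphenated, strip_abbreviation_periods):
        text = pre_process(text)
    tokens = (token.strip() for token in SPLIT_PATTERN.split(text))
    # "Cleaning": keep only tokens that contain at least one word character.
    return [token for token in tokens if re.search(r"\w", token)]


if __name__ == "__main__":
    sample = "Dr. Smith paid 20,000,000 for 10.5 acres, really? Yes! Made in the U.S.A"
    for token in tokenize(sample):
        print(repr(token))
```

The sketch relies on re.split() splitting on empty matches, which requires Python 3.7 or later. Keeping the pre-processors as plain functions applied in sequence mirrors the idea behind the pre_processor_funcs parameter, where each step takes a string and returns a string.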

Bugfixes

Deprecations and Removals

  • Dropped Python 3.3 support
  • Removed debug parameter of gTTS (in favour of logger)
  • gtts-cli: Changed long option name of -o to --output instead of --destination
  • gTTS() will raise a ValueError rather than an AssertionError on unsupported language
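
For code that relied on the removed debug parameter or caught AssertionError, the migration might look like the sketch below. The helper name synthesize and the log messages are hypothetical; the "gtts" logger name, gTTSError, the ValueError behaviour and save() are taken from the notes above. The function needs gTTS installed and network access when it is called.

```python
import logging

# The "gtts" logger replaces the removed `debug` parameter of gTTS():
# raise its level to see the library's debug output.
logging.basicConfig(level=logging.WARNING)
logging.getLogger("gtts").setLevel(logging.DEBUG)

log = logging.getLogger(__name__)


def synthesize(text, lang, path):
    """Save `text` as speech to `path`; return True on success.

    `synthesize` is a hypothetical helper, not part of gTTS.
    """
    # Imported lazily so this sketch can be loaded without gTTS installed.
    from gtts import gTTS, gTTSError

    try:
        # lang_check defaults to True, so unsupported languages are rejected.
        tts = gTTS(text, lang=lang)
    except ValueError:
        # 2.0.0 raises ValueError here; earlier versions raised AssertionError.
        log.error("Unsupported language: %s", lang)
        return False

    try:
        tts.save(path)
    except gTTSError as err:
        # Raised on API request errors, including gtts_token errors.
        log.error("Text-to-speech request failed: %s", err)
        return False
    return True
```

Passing lang_check=False to gTTS() skips the language check, in which case an unsupported language would surface later as a gTTSError from save() or write_to_fp() rather than as a ValueError.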

Improved Documentation

Misc

  • Major test re-work
  • Language tests can read a TEST_LANGS environment variable so that not all language tests are run every time.
  • Added AppVeyor CI for Windows
  • PEP 8 compliance
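
As a rough illustration of the TEST_LANGS idea, a test helper could restrict the languages under test as sketched below. The comma-separated format, the helper name languages_to_test and the fallback to the full list are assumptions; the notes above only state that the variable exists.

```python
import os


def languages_to_test(all_langs, env=None):
    """Return the language codes to test, honouring TEST_LANGS if it is set.

    Assumes TEST_LANGS holds comma-separated codes (e.g. "en,fr"); this
    format is a guess for illustration, not taken from the test suite.
    """
    env = os.environ if env is None else env
    raw = env.get("TEST_LANGS", "")
    wanted = [code.strip() for code in raw.split(",") if code.strip()]
    if not wanted:
        # Variable unset or empty: run every language.
        return sorted(all_langs)
    # Ignore requested codes that are not in the known language list.
    return [code for code in wanted if code in all_langs]
```

Under these assumptions, setting TEST_LANGS to "en, fr" with known languages en, es and fr would select only en and fr.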

pndurette and others added 30 commits February 16, 2018 22:13
…on existing languages, Unicode manipulations, input with no spaces
pndurette modified the milestone: 2.0.0 Apr 30, 2018
pndurette merged commit a81c404 into master Apr 30, 2018
pndurette deleted the feature/cli-py-2-3 branch April 30, 2018 04:20
pndurette restored the feature/cli-py-2-3 branch April 30, 2018 04:20
pndurette deleted the feature/cli-py-2-3 branch April 30, 2018 04:20