forked from huggingface/transformers
- add raw scaffold
- implement feat extract layers
- make style
- remove +
- correctly convert weights
- make feat extractor work
- make feature extraction proj work
- run forward pass
- finish forward pass
- successful decoding example
- remove unused files
- more changes
- add wav2vec tokenizer
- add new structure
- fix run forward
- add other layer norm architecture
- finish 2nd structure
- add model tests
- finish tests for tok and model
- clean-up
- make style
- finish docstring for model and config
- make style
- correct docstring
- correct tests
- change checkpoints to fairseq
- fix examples
- finish wav2vec2
- make style
- apply sylvains suggestions
- apply lysandres suggestions
- change print to log.info
- re-add assert statement
- add input_values as required input name
- finish wav2vec2 tokenizer
- Update tests/test_tokenization_wav2vec2.py

Co-authored-by: Lysandre Debut <lysandre@huggingface.co>
commit d6217fb (1 parent: d996024)

Showing 20 changed files with 2,233 additions and 5 deletions.
@@ -0,0 +1,65 @@
..
    Copyright 2021 The HuggingFace Team. All rights reserved.

    Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
    the License. You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
    an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
    specific language governing permissions and limitations under the License.

Wav2Vec2
-----------------------------------------------------------------------------------------------------------------------
Overview
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Wav2Vec2 model was proposed in `wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
<https://arxiv.org/abs/2006.11477>`__ by Alexei Baevski, Henry Zhou, Abdelrahman Mohamed, Michael Auli.

The abstract from the paper is the following:
*We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on
transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks
the speech input in the latent space and solves a contrastive task defined over a quantization of the latent
representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the
clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state
of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and
pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech
recognition with limited amounts of labeled data.*
Tips:

- Wav2Vec2 is a speech model that accepts a float array corresponding to the raw waveform of the speech signal.
- Wav2Vec2 was trained using connectionist temporal classification (CTC), so the model output has to be decoded
  using :class:`~transformers.Wav2Vec2Tokenizer`.
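As a rough illustration of why CTC output needs a decoding step, here is a minimal greedy CTC decoder in plain Python. This is a conceptual sketch only: the vocabulary, the blank id, and the function name are illustrative assumptions, not the actual `Wav2Vec2Tokenizer` implementation.

```python
from itertools import groupby

def ctc_greedy_decode(token_ids, id_to_char, blank_id=0):
    """Greedy CTC decoding sketch: collapse repeats, then drop blanks."""
    # 1. Merge consecutive duplicate ids: [1, 1, 2] -> [1, 2]
    collapsed = [key for key, _ in groupby(token_ids)]
    # 2. Remove the CTC blank token, which marks "no character emitted"
    chars = [id_to_char[i] for i in collapsed if i != blank_id]
    return "".join(chars)

# Hypothetical character vocabulary for demonstration
vocab = {0: "<pad>", 1: "H", 2: "E", 3: "L", 4: "O"}

# Repeats separated by a blank survive collapsing (the two L's in "HELLO")
print(ctc_greedy_decode([1, 1, 2, 0, 3, 3, 0, 3, 4, 4], vocab))  # HELLO
```

Real decoding maps each frame of the model's logits to its argmax id first; the collapse-then-drop-blanks step above is what turns that frame-level sequence into a transcription.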
Wav2Vec2Config
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Wav2Vec2Config
    :members:


Wav2Vec2Tokenizer
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Wav2Vec2Tokenizer
    :members: __call__, save_vocabulary


Wav2Vec2Model
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Wav2Vec2Model
    :members: forward


Wav2Vec2ForMaskedLM
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. autoclass:: transformers.Wav2Vec2ForMaskedLM
    :members: forward
@@ -63,6 +63,7 @@
    t5,
    tapas,
    transfo_xl,
    wav2vec2,
    xlm,
    xlm_roberta,
    xlnet,
@@ -0,0 +1,66 @@
# flake8: noqa
# There's no way to ignore "F401 '...' imported but unused" warnings in this
# module, but to preserve other warnings. So, don't check this module at all.

# Copyright 2021 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...file_utils import _BaseLazyModule, is_tokenizers_available, is_torch_available


_import_structure = {
    "configuration_wav2vec2": ["WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP", "Wav2Vec2Config"],
    "tokenization_wav2vec2": ["Wav2Vec2Tokenizer"],
}

if is_torch_available():
    _import_structure["modeling_wav2vec2"] = [
        "WAV_2_VEC_2_PRETRAINED_MODEL_ARCHIVE_LIST",
        "Wav2Vec2ForMaskedLM",
        "Wav2Vec2Model",
        "Wav2Vec2PreTrainedModel",
    ]


if TYPE_CHECKING:
    from .configuration_wav2vec2 import WAV_2_VEC_2_PRETRAINED_CONFIG_ARCHIVE_MAP, Wav2Vec2Config
    from .tokenization_wav2vec2 import Wav2Vec2Tokenizer

    if is_torch_available():
        from .modeling_wav2vec2 import (
            WAV_2_VEC_2_PRETRAINED_MODEL_ARCHIVE_LIST,
            Wav2Vec2ForMaskedLM,
            Wav2Vec2Model,
            Wav2Vec2PreTrainedModel,
        )

else:
    import importlib
    import os
    import sys

    class _LazyModule(_BaseLazyModule):
        """
        Module class that surfaces all objects but only performs associated imports when the objects are requested.
        """

        __file__ = globals()["__file__"]
        __path__ = [os.path.dirname(__file__)]

        def _get_module(self, module_name: str):
            return importlib.import_module("." + module_name, self.__name__)

    sys.modules[__name__] = _LazyModule(__name__, _import_structure)
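The lazy-import pattern above can be sketched in isolation using only the standard library. This toy `LazyModule` is a hypothetical stand-in for `_BaseLazyModule` (not the transformers implementation); it maps attribute names to standard-library modules and defers each import until the attribute is first accessed:

```python
import importlib
import types

class LazyModule(types.ModuleType):
    """Toy lazy module: attributes are imported from real modules on first access."""

    def __init__(self, name, attr_to_module):
        super().__init__(name)
        self._attr_to_module = attr_to_module  # e.g. {"sqrt": "math"}
        self._cache = {}

    def __getattr__(self, attr):
        # __getattr__ only fires when normal lookup fails, i.e. for lazy names
        if attr not in self._attr_to_module:
            raise AttributeError(attr)
        if attr not in self._cache:
            # The import happens only now, on first use of this attribute
            module = importlib.import_module(self._attr_to_module[attr])
            self._cache[attr] = getattr(module, attr)
        return self._cache[attr]

# Standard-library stand-ins for the wav2vec2 submodules (illustrative only)
lazy = LazyModule("demo", {"sqrt": "math", "dumps": "json"})
print(lazy.sqrt(9.0))      # 3.0  (math imported here, on first access)
print(lazy.dumps([1, 2]))  # [1, 2]
```

The real `_LazyModule` goes one step further by replacing the package's entry in `sys.modules` with itself, so `from transformers.models.wav2vec2 import Wav2Vec2Model` pays the heavy torch import cost only when the name is actually requested.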