my prediction is only a´s and e´s #1

Closed
Drazcat opened this issue Mar 4, 2020 · 1 comment

Comments


Drazcat commented Mar 4, 2020

I have trained specs2text with the small_model, and when I test it, I only get "a" and "e" as output. The input I use for testing is the spectrogram produced by the WavAudio class. What am I doing wrong?

------------------------test code--------------------------------
from keras import backend as K
from data_gen import WavAudio
from model import small_model
import numpy as np

# Character set used during training; the CTC blank is at index len(labels).
labels = [" ", "a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
          "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u",
          "v", "w", "x", "y", "z", "'"]

audio_path = "datagen_utils/datasets/LibriSpeech/train-clean-100-wav/4014/186179/4014-186179-0024.wav"

# Compute the spectrogram input from the wav file.
wav = WavAudio(audio_path)
wavr = wav.specgram

# Build the inference model and load the trained weights.
decode_model = small_model((None, wavr.shape[0], 256),
                           len(labels) + 1, 1000, train=False)
decode_model.load_weights('small_model5x3.h5')

# Add a batch dimension and run the forward pass.
wavr1 = np.expand_dims(wavr, axis=0)
pred = decode_model.predict(wavr1)


def labels_to_text(labs):
    """Map a sequence of label indices back to characters."""
    ret = []
    for c in labs:
        if c == len(labels):  # CTC blank
            ret.append("_")
        else:
            ret.append(labels[c])
    return "".join(ret)


def decode_predict_ctc(out, top_paths=1):
    """Beam-search decode the network's softmax output with CTC."""
    results1 = []
    beam_width = 100
    if beam_width < top_paths:
        beam_width = top_paths
    for i in range(top_paths):
        labs = K.get_value(
            K.ctc_decode(out,
                         input_length=np.ones(out.shape[0]) * out.shape[1],
                         greedy=False, beam_width=beam_width,
                         top_paths=top_paths)[0][i])[0]
        text = labels_to_text(labs)
        results1.append(text)
    return results1


results = decode_predict_ctc(pred)
print("RESULTADO DE LA PREDICCION----------------------------------------------------------")
print("Transcript original:",
      "was a constantly moving line of motor trucks coming forward with men and shells while out ahead of them tremendous and menacing big tanks")
print("Prediccion: ", results)

--------------------------what i get---------------------------------------
RESULTADO DE LA PREDICCION----------------------------------------------------------
Transcript original: for he began to suspect who she was she however without noticing the excitement of cardenio continuing her story went on to say
Prediccion: ['a e e a a e e e e e e e a a e a e e ea e a e e']

nick-monto (Owner) commented

It may be that you aren't doing anything wrong.

I have also gotten this result when training on a very small sample of the training data (roughly 500 items). Since I don't currently have the compute resources to train the full model on the complete training set, I have been unable to test and debug it properly.

Until I get the network fully trained and am able to debug it, I will close this issue. Please reopen it if you come across anything else.
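For anyone hitting the same symptom, one quick sanity check (not part of this repo; a rough sketch assuming the `pred` and `labels` variables from the test script above) is to look at the raw per-frame softmax output before CTC decoding. An undertrained network typically collapses onto the blank plus one or two frequent characters, which would explain an output of only a's and e's.

--------------------------diagnostic sketch (hypothetical)--------------------------
import numpy as np
from collections import Counter

# pred has shape (batch, timesteps, len(labels) + 1); the last class is the CTC blank.
blank_index = len(labels)
frame_argmax = pred[0].argmax(axis=-1)  # most likely class at each time step

# Count which characters dominate the frames; "_" stands for the blank.
chars = ["_" if c == blank_index else labels[c] for c in frame_argmax]
print(Counter(chars).most_common(5))

# Low average confidence is another sign the model has not converged yet.
print("mean max prob per frame:", float(pred[0].max(axis=-1).mean()))
--------------------------------------------------------------------------------------

If the counts are dominated by the blank and a couple of vowels, more training data and training time, rather than the decoding code, is the likely fix.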
