This repository has been archived by the owner on Feb 12, 2022. It is now read-only.
I would like to perform a sanity check by passing some input to the model and reading the output text.
Following the PyTorch tutorial on language modelling (https://github.com/pytorch/examples/blob/master/word_language_model/generate.py), I have edited the `evaluate` function, where `created_inverse_tokenizer_during_training` is `idx2word` from the `Dictionary` class.

I am testing on the PTB dataset and get the following, with a perplexity of approximately 60:
inputs:
[made, value, $, their, intends, N, also, south, , or]
[much, criteria, N, office, to, return, closed, as, one, $]
[difference, devised, billion, visits, restrict, on, sharply, it, analyst, N]
[in, by, , as, the, assets, lower, became, peter, a]
[liquidity, benjamin, a, , rtc, for, across, more, , share]
[in, graham, , breaks, to, security, europe, clear, of, in]
[the, an, , , treasury, pacific, particularly, that, , the]
[pit, analyst, by, but, borrowings, and, in, a, &, fiscal]
[, and, an, massage, only, an, frankfurt, repeat, co., year]
[it, author, , no, unless, N, although, of, new, just]
["s", in, not, matter, the, N, london, the, york, ended]
[too, the, , how, agency, return, and, october, said, up]
[soon, 1930s, though, , receives, on, a, N, the, from]
[to, and, , is, specific, equity, few, crash, gold, $]
[tell, , , still, congressional, , other, was, market, N]
[but, who, english, associated, authorization, the, markets, "nt", already, million]
[people, is, butler, in, , loan, recovered, at, had, in]
[do, widely, in, many, such, growth, some, hand, some, fiscal]
["nt", considered, his, minds, agency, offset, ground, , good, N]
[seem, to, , with, , continuing, after, professionals, , and]
[to, be, proceeds, , borrowing, real-estate, stocks, dominated, technical, $]
[be, the, as, fronts, is, loan, began, municipal, factors, N]
[unhappy, father, if, for, unauthorized, losses, to, trading, that, million]
[with, of, the, , and, in, rebound, throughout, would, in]
[it, modern, realistic, and, expensive, the, in, the, have, N]
outputs:
[berlitz, hydro-quebec, banknote, centrust, gitano, cluett, guterman, aer, fromstein, calloway]
[berlitz, centrust, cluett, fromstein, aer, gitano, hydro-quebec, guterman, calloway, banknote]
[banknote, hydro-quebec, calloway, fromstein, berlitz, gitano, cluett, aer, guterman, centrust]
[calloway, berlitz, cluett, centrust, aer, gitano, hydro-quebec, banknote, guterman, fromstein]
[fromstein, hydro-quebec, aer, banknote, gitano, berlitz, calloway, cluett, centrust, guterman]
[calloway, hydro-quebec, guterman, fromstein, berlitz, banknote, cluett, centrust, gitano, aer]
[gitano, fromstein, hydro-quebec, cluett, calloway, centrust, berlitz, guterman, aer, banknote]
[berlitz, gitano, banknote, cluett, calloway, aer, centrust, fromstein, hydro-quebec, guterman]
[calloway, gitano, guterman, berlitz, centrust, hydro-quebec, cluett, aer, fromstein, banknote]
[hydro-quebec, berlitz, fromstein, gitano, cluett, calloway, aer, centrust, guterman, banknote]
[aer, cluett, fromstein, berlitz, guterman, calloway, hydro-quebec, centrust, banknote, gitano]
[cluett, calloway, centrust, fromstein, banknote, gitano, guterman, hydro-quebec, aer, berlitz]
[hydro-quebec, fromstein, calloway, aer, banknote, berlitz, cluett, gitano, centrust, guterman]
[banknote, gitano, aer, centrust, cluett, fromstein, calloway, guterman, hydro-quebec, berlitz]
[calloway, aer, gitano, berlitz, fromstein, cluett, guterman, banknote, hydro-quebec, centrust]
[banknote, cluett, fromstein, berlitz, gitano, aer, centrust, calloway, hydro-quebec, guterman]
[cluett, fromstein, aer, calloway, guterman, banknote, berlitz, gitano, centrust, hydro-quebec]
[aer, guterman, berlitz, gitano, centrust, cluett, calloway, hydro-quebec, fromstein, banknote]
[centrust, fromstein, cluett, berlitz, aer, banknote, guterman, gitano, calloway, hydro-quebec]
[guterman, banknote, fromstein, cluett, gitano, calloway, aer, centrust, berlitz, hydro-quebec]
[calloway, berlitz, aer, banknote, hydro-quebec, fromstein, cluett, guterman, gitano, centrust]
[banknote, hydro-quebec, berlitz, fromstein, guterman, calloway, cluett, centrust, gitano, aer]
[centrust, aer, fromstein, cluett, hydro-quebec, calloway, gitano, berlitz, guterman, banknote]
[fromstein, centrust, aer, banknote, berlitz, guterman, gitano, hydro-quebec, calloway, cluett]
[cluett, banknote, hydro-quebec, gitano, berlitz, fromstein, calloway, guterman, centrust, aer]
As you can see, the number of unique words in the output is rather small. Why is that? Or am I doing it wrong?
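
For reference, here is a minimal sketch of the decoding step this kind of sanity check relies on, written in plain Python with nested lists standing in for tensors. The `[seq_len][batch][vocab]` layout, the toy vocabulary, the scores, and the helper names `argmax` and `decode` are all assumptions for illustration and are not taken from the repository. The point it shows is that the predicted word index has to be chosen along the vocabulary axis. Each output row above is a permutation of the same ten words and has ten entries, so one possible cause (an untested guess) is that indices were taken along an axis of size 10 instead.

```python
def argmax(scores):
    """Return the index of the largest value in a flat list of scores."""
    return max(range(len(scores)), key=scores.__getitem__)


def decode(logits, idx2word):
    """Turn scores laid out as [seq_len][batch][vocab] into words.

    The argmax is taken over the innermost (vocabulary) axis, so every
    position in every batch column gets exactly one predicted word.
    """
    return [[idx2word[argmax(vocab_scores)] for vocab_scores in step]
            for step in logits]


# Hypothetical toy vocabulary and scores (seq_len=2, batch=2, vocab=5).
idx2word = ["aer", "banknote", "the", "market", "N"]
logits = [
    [[0.1, 0.0, 2.0, 0.3, 0.2], [0.0, 0.1, 0.2, 1.5, 0.3]],
    [[0.0, 0.2, 0.1, 3.0, 0.4], [0.1, 0.0, 0.3, 0.2, 2.2]],
]

words = decode(logits, idx2word)
unique = {w for row in words for w in row}

print(words)
print(len(unique), "unique words")
```

With real tensors, the equivalent of `argmax` over the innermost axis would be `logits.argmax(dim=-1)`, again assuming the vocabulary is the last dimension of the model output. Counting the unique decoded words, as in the last lines, is a quick way to check whether the predictions collapse onto a handful of tokens.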