This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

finetune & pointer bugs? #26

Open
aykutfirat opened this issue Mar 23, 2018 · 3 comments

Comments

@aykutfirat

python finetune.py --epochs 750 --data data/wikitext-2 --save WT2.pt --dropouth 0.2 --seed 1882
python pointer.py --save WT2.pt --lambdasm 0.1279 --theta 0.662 --window 3785 --bptt 2000 --data data/wikitext-2

Traceback (most recent call last):
  File "finetune.py", line 183, in <module>
    stored_loss = evaluate(val_data)
  File "finetune.py", line 108, in evaluate
    model.eval()

It looks like the model loading (and more) needs to be modified.
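For what it's worth, a crash at model.eval() would be consistent with finetune.py doing a bare torch.load on a checkpoint that the updated main.py saves as a bundle rather than a single model object. Below is a minimal sketch of the kind of loading shim I mean; the helper name model_load and the (model, criterion, optimizer) layout are guesses on my part rather than what main.py actually saves:

import torch

def model_load(fn):
    # Assumption: the updated main.py saves (model, criterion, optimizer)
    # as a single tuple; older checkpoints hold only the model object.
    with open(fn, 'rb') as f:
        checkpoint = torch.load(f)
    if isinstance(checkpoint, (tuple, list)):
        model, criterion, optimizer = checkpoint
    else:
        model, criterion, optimizer = checkpoint, None, None
    return model, criterion, optimizer

# In finetune.py / pointer.py, replace the bare `model = torch.load(f)` with:
model, criterion, optimizer = model_load('WT2.pt')
model.eval()  # now called on the model, not on a tuple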

Also, I no longer get the reported perplexities from main.py: the LSTM gets stuck in the 80s and the QRNN in the 90s.

@Smerity
Contributor

Smerity commented Mar 23, 2018

Hey @aykutfirat,

We've replicated the issue you're seeing with the initial training performance of the ASGD-based WT2 model, in our case using the QRNN as it's faster to test. This happened because I patched our changes for the Adam-based model we used for WT-103, PTBC, and enwik8 on top of AWD-LSTM-LM but failed to do full regression testing.

We're hunting down the issue now, initially to fix the standard training and then later to fix the finetune and pointer steps.

@xsway

xsway commented Apr 20, 2018

It is probably a related issue, so I thought I would report it here.

When running python main.py --batch_size 20 --data data/penn --dropouti 0.4 --dropouth 0.25 --seed 141 --epoch 500 --save PTB.pt, instead of the reported perplexities 61.2/58.8 I got 70.1 (?!)/58.6. The last lines of the training log are below.

| end of epoch 498 | time: 159.11s | valid loss  4.25 | valid ppl    70.08 | valid bpc    6.131
-----------------------------------------------------------------------------------------
| epoch 499 |   200/  663 batches | lr 30.00000 | ms/batch 217.91 | loss  3.69 | ppl    39.95 | bpc    5.320
| epoch 499 |   400/  663 batches | lr 30.00000 | ms/batch 217.03 | loss  3.66 | ppl    38.88 | bpc    5.281
| epoch 499 |   600/  663 batches | lr 30.00000 | ms/batch 218.92 | loss  3.67 | ppl    39.39 | bpc    5.300
-----------------------------------------------------------------------------------------
| end of epoch 499 | time: 159.08s | valid loss  4.25 | valid ppl    70.08 | valid bpc    6.131
-----------------------------------------------------------------------------------------
| epoch 500 |   200/  663 batches | lr 30.00000 | ms/batch 216.38 | loss  3.70 | ppl    40.25 | bpc    5.331
| epoch 500 |   400/  663 batches | lr 30.00000 | ms/batch 216.45 | loss  3.66 | ppl    38.98 | bpc    5.285
| epoch 500 |   600/  663 batches | lr 30.00000 | ms/batch 220.70 | loss  3.68 | ppl    39.60 | bpc    5.308
-----------------------------------------------------------------------------------------
| end of epoch 500 | time: 158.92s | valid loss  4.25 | valid ppl    70.08 | valid bpc    6.131
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss  4.07 | test ppl    58.56 | test bpc    5.872
=========================================================================================

@keskarnitish
Contributor

@xsway I think your issue is linked to #32.
I think everything is working as expected, but we're printing the wrong validation loss/perplexity. Could you try patching in that change and re-running? I think it should work. I will run it myself before I merge the changes.
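To make the printing point concrete, here's a rough sketch of the end-of-epoch validation path as I understand it; the variable names and the 't0'/'ax' ASGD check are my recollection of main.py rather than the exact lines touched by #32. The idea is that once ASGD averaging has kicked in, the printed number should be the loss evaluated under the averaged weights, which is also the loss used for checkpointing:

import math

# Sketch only: `model`, `optimizer`, `evaluate`, `val_data` and `epoch` are the
# objects already defined inside main.py's training loop.
if 't0' in optimizer.param_groups[0]:
    # ASGD has been triggered: temporarily swap in the averaged weights ('ax'),
    # evaluate, then restore the raw training weights.
    tmp = {}
    for prm in model.parameters():
        tmp[prm] = prm.data.clone()
        prm.data = optimizer.state[prm]['ax'].clone()
    val_loss2 = evaluate(val_data)
    print('| end of epoch {:3d} | valid loss {:5.2f} | valid ppl {:8.2f} |'.format(
        epoch, val_loss2, math.exp(val_loss2)))  # report the averaged-weight loss
    for prm in model.parameters():
        prm.data = tmp[prm].clone()
else:
    val_loss = evaluate(val_data)
    print('| end of epoch {:3d} | valid loss {:5.2f} | valid ppl {:8.2f} |'.format(
        epoch, val_loss, math.exp(val_loss)))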
