This repository has been archived by the owner on Jul 7, 2023. It is now read-only.

T2T 1.4.1 transformer beam search result different with 1.3.2 #525

Open
efeiefei opened this issue Jan 18, 2018 · 6 comments

@efeiefei

I have trained a transformer translation model with T2T 1.3.2.
Now I want to return every beam search result and its score, so I updated T2T to version 1.4.1. I used the same model but got different results in some cases, and the overall BLEU decreases.

Can someone help me?

@martinpopel
Contributor

martinpopel commented Jan 18, 2018

I have also noticed a huge BLEU drop between T2T versions 1.2.9 and 1.4.2.
In the old version, batch_size=1500 gave very good results. In the new version, exactly the same setup diverges after one hour of training (and BLEU goes down to 0). When I increase batch_size to 2200 it trains OK, but convergence is much slower.
I plan to find the version in which the bug (or whatever it is) was introduced, but it will take some time: I need to patch some versions with #524 to ensure the same setup.

@prajdabre

I can confirm this.

I used v1.1.7 and got a BLEU of 47.66 on my ASPEC Chinese-Japanese task, whereas with v1.4 I get 36.87.

And as @martinpopel says, the training diverges after a few thousand iterations. It's as if it only looks at a fraction of the data shards and overfits on them.

AFAIK the default number of shards in the new version is 100, and I suspect the current code only looks at 10 of these shards and overfits.

Anyone else here who has observed such a problem?

@martinpopel
Contributor

I found out the bug was introduced in T2T 1.3.0. See the graph below, where the upper curve is v1.2.9 and the lower is v1.3.0; all hyperparameters are exactly the same.
[Graph: v129-vs-v130]

@prajdabre

@martinpopel GG

@martinpopel
Contributor

I realize the bug we are discussing now is different from the one in the title of this issue and the first post, which is about v1.3.2 vs v1.4.1 problems.
So I created a new issue, #529. Please continue the discussion about the bug introduced in v1.3.0 there.

@rsepassi rsepassi added the bug label Feb 9, 2018
@rsepassi
Contributor

rsepassi commented Feb 9, 2018

Yeah, not good that the beam search deteriorated. Not sure what the issue might be though.

You used the exact same checkpoint?

If you retrained, then the issue that @martinpopel found may be the culprit. If not, then that's a bit mysterious. Would probably mean that some logic in the decode path changed.
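
One way to narrow down whether the decode path changed is to decode the same source file with the same checkpoint under both versions and compare the two outputs line by line. The following is only a minimal sketch of that comparison, not something posted in this thread: the helper names `read_lines` and `diff_decodes` and the file names in the usage note are hypothetical, and it assumes each decode output file holds one translation per line, in source order.

```python
from itertools import zip_longest


def read_lines(path):
    """Read a decode output file as a list of lines (one translation per line)."""
    with open(path, encoding="utf-8") as f:
        return f.read().splitlines()


def diff_decodes(old_lines, new_lines):
    """Return (index, old, new) for every sentence whose translation differs.

    Surrounding whitespace is ignored. If one output is shorter than the
    other, the missing translations are treated as empty strings.
    """
    diffs = []
    pairs = zip_longest(old_lines, new_lines, fillvalue="")
    for index, (old, new) in enumerate(pairs):
        old, new = old.strip(), new.strip()
        if old != new:
            diffs.append((index, old, new))
    return diffs
```

For example, `diff_decodes(read_lines("decode_v1.3.2.txt"), read_lines("decode_v1.4.1.txt"))` would list the sentence indices where the two versions disagree (both file names are placeholders). If the differing sentences cluster around some property such as length, that may hint at which part of the beam search logic changed.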
