Add m2m 100 multilingual translation model from FAIR #8054
Comments
If it helps, I managed to load the weights from the M2M100 418M-parameter model into MBart.
This is based on the checkpoint and dictionary provided here. I also had to replace the position embeddings. Although the weights load successfully, the model generates random tokens, albeit in the correct language. I have a feeling that something is going on in fairseq's generate function that is not accounted for here, though I may be wrong. Would greatly appreciate any ideas you might have for debugging the generation aspect. Hope this helps! Thanks!
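For anyone attempting the same experiment, the key-renaming step can be sketched roughly as below. The prefix substitutions (`model.encoder.`, `model.decoder.`) are illustrative assumptions about how fairseq keys might map onto the Hugging Face MBart naming scheme, not the verified conversion:

```python
# Illustrative sketch only: the prefix mapping below is an assumption,
# not the actual fairseq -> Hugging Face key mapping for M2M-100/MBart.

def remap_fairseq_key(key: str) -> str:
    """Translate one fairseq state-dict key into an MBart-style key."""
    replacements = [
        ("encoder.", "model.encoder."),  # hypothetical prefix mapping
        ("decoder.", "model.decoder."),  # hypothetical prefix mapping
    ]
    for old, new in replacements:
        if key.startswith(old):
            return new + key[len(old):]
    return key


def remap_state_dict(state_dict: dict) -> dict:
    """Apply the key remapping to an entire state dict."""
    return {remap_fairseq_key(k): v for k, v in state_dict.items()}
```

Any real conversion script would need to verify the mapping against both checkpoints' actual key sets, and handle shape mismatches such as the position embeddings mentioned above.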
This issue has been stale for 1 month.
doc: https://huggingface.co/transformers/master/model_doc/m2m_100.html
There is a problem with loading the model. A manual download of pytorch_model.bin leads to a similar exception, as the downloaded file is a zip archive.
I just tried this and can load the model successfully. This seems to be an issue with the cache; can you delete the cache and try again?
I solved it: the problem was not the cache but the PyTorch version (1.4), which, strangely enough, didn't cause problems for the other transformer models I used (e.g. T5, BERT). Once I upgraded to 1.7, the issue was gone. Thanks for your answer!
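For context on that fix: PyTorch 1.6 switched `torch.save` to a zip-based serialization format by default, which older `torch.load` implementations cannot read, consistent with the "produces a zip" symptom above. A minimal version-guard sketch (the 1.6 cutoff and the naive version parsing are simplifying assumptions):

```python
def supports_zip_checkpoints(torch_version: str) -> bool:
    """Roughly check whether a given torch version can load zip-serialized
    checkpoints. The >= 1.6 cutoff is an assumption based on when zip
    serialization became the default save format."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (1, 6)
```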
Weights and code are available.
Fairseq Code: https://github.com/pytorch/fairseq/tree/master/examples/m2m_100
Paper: https://arxiv.org/abs/2010.11125
This model will not run on 1 V100 GPU, so model parallelism will be needed.
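One naive way to run a model that does not fit on a single GPU is to assign contiguous blocks of transformer layers to different devices. The even pipeline split below is a hypothetical sketch of that idea, not how fairseq or transformers actually shard M2M-100:

```python
def assign_layers_to_devices(num_layers: int, num_devices: int) -> list:
    """Map each layer index to a device index, splitting the layers into
    contiguous, roughly equal blocks (naive pipeline-style model parallelism).
    This is an illustrative helper, not a real sharding implementation."""
    per_device = -(-num_layers // num_devices)  # ceiling division
    return [min(i // per_device, num_devices - 1) for i in range(num_layers)]
```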
I would expect the state dict to be very similar to mBART's, but I'm not sure yet.
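One quick way to test that similarity would be to diff the key sets of the two checkpoints. A small helper sketch:

```python
def diff_state_dict_keys(sd_a: dict, sd_b: dict):
    """Return (keys only in sd_a, keys only in sd_b), sorted, to gauge how
    closely two checkpoints' architectures line up. Matching key sets still
    leave per-tensor shapes to be checked separately."""
    keys_a, keys_b = set(sd_a), set(sd_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)
```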
All I've done is download the state dict, run their command, and ask for help in m2m: generate OOMs on v100 facebookresearch/fairseq#2772 (comment) when it broke.
Leaving this unassigned in case somebody else wants to take over.