Add m2m 100 multilingual translation model from FAIR #8054

Closed
sshleifer opened this issue Oct 26, 2020 · 6 comments · Fixed by #10236
@sshleifer
Contributor

Weights and code are available.

Leaving this unassigned in case somebody else wants to take over.

@sshleifer sshleifer changed the title Add m2m 100 model from fairseq Add m2m 100 multilingual translation model from FAIR Oct 26, 2020
@bdalal
Contributor

bdalal commented Dec 8, 2020

If it helps, I managed to load the weights from the M2M100 418M-parameter model into MBart:

from transformers import MBartForConditionalGeneration, MBartConfig
from fairseq import checkpoint_utils
import torch

# Load the fairseq checkpoint on CPU and upgrade its state dict to the current fairseq format.
with open('418M_last_checkpoint.pt', 'rb') as f:
    state = torch.load(f, map_location=torch.device("cpu"))
state = checkpoint_utils._upgrade_state_dict(state)
args = state['args']
args.fixed_dictionary = "model_dict.128k.txt"
args.source_lang = 'en'
args.target_lang = 'hi'

# Rename fairseq's encoder.* / decoder.* parameters to the model.encoder.* / model.decoder.*
# layout that MBartForConditionalGeneration expects.
weights = state['model']
keys = list(weights.keys())
for key in keys:
    if key.startswith('encoder.') or key.startswith('decoder.'):
        new_key = 'model.' + key
        weights[new_key] = weights[key]
        del weights[key]
# MBart ties the shared embedding to the encoder/decoder token embeddings.
weights['model.shared.weight'] = weights['model.encoder.embed_tokens.weight']

# Configure MBart to match the M2M100 418M architecture (relu activation, 128k vocab,
# pre-norm layers, sinusoidal positions, layerdrop 0.05).
config1 = MBartConfig(
    activation_function='relu',
    vocab_size=128112,
    encoder_layerdrop=0.05,
    decoder_layerdrop=0.05,
    attention_dropout=0.1,
    add_final_layer_norm=True,
    normalize_before=True,
    scale_embedding=True,
    static_position_embeddings=True,
    pad_token_id=1,
    bos_token_id=0,
    eos_token_id=2,
    normalize_embedding=True,
    use_cache=False
)
mbart1 = MBartForConditionalGeneration(config1)
# strict=False because the key sets don't match exactly (e.g. the position embeddings).
mbart1.load_state_dict(weights, strict=False)

This is based on the checkpoint and dictionary provided here.

I also had to replace the position embeddings in modeling_bart with the code from fairseq, because the fairseq implementation of the sinusoidal position embeddings seems to be different from the one in modeling_bart.
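
For reference, this is roughly how fairseq builds its sinusoidal table (a sketch following fairseq's SinusoidalPositionalEmbedding.get_embedding; the helper name is mine). Fairseq also offsets positions by padding_idx + 1 when indexing into the table, which is one of the places the two implementations can disagree:

import math
import torch

def build_fairseq_sinusoidal_table(num_embeddings, embedding_dim, padding_idx=None):
    # sin values fill the first half of the feature dimension, cos values the second half.
    half_dim = embedding_dim // 2
    emb = math.log(10000) / (half_dim - 1)
    emb = torch.exp(torch.arange(half_dim, dtype=torch.float) * -emb)
    emb = torch.arange(num_embeddings, dtype=torch.float).unsqueeze(1) * emb.unsqueeze(0)
    emb = torch.cat([torch.sin(emb), torch.cos(emb)], dim=1).view(num_embeddings, -1)
    if embedding_dim % 2 == 1:
        # zero-pad the last column for odd embedding dimensions
        emb = torch.cat([emb, torch.zeros(num_embeddings, 1)], dim=1)
    if padding_idx is not None:
        emb[padding_idx, :] = 0
    return emb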

Although the weights load successfully, the model generates random tokens, albeit in the correct language. I have a feeling that there's something going on in fairseq's generate function that is not accounted for here, though I may be wrong.
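
One concrete thing I still want to rule out (an assumption on my part, not a confirmed diagnosis): fairseq's translation task seeds the decoder with the target-language token, so the converted model may need the same prefix when generating, roughly:

# Hypothetical check: start decoding from the target-language token.
# hi_token_id stands for whatever id the Hindi language symbol has in model_dict.128k.txt (assumption).
generated = mbart1.generate(
    input_ids,
    decoder_start_token_id=hi_token_id,
    num_beams=5,
    max_length=200,
)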

Would greatly appreciate any ideas you might have to debug the generation aspect.

Hope this helps! Thanks!

@Narsil Narsil self-assigned this Jan 4, 2021
@Narsil Narsil removed their assignment Jan 28, 2021
@patil-suraj patil-suraj self-assigned this Jan 28, 2021
@github-actions

github-actions bot commented Mar 6, 2021

This issue has been stale for 1 month.

@patil-suraj
Contributor

M2M100 is now integrated!

doc: https://huggingface.co/transformers/master/model_doc/m2m_100.html
models: https://huggingface.co/models?filter=m2m_100
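
A minimal usage example, roughly as shown on the linked doc page:

from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")

# Translate English to Hindi: set the source language on the tokenizer and
# force the target-language id as the first generated token.
tokenizer.src_lang = "en"
encoded = tokenizer("Life is like a box of chocolates.", return_tensors="pt")
generated = model.generate(**encoded, forced_bos_token_id=tokenizer.get_lang_id("hi"))
print(tokenizer.batch_decode(generated, skip_special_tokens=True))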

@ciortanmadalina

There is a problem with loading the model: model = M2M100ForConditionalGeneration.from_pretrained('facebook/m2m100_418M')
produces OSError: Unable to load weights from pytorch checkpoint file for 'facebook/m2m100_418M' at '/root/.cache/huggingface/transformers/f9eabc2ccf1b4ddafac5c7f6dc837130ab7122d75ee98a64ed0a446a20b84871.53192defd013a2942c1d27b5842eba64b84d0e49943b0892c8f71967bf053029'

A manual download of pytorch_model.bin leads to a similar exception, as the downloaded file is a zip archive.

@patil-suraj
Contributor

Hi @ciortanmadalina

I just tried this and can load the model successfully. This seems to be an issue with the cache; can you delete the cache and try again?
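
For anyone who prefers to clear it programmatically, something like this should work (assuming the default cache location and no custom TRANSFORMERS_CACHE / cache_dir):

import os
import shutil

# Default transformers cache directory (assumption: no custom cache path is configured).
cache_dir = os.path.expanduser("~/.cache/huggingface/transformers")
shutil.rmtree(cache_dir, ignore_errors=True)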

@ciortanmadalina

> Hi @ciortanmadalina
>
> I just tried this and can load the model successfully. This seems to be an issue with the cache; can you delete the cache and try again?

I solved it: the problem was not the cache but the PyTorch version (1.4), which, strangely enough, didn't cause a problem for the other transformer models I used (e.g. T5, BERT). Once I upgraded to 1.7, the issue was gone. Thanks for your answer!
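
A likely explanation (my reading, not something confirmed in this thread): checkpoints saved with torch >= 1.6 use a zip-based serialization format that torch 1.4's torch.load cannot read, which would also account for the zip observation above. A quick sanity check before loading:

import torch

# Zip-serialized checkpoints (the format used for the hub weights) need torch >= 1.6 to load (assumption).
major, minor = (int(x) for x in torch.__version__.split(".")[:2])
assert (major, minor) >= (1, 6), f"torch {torch.__version__} is too old to read zip-serialized checkpoints"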
