Add m2m 100 multilingual translation model from FAIR #8054
Comments
If it helps, I managed to load the weights from the M2M100 418M-parameter model into MBart.
This is based on the checkpoint and dictionary provided here. I also had to replace the position embeddings. Although the weights load successfully, the model generates random tokens, albeit in the correct language. I have a feeling that something is going on in fairseq's generate function that is not accounted for here, though I may be wrong. Would greatly appreciate any ideas you might have for debugging the generation aspect. Hope this helps! Thanks!
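For anyone attempting the same experiment, the key-renaming step can be sketched roughly as below. The prefix substitutions (`model.encoder.`, `model.decoder.`) are illustrative assumptions about how fairseq keys might map onto the Hugging Face MBart naming scheme, not the verified conversion:

```python
# Illustrative sketch only: the prefix mapping below is an assumption,
# not the actual fairseq -> Hugging Face key mapping for M2M-100/MBart.

def remap_fairseq_key(key: str) -> str:
    """Translate one fairseq state-dict key into an MBart-style key."""
    replacements = [
        ("encoder.", "model.encoder."),  # hypothetical prefix mapping
        ("decoder.", "model.decoder."),  # hypothetical prefix mapping
    ]
    for old, new in replacements:
        if key.startswith(old):
            return new + key[len(old):]
    return key


def remap_state_dict(state_dict: dict) -> dict:
    """Apply the key remapping to an entire state dict."""
    return {remap_fairseq_key(k): v for k, v in state_dict.items()}
```

Any real conversion script would need to verify the mapping against both checkpoints' actual key sets, and handle shape mismatches such as the position embeddings mentioned above.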
This issue has been stale for 1 month.
doc: https://huggingface.co/transformers/master/model_doc/m2m_100.html
There is a problem with loading the model. A manual download of pytorch_model.bin leads to a similar exception, as the downloaded file is a zip archive.
I just tried this and can load the model successfully. This seems to be an issue with the cache; can you delete the cache and try again?
I solved it: the problem was not the cache but the PyTorch version (1.4), which, strangely enough, didn't cause problems for the other transformer models I used (e.g. T5, BERT). Once I upgraded to 1.7, the issue was gone. Thanks for your answer!
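For context on that fix: PyTorch 1.6 switched `torch.save` to a zip-based serialization format by default, which older `torch.load` implementations cannot read, consistent with the "produces a zip" symptom above. A minimal version-guard sketch (the 1.6 cutoff and the naive version parsing are simplifying assumptions):

```python
def supports_zip_checkpoints(torch_version: str) -> bool:
    """Roughly check whether a given torch version can load zip-serialized
    checkpoints. The >= 1.6 cutoff is an assumption based on when zip
    serialization became the default save format."""
    major, minor = (int(part) for part in torch_version.split(".")[:2])
    return (major, minor) >= (1, 6)
```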
Weights and code are available.
Fairseq Code: https://github.com/pytorch/fairseq/tree/master/examples/m2m_100
Paper: https://arxiv.org/abs/2010.11125
This model will not run on 1 V100 GPU, so model parallelism will be needed.
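One naive way to run a model that does not fit on a single GPU is to assign contiguous blocks of transformer layers to different devices. The even pipeline split below is a hypothetical sketch of that idea, not how fairseq or transformers actually shard M2M-100:

```python
def assign_layers_to_devices(num_layers: int, num_devices: int) -> list:
    """Map each layer index to a device index, splitting the layers into
    contiguous, roughly equal blocks (naive pipeline-style model parallelism).
    This is an illustrative helper, not a real sharding implementation."""
    per_device = -(-num_layers // num_devices)  # ceiling division
    return [min(i // per_device, num_devices - 1) for i in range(num_layers)]
```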
I would expect the state dict to be very similar to mBART's, but I'm not sure yet.
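One quick way to test that similarity would be to diff the key sets of the two checkpoints. A small helper sketch:

```python
def diff_state_dict_keys(sd_a: dict, sd_b: dict):
    """Return (keys only in sd_a, keys only in sd_b), sorted, to gauge how
    closely two checkpoints' architectures line up. Matching key sets still
    leave per-tensor shapes to be checked separately."""
    keys_a, keys_b = set(sd_a), set(sd_b)
    return sorted(keys_a - keys_b), sorted(keys_b - keys_a)
```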
All I've done is download the state dict, run their command, and ask for help in m2m: generate OOMs on v100 facebookresearch/fairseq#2772 (comment) when it broke.
Leaving this unassigned in case somebody else wants to take over.