
Update model #17

Merged: 5 commits merged into main on Jul 5, 2022

Conversation

@dbaranchuk (Collaborator) commented Jul 4, 2022

  • Default for the client's model parameters: `requires_grad=False`;
  • Interface: an "abstract" base class `BloomForYou`. `DistributedBloomForYou` inherits from it, and all subsequent classes for the various BLOOM applications inherit from `DistributedBloomForYou`;
  • `lm_head` replaced by `h @ word_embs` for `DistributedBloomForCausalLM`;
  • Updated converted model without `lm_head`: `dbaranchuk/test-bloomd-6b3`;
  • `h @ word_embs` is very slow in fp16 on CPU, so we keep fp32 for now. TODO @dbaranchuk: understand why fp16 is slow on CPU. If everything is correct, then (TODO @dbaranchuk) keep the embeddings in fp16 and cast to fp32 on the fly;
  • Minor bug fix in `run_local_server.sh`;
  • Updated `tests/test_full_model.py`.
    Test results for `REF_NAME='bigscience/bloom-6b3'`:
    CPU vs GPU (A100), fp32: passes with `forward_atol=1e-2`, `inference_atol=1e-2`
    CPU vs CPU, fp32: passes with `forward_atol=1e-4`, `inference_atol=1e-3`
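The `lm_head -> h @ word_embs` change ties the output projection to the input embedding matrix instead of storing a separate `lm_head` weight. A minimal sketch of the idea (function name and toy shapes are illustrative, not from this PR):

```python
import torch

def tied_lm_head(hidden_states: torch.Tensor,
                 word_embeddings: torch.nn.Embedding) -> torch.Tensor:
    # Project hidden states onto the vocabulary by reusing the input
    # embedding matrix: logits = h @ word_embs^T. No lm_head weight is stored.
    return hidden_states @ word_embeddings.weight.T

# toy shapes: batch=2, seq=4, hidden=8, vocab=16
emb = torch.nn.Embedding(16, 8)
h = torch.randn(2, 4, 8)
logits = tied_lm_head(h, emb)  # shape (2, 4, 16)
```

This is why the converted checkpoint no longer needs an `lm_head` tensor: the logits come directly from the (transposed) word-embedding matrix.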
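The TODO about keeping embeddings in fp16 while computing in fp32 can be sketched as follows, together with the new `requires_grad=False` default for client-side parameters (names and shapes are illustrative assumptions):

```python
import torch

emb = torch.nn.Embedding(16, 8)
emb.weight.data = emb.weight.data.half()  # store embeddings in fp16 to save memory
emb.weight.requires_grad = False          # client-side parameters are frozen by default

h = torch.randn(2, 4, 8)  # fp32 hidden states
# cast the embedding matrix to fp32 on the fly, so the matmul itself
# runs in fp32 on CPU (avoiding the observed fp16 slowdown)
logits = h @ emb.weight.float().T
assert logits.dtype == torch.float32
```

Storage stays at half precision; only the temporary fp32 copy used by the matmul costs extra memory during the forward pass.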
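The reported tolerances (`forward_atol`, `inference_atol`) suggest an elementwise comparison against the reference model with an absolute tolerance. A hedged sketch of such a check (the helper name is hypothetical, not the PR's actual test code):

```python
import torch

def outputs_match(ref: torch.Tensor, out: torch.Tensor, atol: float) -> bool:
    # Compare full-model outputs elementwise with an absolute tolerance,
    # e.g. atol=1e-2 for the CPU-vs-GPU fp32 comparison reported above.
    return torch.allclose(ref, out, atol=atol, rtol=0.0)

ref = torch.zeros(3)
out = torch.tensor([0.005, -0.003, 0.008])
assert outputs_match(ref, out, atol=1e-2)       # within 1e-2 everywhere
assert not outputs_match(ref, out, atol=1e-3)   # 0.008 exceeds 1e-3
```

The looser CPU-vs-GPU tolerance reflects accumulated floating-point differences between the two backends, while identical-backend runs match more tightly.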

@dbaranchuk dbaranchuk merged commit 0b5a689 into main Jul 5, 2022