According to the leaderboard at https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard, Nemotron 4 340B Instruct currently ranks behind Llama 3 70B Instruct. Because of its huge number of weights, a single machine with 8 A100s cannot hold it without quantization; the official approach is a single machine with 8 H100s combined with FP8. vLLM also plans to support it, likely through FP8 quantization or pipeline parallelism. Both of these approaches would take relatively long to implement in LMDeploy, and the priority for supporting this model is not high, so we may consider holding off on it for now.
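For context, here is a back-of-the-envelope weight-memory estimate showing why the FP16 weights do not fit on 8 GPUs while FP8 does. This is a rough sketch, not an official sizing guide: it assumes 80 GB A100/H100 variants and counts raw weights only, ignoring KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for Nemotron-4-340B.
# Assumptions (not from this issue): 80 GB per GPU, weights only;
# KV cache, activations, and framework overhead are ignored.

PARAMS = 340e9  # parameter count


def weights_gb(bytes_per_param: float) -> float:
    """Memory needed to hold the raw weights, in GB."""
    return PARAMS * bytes_per_param / 1e9


GPU_BUDGET_GB = 8 * 80  # 8 GPUs x 80 GB each

for name, bytes_per_param in [("bf16/fp16", 2), ("fp8", 1)]:
    need = weights_gb(bytes_per_param)
    verdict = "fits" if need < GPU_BUDGET_GB else "does not fit"
    print(f"{name}: {need:.0f} GB needed vs {GPU_BUDGET_GB} GB available -> {verdict}")

# bf16/fp16: 680 GB vs 640 GB -> does not fit
# fp8:       340 GB vs 640 GB -> fits (leaving ~300 GB for KV cache)
```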
Motivation
As titled. @lvhan028 @lzhangzz @grimoire
blog: https://research.nvidia.com/publication/2024-06_nemotron-4-340b
tech report: https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T_0.pdf
hf:
https://huggingface.co/nvidia/Nemotron-4-340B-Base
https://huggingface.co/nvidia/Nemotron-4-340B-Instruct
https://huggingface.co/nvidia/Nemotron-4-340B-Reward
Related resources
No response
Additional context
No response