[LLM] support Qwen2 #8338

Merged: 51 commits, merged on Jun 11, 2024

Changes from 1 commit

Commits (51)
36ab9a7
add Qwen2Moe
DrownFish19 Apr 16, 2024
3913e11
update default config
DrownFish19 Apr 17, 2024
0aa1aca
Merge remote-tracking branch 'paddlenlp/develop' into dev_add_qwen1.5…
DrownFish19 Apr 17, 2024
a29e90d
update QWen2Moe modeling
DrownFish19 Apr 18, 2024
d514dff
update modeling
DrownFish19 Apr 18, 2024
1e98323
update ckpt name
DrownFish19 Apr 19, 2024
f81bb43
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 Apr 22, 2024
37dd2d5
support same prefix model name for auto modeling
DrownFish19 Apr 25, 2024
d12938a
update qwen2moe testing
DrownFish19 Apr 25, 2024
8cc49fc
update qwen2moe modeling and config
DrownFish19 Apr 25, 2024
9c8222e
update qwen2moe import
DrownFish19 Apr 25, 2024
4d6ff87
fix mlp hidden_size
DrownFish19 Apr 25, 2024
f350a2f
update qkv bias convert
DrownFish19 Apr 25, 2024
c53690d
update modeling init_weight
DrownFish19 Apr 25, 2024
9d12995
update _get_name_mappings
DrownFish19 Apr 25, 2024
dba0f74
update _get_name_mappings and _init_weight
DrownFish19 Apr 25, 2024
e487606
add tokenizer
DrownFish19 Apr 26, 2024
cd9c753
update modeling
DrownFish19 Apr 26, 2024
10407c4
update modeling
DrownFish19 Apr 26, 2024
beb0f4c
update tokenizer
DrownFish19 Apr 26, 2024
beefee9
update modeling and tokenizer
DrownFish19 Apr 28, 2024
82ba345
fix index_add_ error
DrownFish19 Apr 28, 2024
d522ee4
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 Apr 28, 2024
4a1b2e3
fix
DrownFish19 Apr 28, 2024
526a9db
Merge branch 'dev_add_qwen1.5-moe' of github.com:DrownFish19/PaddleNL…
DrownFish19 Apr 28, 2024
0c9d5ec
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 May 6, 2024
2bb3aba
update comments
DrownFish19 May 6, 2024
f203983
update lora weights
DrownFish19 May 10, 2024
58af3ec
add todo
DrownFish19 May 10, 2024
c766eb5
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 May 29, 2024
5ddc326
update Copyright
DrownFish19 May 29, 2024
de1db67
update Moe to MoE
DrownFish19 May 29, 2024
10a194c
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 May 30, 2024
87f0276
update comment
DrownFish19 May 30, 2024
8d9970b
update Copyright
DrownFish19 May 31, 2024
89994a6
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 Jun 3, 2024
d57a5b1
update readme and json
DrownFish19 Jun 3, 2024
bfb65a1
update __init__.py
DrownFish19 Jun 3, 2024
4b96dd0
add qwen-1.5
DrownFish19 Jun 4, 2024
b274f12
update QWen to Qwen
DrownFish19 Jun 5, 2024
1054f06
update Qwen2MoE to Qwen2Moe
DrownFish19 Jun 5, 2024
056b04c
update readme
DrownFish19 Jun 5, 2024
ab08c17
update qwen2moe sft and lora json
DrownFish19 Jun 5, 2024
ad02fdc
update qwen2moe base name
DrownFish19 Jun 5, 2024
23e39fc
update qwen2
DrownFish19 Jun 7, 2024
36b3897
update
DrownFish19 Jun 7, 2024
6455445
Merge branch 'PaddlePaddle:develop' into dev_add_qwen1.5-moe
DrownFish19 Jun 11, 2024
b140df6
update readme
DrownFish19 Jun 11, 2024
c08c9a6
Merge branch 'dev_add_qwen1.5-moe' of github.com:DrownFish19/PaddleNL…
DrownFish19 Jun 11, 2024
e6de5f3
update readme
DrownFish19 Jun 11, 2024
48ae2ab
update readme
DrownFish19 Jun 11, 2024
update comment
DrownFish19 committed May 30, 2024
commit 87f02765be4ff535a69cf2f7dcd4f939db5ab846
20 changes: 0 additions & 20 deletions paddlenlp/transformers/qwen2moe/modeling.py
@@ -30,8 +30,8 @@

try:
from paddle.incubate.nn.functional import fused_rotary_position_embedding
except ImportError:
fused_rotary_position_embedding = None

from paddle.distributed.fleet.utils.sequence_parallel_utils import (
ColumnSequenceParallelLinear,
GatherOp,
@@ -56,8 +56,8 @@

try:
from paddle.nn.functional.flash_attention import flash_attention
except:
flash_attention = None


__all__ = [
"QWen2MoEModel",
@@ -87,72 +87,72 @@
Returns:
The auxiliary loss.
"""
if gate_logits is None or not isinstance(gate_logits, tuple):
return 0


if isinstance(gate_logits, tuple):
concatenated_gate_logits = paddle.concat(

gate_logits, axis=0
) # [num_hidden_layers X batch_size X sequence_length, num_experts]

routing_weights = F.softmax(concatenated_gate_logits, axis=-1)
_, selected_experts = paddle.topk(routing_weights, top_k, axis=-1)
expert_mask = F.one_hot(

selected_experts, num_classes=num_experts
) # [num_hidden_layers X batch_size X sequence_length, top_k, num_experts]

if attention_mask is None or len(attention_mask.shape) == 4:

# Only the intokens strategy has a 4-D attention_mask; we currently do not support excluding padding tokens there.
# Compute the percentage of tokens routed to each expert
tokens_per_expert = paddle.mean(expert_mask.astype("float32"), axis=0)


# Compute the average probability of routing to these experts
router_prob_per_expert = paddle.mean(routing_weights, axis=0)

else:
# Exclude the load balancing loss of padding tokens.
if len(attention_mask.shape) == 2:
batch_size, sequence_length = attention_mask.shape
num_hidden_layers = concatenated_gate_logits.shape[0] // (batch_size * sequence_length)


# Compute the mask that masks all padding tokens as 0 with the same shape of expert_mask
expert_attention_mask = (

attention_mask[None, :, :, None, None]
.expand((num_hidden_layers, batch_size, sequence_length, top_k, num_experts))
.reshape([-1, top_k, num_experts])
) # [num_hidden_layers * batch_size * sequence_length, top_k, num_experts]

# Compute the percentage of tokens routed to each expert
tokens_per_expert = paddle.sum(expert_mask.astype("float32") * expert_attention_mask, axis=0) / paddle.sum(

expert_attention_mask, axis=0
)

# Compute the mask that masks all padding tokens as 0 with the same shape of tokens_per_expert
router_per_expert_attention_mask = (

attention_mask[None, :, :, None]
.expand((num_hidden_layers, batch_size, sequence_length, num_experts))
.reshape([-1, num_experts])
)

# Compute the average probability of routing to these experts
router_prob_per_expert = paddle.sum(

routing_weights * router_per_expert_attention_mask, axis=0
) / paddle.sum(router_per_expert_attention_mask, axis=0)

overall_loss = paddle.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0))
return overall_loss * num_experts
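For reference, the routine above is a load-balancing auxiliary loss in the style popularized by Switch Transformers and Mixtral: the fraction of tokens dispatched to each expert is multiplied by the mean router probability for that expert and summed. A minimal standalone sketch of the unmasked branch, with hypothetical toy shapes (illustrative only, not part of this diff):

import paddle
import paddle.nn.functional as F

# Hypothetical router output: 2 layers x 4 tokens, 3 experts, top_k = 2.
gate_logits = (paddle.randn([4, 3]), paddle.randn([4, 3]))
num_experts, top_k = 3, 2

logits = paddle.concat(gate_logits, axis=0)                 # [layers * tokens, experts]
routing_weights = F.softmax(logits, axis=-1)
_, selected = paddle.topk(routing_weights, top_k, axis=-1)
expert_mask = F.one_hot(selected, num_classes=num_experts)  # [layers * tokens, top_k, experts]

tokens_per_expert = paddle.mean(expert_mask.astype("float32"), axis=0)  # fraction routed to each expert
router_prob_per_expert = paddle.mean(routing_weights, axis=0)           # mean router probability
aux_loss = paddle.sum(tokens_per_expert * router_prob_per_expert.unsqueeze(0)) * num_experts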



def get_triangle_upper_mask(x, mask=None):
if mask is not None:
return mask

# [bsz, n_head, q_len, kv_seq_len]
shape = x.shape

# [bsz, 1, q_len, kv_seq_len]
shape[1] = 1
mask = paddle.full(shape, paddle.finfo(x.dtype).min, dtype=x.dtype)
mask = paddle.triu(mask, diagonal=1)
mask.stop_gradient = True
return mask



def assign_kv_heads(num_kv_heads: int, num_gpus: int):
@@ -168,20 +168,20 @@
assign_kv_heads(num_kv_heads=2, num_gpus=4): [[0],[0],[1],[1]]
assign_kv_heads(num_kv_heads=4, num_gpus=4): [[0],[1],[2],[3]]
"""
assignment_list = [[] for _ in range(num_gpus)]

# Case 1: more heads than cards
if num_kv_heads > num_gpus:
num_heads_per_card = num_kv_heads // num_gpus
for i in range(num_gpus):
for j in range(num_heads_per_card):
assignment_list[i].append(i * num_heads_per_card + j)

# Case 2: more cards than heads. Each card gets only 1 head.
else:
num_card_per_heads = num_gpus // num_kv_heads
for i in range(num_kv_heads):
for j in range(num_card_per_heads):
assignment_list[i * num_card_per_heads + j].append(i)
return assignment_list
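A quick illustration of the assignment helper above, reusing the docstring's example values (outputs shown in comments, illustrative only):

# More GPUs than KV heads: each head is shared by num_gpus // num_kv_heads cards.
assign_kv_heads(num_kv_heads=2, num_gpus=4)   # [[0], [0], [1], [1]]
# One KV head per GPU.
assign_kv_heads(num_kv_heads=4, num_gpus=4)   # [[0], [1], [2], [3]]
# More KV heads than GPUs: each card takes a contiguous block of heads.
assign_kv_heads(num_kv_heads=4, num_gpus=2)   # [[0, 1], [2, 3]]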



def parallel_matmul(x: Tensor, y: Tensor, tensor_parallel_output=True):
@@ -189,25 +189,25 @@
tensor_parallel_degree = 1
try:
hcg = fleet.get_hybrid_communicate_group()
model_parallel_group = hcg.get_model_parallel_group()
tensor_parallel_degree = hcg.get_model_parallel_world_size()

except:
is_fleet_init = False

if paddle.in_dynamic_mode():
y_is_distributed = y.is_distributed
else:
y_is_distributed = tensor_parallel_degree > 1


if is_fleet_init and tensor_parallel_degree > 1 and y_is_distributed:
# if not running under distributed.launch, it will raise AttributeError: 'Fleet' object has no attribute '_hcg'
input_parallel = paddle.distributed.collective._c_identity(x, group=model_parallel_group)
logits = paddle.matmul(input_parallel, y, transpose_y=False)


if tensor_parallel_output:
return logits


return paddle.distributed.collective._c_concat(logits, group=model_parallel_group)


else:
logits = paddle.matmul(x, y, transpose_y=False)
@@ -231,9 +231,9 @@
# Paddle Flash Attention input [ bz, seqlen, nhead, head_dim]
# Torch Flash Attention input [ bz, nhead, seqlen, head_dim]

version = paddle.version.full_version
if version != "0.0.0" and version <= "2.5.2":
attn_output, attn_weights = flash_attention(

query_states,
key_states,
value_states,
@@ -241,7 +241,7 @@
return_softmax=output_attentions,
)
else:
attn_output = F.scaled_dot_product_attention(

query_states,
key_states,
value_states,
@@ -250,13 +250,13 @@
dropout_p=config.attention_dropout if training else 0.0,
training=training,
)
attn_weights = None


if sequence_parallel:
attn_output = attn_output.reshape([bsz * q_len, head_dim * num_heads])

else:
attn_output = attn_output.reshape([bsz, q_len, head_dim * num_heads])
return (attn_output, attn_weights) if output_attentions else attn_output

else:
# [ bz, seqlen, nhead, head_dim] -> [bs, nhead, seq_len, head_dim]
query_states = paddle.transpose(query_states, [0, 2, 1, 3])
@@ -268,22 +268,22 @@
attn_weights = paddle.matmul(query_states / math.sqrt(head_dim), key_states.transpose([0, 1, 3, 2]))

if attn_weights.shape != [bsz, num_heads, q_len, kv_seq_len]:
raise ValueError(

f"Attention weights should be of shape {(bsz, num_heads, q_len, kv_seq_len)}, but is"
f" {attn_weights.shape}"
)

if attention_mask is None:
attention_mask = get_triangle_upper_mask(attn_weights)

attention_mask = attention_mask.reshape([bsz, 1, q_len, kv_seq_len])
if attention_mask.shape != [bsz, 1, q_len, kv_seq_len]:
raise ValueError(

f"Attention mask should be of shape {(bsz, 1, q_len, kv_seq_len)}, but is {attention_mask.shape}"
)

attn_weights = attn_weights + attention_mask
if not paddle.in_dynamic_mode():
attn_weights = F.softmax(attn_weights, axis=-1, dtype="float32").astype(query_states.dtype)

else:
with paddle.amp.auto_cast(False):
attn_weights = F.softmax(attn_weights, axis=-1, dtype="float32").astype(query_states.dtype)
@@ -294,22 +294,22 @@
attn_output = attn_output.transpose([0, 2, 1, 3])

if sequence_parallel:
attn_output = attn_output.reshape([bsz * q_len, head_dim * num_heads])

else:
attn_output = attn_output.reshape([bsz, q_len, head_dim * num_heads])
return (attn_output, attn_weights) if output_attentions else attn_output


def masked_fill(x, mask, value):
y = paddle.full(x.shape, value, x.dtype)
return paddle.where(mask, y, x)



def is_casual_mask(attention_mask):
"""
attention_mask is causal when its upper triangle equals attention_mask itself
"""
return (paddle.triu(attention_mask) == attention_mask).all().item()



def _make_causal_mask(input_ids_shape, past_key_values_length):
@@ -322,7 +322,7 @@

if past_key_values_length > 0:
# [tgt_len, tgt_len + past_len]
mask = paddle.concat([paddle.ones([target_length, past_key_values_length], dtype="bool"), mask], axis=-1)


# [bs, 1, tgt_len, tgt_len + past_len]
return mask[None, None, :, :].expand([batch_size, 1, target_length, target_length + past_key_values_length])
@@ -355,7 +355,7 @@
self.config = config

if config.sequence_parallel:
mark_as_sequence_parallel_parameter(self.weight)


def forward(self, hidden_states):
if paddle.in_dynamic_mode():
@@ -364,12 +364,12 @@
variance = hidden_states.pow(2).mean(-1, keepdim=True)
hidden_states = paddle.rsqrt(variance + self.variance_epsilon) * hidden_states
else:
hidden_states = hidden_states.astype("float32")
variance = hidden_states.pow(2).mean(-1, keepdim=True)
hidden_states = paddle.rsqrt(variance + self.variance_epsilon) * hidden_states


if self.weight.dtype in [paddle.float16, paddle.bfloat16]:
hidden_states = paddle.cast(hidden_states, self.weight.dtype)

return hidden_states * self.weight


@@ -399,7 +399,7 @@
def forward(self, x, seq_len=None):
# x: [bs, num_attention_heads, seq_len, head_size]
if seq_len > self.max_seq_len_cached:
self._set_cos_sin_cache(seq_len)

cos = self.cos_cached[:, :seq_len, :, :]
sin = self.sin_cached[:, :seq_len, :, :]
return (
@@ -416,30 +416,10 @@


def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
"""Applies Rotary Position Embedding to the query and key tensors.

Args:
q (`torch.Tensor`): The query tensor.
k (`torch.Tensor`): The key tensor.
cos (`torch.Tensor`): The cosine part of the rotary embedding.
sin (`torch.Tensor`): The sine part of the rotary embedding.
position_ids (`torch.Tensor`):
The position indices of the tokens corresponding to the query and key tensors. For example, this can be
used to pass offsetted position ids when working with a KV-cache.
unsqueeze_dim (`int`, *optional*, defaults to 1):
The 'unsqueeze_dim' argument specifies the dimension along which to unsqueeze cos[position_ids] and
sin[position_ids] so that they can be properly broadcasted to the dimensions of q and k. For example, note
that cos[position_ids] and sin[position_ids] have the shape [batch_size, seq_len, head_dim]. Then, if q and
k have the shape [batch_size, heads, seq_len, head_dim], then setting unsqueeze_dim=1 makes
cos[position_ids] and sin[position_ids] broadcastable to the shapes of q and k. Similarly, if q and k have
the shape [batch_size, seq_len, heads, head_dim], then set unsqueeze_dim=2.
Returns:
`tuple(torch.Tensor)` comprising of the query and key tensors rotated using the Rotary Position Embedding.
"""
if position_ids is None:
# Note: Only for QWen2MoEForCausalLMPipe model pretraining
cos = cos[:, : q.shape[1], :, :] # [bs, seq_len, 1, dim]
sin = sin[:, : q.shape[1], :, :] # [bs, seq_len, 1, dim]

else:
cos = cos.squeeze(axis=[0, 2]) # [seq_len, dim]
sin = sin.squeeze(axis=[0, 2]) # [seq_len, dim]
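For context, apply_rotary_pos_emb rotates each query/key channel pair by a position-dependent angle taken from the cached cos/sin tables. A self-contained sketch of the standard formulation (illustrative only; the helper names and shapes here are assumptions, independent of this file):

import paddle

def rotate_half(x):
    # Split the last dimension in half and rotate: (x1, x2) -> (-x2, x1).
    x1, x2 = paddle.chunk(x, 2, axis=-1)
    return paddle.concat([-x2, x1], axis=-1)

def apply_rope(q, k, cos, sin):
    # q, k: [batch, seq_len, num_heads, head_dim]; cos, sin: [seq_len, head_dim]
    cos = cos.unsqueeze(0).unsqueeze(2)   # broadcast over batch and heads
    sin = sin.unsqueeze(0).unsqueeze(2)
    return q * cos + rotate_half(q) * sin, k * cos + rotate_half(k) * sin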
@@ -461,26 +441,26 @@
self.tensor_parallel_degree = config.tensor_parallel_degree

if config.sequence_parallel:
ColumnParallelLinear = ColumnSequenceParallelLinear
RowParallelLinear = RowSequenceParallelLinear

else:
ColumnParallelLinear = fleet.meta_parallel.ColumnParallelLinear
RowParallelLinear = fleet.meta_parallel.RowParallelLinear

if config.tensor_parallel_degree > 1:
self.gate_proj = ColumnParallelLinear(

self.hidden_size,
self.intermediate_size,
gather_output=False,
has_bias=False,
)
self.up_proj = ColumnParallelLinear(

self.hidden_size,
self.intermediate_size,
gather_output=False,
has_bias=False,
)
self.down_proj = RowParallelLinear(

self.intermediate_size,
self.hidden_size,
input_is_parallel=True,
@@ -506,8 +486,8 @@
if n_rep == 1:
return hidden_states

hidden_states = hidden_states.unsqueeze(-2).tile([1, 1, 1, n_rep, 1])
return hidden_states.reshape([batch, slen, num_key_value_heads * n_rep, head_dim])
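The helper above tiles each key/value head n_rep times so grouped-query attention can reuse a small set of KV heads across all query heads. A toy shape check (hypothetical sizes, illustrative only):

import paddle

kv = paddle.randn([1, 6, 2, 8])      # [batch, seq_len, num_key_value_heads, head_dim]
n_rep = 4                            # num_attention_heads // num_key_value_heads
expanded = kv.unsqueeze(-2).tile([1, 1, 1, n_rep, 1]).reshape([1, 6, 2 * n_rep, 8])
print(expanded.shape)                # [1, 6, 8, 8]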



class QWen2MoEAttention(nn.Layer):
@@ -543,41 +523,41 @@
self.layerwise_recompute = layerwise_recompute
self.recompute_granularity = config.recompute_granularity
if config.tensor_parallel_degree > 1:
assert (

self.num_heads % config.tensor_parallel_degree == 0
), f"num_heads: {self.num_heads}, tensor_parallel_degree: {config.tensor_parallel_degree}"
self.num_heads = self.num_heads // config.tensor_parallel_degree


assert (

self.num_key_value_heads % config.tensor_parallel_degree == 0
), f"num_key_value_heads: {self.num_key_value_heads}, tensor_parallel_degree: {config.tensor_parallel_degree}"
self.num_key_value_heads = self.num_key_value_heads // config.tensor_parallel_degree


self.use_fused_rope = config.use_fused_rope
if self.use_fused_rope:
if "gpu" not in paddle.device.get_device() or fused_rotary_position_embedding is None:
warnings.warn(

"Enable fuse rope in the config, but fuse rope is not available. "
"Will disable fuse rope. Try using latest gpu version of Paddle."
)
self.use_fused_rope = False


if config.sequence_parallel:
ColumnParallelLinear = ColumnSequenceParallelLinear
RowParallelLinear = RowSequenceParallelLinear

else:
ColumnParallelLinear = fleet.meta_parallel.ColumnParallelLinear
RowParallelLinear = fleet.meta_parallel.RowParallelLinear

if config.tensor_parallel_degree > 1:
self.q_proj = ColumnParallelLinear(self.hidden_size, self.hidden_size, has_bias=True, gather_output=False)
self.k_proj = ColumnParallelLinear(

self.hidden_size, self.config.num_key_value_heads * self.head_dim, has_bias=True, gather_output=False
)
self.v_proj = ColumnParallelLinear(

self.hidden_size, self.config.num_key_value_heads * self.head_dim, has_bias=True, gather_output=False
)
self.o_proj = RowParallelLinear(self.hidden_size, self.hidden_size, has_bias=False, input_is_parallel=True)

else:
self.q_proj = nn.Linear(self.hidden_size, self.hidden_size, bias_attr=True)
self.k_proj = nn.Linear(self.hidden_size, self.config.num_key_value_heads * self.head_dim, bias_attr=True)
@@ -610,8 +590,8 @@
value_states = self.v_proj(hidden_states)

if self.sequence_parallel:
target_query_shape = [-1, self.seq_length, self.num_heads, self.head_dim]
target_key_value_shape = [-1, self.seq_length, self.num_key_value_heads, self.head_dim]

else:
target_query_shape = [0, 0, self.num_heads, self.head_dim]
target_key_value_shape = [0, 0, self.num_key_value_heads, self.head_dim]
@@ -625,9 +605,9 @@
kv_seq_len += past_key_value[0].shape[-3]

if self.use_fused_rope:
assert past_key_value is None, "fuse rotary not support cache kv for now"
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
query_states, key_states, _ = fused_rotary_position_embedding(

query_states,
key_states,
v=None,
@@ -658,7 +638,7 @@
and has_gradient
and self.recompute_granularity == "core_attn"
):
outputs = recompute(

scaled_dot_product_attention,
query_states,
self.config,
@@ -730,7 +710,7 @@
routing_weights = F.softmax(router_logits.astype("float32"), axis=1)
routing_weights, selected_experts = paddle.topk(routing_weights, self.top_k, axis=-1)
if self.norm_topk_prob:  # Note: Mixtral normalizes the top-k weights by default; QWen2MoE defaults to no normalization
routing_weights /= routing_weights.sum(axis=-1, keepdim=True)

# we cast back to input dtype
routing_weights = routing_weights.astype(hidden_states.dtype)
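The block above routes every token to its top_k experts: router logits are softmaxed over the expert dimension, the top_k probabilities and expert indices are selected, and the selected probabilities are optionally renormalized. A toy illustration with hypothetical sizes (not part of the diff):

import paddle
import paddle.nn.functional as F

router_logits = paddle.randn([5, 4])                # 5 tokens, 4 experts
weights = F.softmax(router_logits.astype("float32"), axis=1)
weights, selected_experts = paddle.topk(weights, k=2, axis=-1)
# Mixtral-style renormalization; Qwen2MoE skips this when norm_topk_prob is False.
weights = weights / weights.sum(axis=-1, keepdim=True)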

@@ -782,7 +762,7 @@
self.mlp = QWen2MoESparseMoEBlock(config)
else:
# num_experts == 0 or this layer is not sparse layer
self.mlp = QWen2MoEMLP(config)


self.input_layernorm = QWen2MoERMSNorm(config)
self.post_attention_layernorm = QWen2MoERMSNorm(config)
@@ -835,7 +815,7 @@
and has_gradient
and self.recompute_granularity == "full_attn"
):
outputs = recompute(

self.self_attn,
hidden_states,
position_ids,
@@ -875,7 +855,7 @@
if isinstance(hidden_states, tuple):
hidden_states, router_logits = hidden_states
else:
router_logits = None


hidden_states = residual + hidden_states

@@ -888,7 +868,7 @@
outputs += (present_key_value,)

if output_router_logits:
outputs += (router_logits,)


if type(outputs) is tuple and len(outputs) == 1:
outputs = outputs[0]
@@ -950,80 +930,80 @@

@classmethod
def _get_tensor_parallel_mappings(cls, config: QWen2MoEConfig, is_split=True):
from paddlenlp.transformers.conversion_utils import split_or_merge_func


fn = split_or_merge_func(

is_split=is_split,
tensor_parallel_degree=config.tensor_parallel_degree,
tensor_parallel_rank=config.tensor_parallel_rank,
num_attention_heads=config.num_attention_heads,
)

def get_tensor_parallel_split_mappings(num_layers, num_experts):
final_actions = {}


base_actions = {

"lm_head.weight": partial(fn, is_column=True),
# Row Linear
"embed_tokens.weight": partial(fn, is_column=False),
"layers.0.self_attn.o_proj.weight": partial(fn, is_column=False),
}

if not config.vocab_size % config.tensor_parallel_degree == 0:
base_actions.pop("lm_head.weight")
base_actions.pop("embed_tokens.weight")


# Column Linear
base_actions["layers.0.self_attn.q_proj.weight"] = partial(fn, is_column=True)
base_actions["layers.0.self_attn.q_proj.bias"] = partial(fn, is_column=True)

# if we have enough num_key_value_heads to split, then split it.
if config.num_key_value_heads % config.tensor_parallel_degree == 0:
base_actions["layers.0.self_attn.k_proj.weight"] = partial(fn, is_column=True)
base_actions["layers.0.self_attn.v_proj.weight"] = partial(fn, is_column=True)
base_actions["layers.0.self_attn.k_proj.bias"] = partial(fn, is_column=True)
base_actions["layers.0.self_attn.v_proj.bias"] = partial(fn, is_column=True)


for key, action in base_actions.items():
if "layers.0." in key:
for i in range(num_layers):
final_actions[key.replace("layers.0.", f"layers.{i}.")] = action
final_actions[key] = action


# Add tp split for expert params.
base_actions = {

"layers.0.mlp.experts.0.gate_proj.weight": partial(fn, is_column=True),
"layers.0.mlp.experts.0.down_proj.weight": partial(fn, is_column=False),
"layers.0.mlp.experts.0.up_proj.weight": partial(fn, is_column=True),
}
for key, action in base_actions.items():
for i in range(num_layers):
newkey = key.replace("layers.0.", f"layers.{i}.")
for j in range(num_experts):
newkey2 = newkey.replace("experts.0.", f"experts.{j}.")
final_actions[newkey2] = action


# Add tp split for shared expert params.
base_actions = {

"layers.0.mlp.shared_expert.gate_proj.weight": partial(fn, is_column=True),
"layers.0.mlp.shared_expert.up_proj.weight": partial(fn, is_column=True),
"layers.0.mlp.shared_expert.down_proj.weight": partial(fn, is_column=False),
}
for key, action in base_actions.items():
if "layers.0." in key:
for i in range(num_layers):
final_actions[key.replace("layers.0.", f"layers.{i}.")] = action
final_actions[key] = action


return final_actions


mappings = get_tensor_parallel_split_mappings(config.num_hidden_layers, config.num_experts)


return mappings
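The mapping returned above keys every per-layer and per-expert parameter name to a split-or-merge function for tensor parallelism. An illustrative expansion of the expert keys, mirroring the loops above (hypothetical sizes):

# With num_hidden_layers = 2 and num_experts = 2, one base key expands to four entries:
base = "layers.0.mlp.experts.0.gate_proj.weight"
keys = [
    base.replace("layers.0.", f"layers.{i}.").replace("experts.0.", f"experts.{j}.")
    for i in range(2)
    for j in range(2)
]
# -> layers.{0,1}.mlp.experts.{0,1}.gate_proj.weight,
#    each mapped to partial(fn, is_column=True) for a column-split linear weight.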


def _init_weights(self, layer):
"""Initialization hook"""
if self.config.tensor_parallel_degree > 1:
rng_tracker = get_rng_state_tracker().rng_state

if isinstance(
layer,
(
@@ -1041,8 +1021,8 @@
# and reset the `state_dict` to update parameter in static mode.
if isinstance(layer.weight, paddle.Tensor):
if layer.weight.is_distributed:
with rng_tracker():
layer.weight.set_value(

paddle.tensor.normal(
mean=0.0,
std=self.config.initializer_range
@@ -1095,7 +1075,7 @@
# Recompute defaults to False and is controlled by Trainer
self.enable_recompute = False
if config.tensor_parallel_degree > 1 and config.vocab_size % config.tensor_parallel_degree == 0:
self.embed_tokens = mpu.VocabParallelEmbedding(

self.vocab_size,
self.hidden_size,
weight_attr=paddle.ParamAttr(initializer=nn.initializer.XavierNormal()),
@@ -1140,7 +1120,7 @@
else:
expanded_attn_mask = attention_mask
else:
expanded_attn_mask = _make_causal_mask(

input_shape,
past_key_values_length=past_key_values_length,
)
@@ -1160,13 +1140,13 @@
past_key_value: Tensor,
use_cache: bool,
):
def create_custom_forward(module):
def custom_forward(*inputs):
return module(*inputs)


return custom_forward


hidden_states = recompute(

create_custom_forward(layer_module),
hidden_states,
position_ids,
@@ -1178,7 +1158,7 @@
use_reentrant=self.config.recompute_use_reentrant,
)

return hidden_states


def forward(
self,
@@ -1195,7 +1175,7 @@
**kwargs,
):
if self.sequence_parallel and use_cache:
raise ValueError("We currently only support sequence parallel without cache.")


output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions

@@ -1211,13 +1191,13 @@

# retrieve input_ids and inputs_embeds
if input_ids is not None and inputs_embeds is not None:
raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")

elif input_ids is not None:
batch_size, seq_length = input_ids.shape
elif inputs_embeds is not None:
batch_size, seq_length, _ = inputs_embeds.shape

else:
raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")


if past_key_values is None:
past_key_values = tuple([None] * len(self.layers))
@@ -1234,10 +1214,10 @@

if self.sequence_parallel:
# [bs, seq_len, num_head * head_dim] -> [bs * seq_len, num_head * head_dim]
bs, seq_len, hidden_size = inputs_embeds.shape
inputs_embeds = paddle.reshape_(inputs_embeds, [bs * seq_len, hidden_size])

# [seq_len * bs / n, num_head * head_dim] (n is mp parallelism)
inputs_embeds = ScatterOp.apply(inputs_embeds)


# embed positions
if attention_mask is None:
@@ -1251,9 +1231,9 @@
attention_mask, (batch_size, seq_length), cache_length, inputs_embeds.dtype
) # [bs, 1, seq_len, seq_len]
if self.config.use_flash_attention:
is_casual = is_casual_mask(attention_mask)
if is_casual:
attention_mask = None

hidden_states = inputs_embeds

# decoder layers
@@ -1274,7 +1254,7 @@
and has_gradient
and self.recompute_granularity == "full"
):
layer_outputs = self.recompute_training_full(

decoder_layer,
hidden_states,
position_ids,
@@ -1309,7 +1289,7 @@
next_decoder_cache += (layer_outputs[2 if output_attentions else 1],)

if output_router_logits:
all_router_logits += (layer_outputs[-1],)


hidden_states = self.norm(hidden_states)

@@ -1347,26 +1327,26 @@
self.enable_parallel_cross_entropy = config.tensor_parallel_degree > 1 and config.tensor_parallel_output

if self.enable_parallel_cross_entropy: # and False: # and lm_head is distributed
self.loss_func = mpu.ParallelCrossEntropy(ignore_index=self.ignore_index)

else:
self.loss_func = paddle.nn.CrossEntropyLoss(reduction="none", ignore_index=self.ignore_index)

def forward(self, prediction_scores, masked_lm_labels):
if self.enable_parallel_cross_entropy:
if prediction_scores.shape[-1] == self.config.vocab_size:
warnings.warn(

f"enable_parallel_cross_entropy, the vocab_size should be splited: {prediction_scores.shape[-1]}, {self.config.vocab_size}"
)
self.loss_func = paddle.nn.CrossEntropyLoss(reduction="none", ignore_index=self.ignore_index)


with paddle.amp.auto_cast(False):
masked_lm_loss = self.loss_func(prediction_scores.astype("float32"), masked_lm_labels.unsqueeze(2))


# skip ignore_index which loss == 0
masked_lm_loss = masked_lm_loss[masked_lm_loss > 0]
loss = paddle.mean(masked_lm_loss)


return loss



class QWen2MoELMHead(nn.Layer):
@@ -1374,7 +1354,7 @@
super(QWen2MoELMHead, self).__init__()
self.config = config
if config.tensor_parallel_degree > 1 and config.vocab_size % config.tensor_parallel_degree == 0:
vocab_size = config.vocab_size // config.tensor_parallel_degree

else:
vocab_size = config.vocab_size

@@ -1385,16 +1365,16 @@
# Must set distributed attr for Tensor Parallel !
self.weight.is_distributed = True if (vocab_size != config.vocab_size) else False
if self.weight.is_distributed:
self.weight.split_axis = 1


def forward(self, hidden_states, tensor_parallel_output=None):
if self.config.sequence_parallel:
hidden_states = GatherOp.apply(hidden_states)
seq_length = self.config.seq_length
hidden_states = paddle.reshape_(hidden_states, [-1, seq_length, self.config.hidden_size])


if tensor_parallel_output is None:
tensor_parallel_output = self.config.tensor_parallel_output


logits = parallel_matmul(hidden_states, self.weight, tensor_parallel_output=tensor_parallel_output)
return logits
@@ -1427,16 +1407,16 @@
self.qwen2moe.embed_tokens = value

def get_output_embeddings(self):
return self.lm_head


def set_output_embeddings(self, new_embeddings):
self.lm_head = new_embeddings


def set_decoder(self, decoder):
self.qwen2moe = decoder


def get_decoder(self):
return self.qwen2moe


def prepare_inputs_for_generation(
self,
@@ -1457,7 +1437,7 @@

# if `inputs_embeds` are passed, we only want to use them in the 1st generation step
if inputs_embeds is not None and past_key_values is None:
model_inputs = {"inputs_embeds": inputs_embeds}

else:
model_inputs = {"input_ids": input_ids}

@@ -1473,7 +1453,7 @@
return model_inputs

def _get_model_inputs_spec(self, dtype: str):
return {

"input_ids": paddle.static.InputSpec(shape=[None, None], dtype="int64"),
"attention_mask": paddle.static.InputSpec(shape=[None, None], dtype="int64"),
"position_ids": paddle.static.InputSpec(shape=[None, None], dtype="int64"),
@@ -1486,12 +1466,12 @@
model_kwargs["past_key_values"] = outputs[1]

if isinstance(outputs, MoECausalLMOutputWithPast) and "past_key_values" in outputs:
model_kwargs["past_key_values"] = outputs.past_key_values


# update position_ids
if "position_ids" in model_kwargs and model_kwargs["position_ids"] is not None:
position_ids = model_kwargs["position_ids"]
model_kwargs["position_ids"] = paddle.concat([position_ids, position_ids[..., -1:] + 1], axis=-1)


if not is_encoder_decoder and "attention_mask" in model_kwargs:
attention_mask = model_kwargs["attention_mask"]
@@ -1550,23 +1530,23 @@

loss = None
if labels is not None:
loss = self.criterion(logits, labels)


aux_loss = None
if output_router_logits:
aux_loss = load_balancing_loss_func(

outputs.router_logits if return_dict else outputs[-1],
self.num_experts,
self.num_experts_per_tok,
attention_mask,
)
if labels is not None:
loss += self.router_aux_loss_coef * aux_loss


if not return_dict:
output = (logits,) + outputs[1:]
if output_router_logits:
output = (aux_loss,) + output

return (loss,) + output if loss is not None else output

return MoECausalLMOutputWithPast(