
Unimplemented error when using AdamWeightDecay in TF #20847

Closed
ZJaume opened this issue Dec 20, 2022 · 5 comments · Fixed by #20848

Comments

ZJaume commented Dec 20, 2022

System Info

  • transformers version: 4.26.0.dev0
  • Platform: Linux-4.15.0-200-generic-x86_64-with-glibc2.17
  • Python version: 3.8.13
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.10.1+cu102 (True)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Coming from #20750. Using the example code there, but with AdamWeightDecay, triggers the error.

The code:

from transformers import TFAutoModelForSequenceClassification
from transformers.optimization_tf import create_optimizer
from transformers import AutoTokenizer
from tensorflow.keras.optimizers import Adam
from datasets import load_dataset
import tensorflow as tf
import numpy as np

dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now


tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = dict(tokenizer(dataset["sentence"], return_tensors="np", padding=True))

labels = np.array(dataset["label"])  # Label is already an array of 0 and 1

# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
optimizer, _ = create_optimizer(3e-5, 600, 100, weight_decay_rate=0.3)  # args: init_lr, num_train_steps, num_warmup_steps
model.compile(optimizer=optimizer, loss='binary_crossentropy')

model.fit(tokenized_data, labels)

The traceback:

Traceback (most recent call last):
  File "../test_mirrored.py", line 24, in <module>
    model.fit(tokenized_data, labels)
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node 'Cast_1' defined at (most recent call last):
    File "../test_mirrored.py", line 24, in <module>
      model.fit(tokenized_data, labels)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/modeling_tf_utils.py", line 1559, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/optimization_tf.py", line 252, in apply_gradients
      return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
      self._apply_weight_decay(trainable_variables)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
      tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
      distribution.extended.update(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
      wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
2 root error(s) found.
  (0) UNIMPLEMENTED:  Cast string to float is not supported
         [[{{node Cast_1}}]]
  (1) CANCELLED:  Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_37329]
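
The failing node is the cast of self.weight_decay to the variable dtype (wd = tf.cast(self.weight_decay, variable.dtype)), and "Cast string to float" suggests self.weight_decay somehow ended up holding a string. That exact error can be reproduced in isolation (a minimal sketch, independent of the optimizer):

import tensorflow as tf

# Casting a string tensor to a float dtype is not implemented, which is
# the same UnimplementedError the traceback reports at node 'Cast_1'.
tf.cast(tf.constant("0.3"), tf.float32)  # raises UnimplementedError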

Setting weight decay to 0.0 does not trigger the error, so I imagine it's something in AdamWeightDecay. The TensorFlow 2.11 changelog says:

The tf.keras.optimizers.Optimizer base class now points to the new Keras optimizer, while the old optimizers have been moved to the tf.keras.optimizers.legacy namespace.

and

  • Checkpoint loading failure. The new optimizer handles optimizer state differently from the old optimizer, which simplifies the logic of checkpoint saving/loading, but at the cost of breaking checkpoint backward compatibility in some cases. If you want to keep using an old checkpoint, please change your optimizer to tf.keras.optimizer.legacy.XXX (e.g. tf.keras.optimizer.legacy.Adam).
  • Old optimizer API not found. The new optimizer, tf.keras.optimizers.Optimizer, has a different set of public APIs from the old optimizer. These API changes are mostly related to getting rid of slot variables and TF1 support. Please check the API documentation to find alternatives to the missing API. If you must call the deprecated API, please change your optimizer to the legacy optimizer.

Could it be related to this?
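
A possible stopgap, assuming the legacy namespace the changelog points at (note the actual module is tf.keras.optimizers.legacy, with an "s"), is to compile with the legacy Adam directly. This is only a sketch and not equivalent training, since plain Adam has no decoupled weight decay; model here is the model from the snippet above:

import tensorflow as tf

# Stopgap sketch only: the legacy namespace holds the pre-2.11
# optimizer implementations, but plain Adam drops the decoupled
# weight decay that AdamWeightDecay would apply.
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=3e-5)
model.compile(optimizer=optimizer, loss="binary_crossentropy")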

Expected behavior

Train successfully.

Rocketknight1 (Member) commented:

Hi @ZJaume, we saw this issue earlier but thought we had fixed it with #20735. I'll investigate now and see if I can reproduce it.

Rocketknight1 (Member) commented:

Reproduced. The cause was a typo that's also present in the TF changelog for 2.11; I'll push a PR now!
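
The likely shape of such a fix (a sketch, not the actual diff in #20848) is a version-gated import that falls back to the legacy namespace on TF >= 2.11; note the changelog quoted above writes tf.keras.optimizer.legacy, while the real namespace is tf.keras.optimizers.legacy:

import tensorflow as tf
from packaging.version import parse

# Sketch only: TF 2.11 made the new Keras optimizer the default and
# moved the old implementations to tf.keras.optimizers.legacy.
if parse(tf.__version__) >= parse("2.11"):
    from tensorflow.keras.optimizers.legacy import Adam
else:
    from tensorflow.keras.optimizers import Adam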

Rocketknight1 (Member) commented:

PR is up at #20848

Rocketknight1 (Member) commented:

@ZJaume Should be fixed now, thanks for the bug report! Let me know if installing the latest version from main doesn't fix your problem.
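
(Installing from main is typically done with pip install git+https://github.com/huggingface/transformers.git.)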

ZJaume (Author) commented Dec 20, 2022

Working. Thank you!
