
Unimplemented error when using AdamWeightDecay in TF #20847

Closed
ZJaume opened this issue Dec 20, 2022 · 5 comments · Fixed by #20848

Comments

ZJaume commented Dec 20, 2022

System Info

  • transformers version: 4.26.0.dev0
  • Platform: Linux-4.15.0-200-generic-x86_64-with-glibc2.17
  • Python version: 3.8.13
  • Huggingface_hub version: 0.11.1
  • PyTorch version (GPU?): 1.10.1+cu102 (True)
  • Tensorflow version (GPU?): 2.11.0 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: No

Who can help?

@Rocketknight1

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Coming from #20750. Using the example code there, but with AdamWeightDecay, triggers the error.

The code:

from transformers import TFAutoModelForSequenceClassification
from transformers.optimization_tf import create_optimizer
from transformers import AutoTokenizer
from tensorflow.keras.optimizers import Adam
from datasets import load_dataset
import tensorflow as tf
import numpy as np

dataset = load_dataset("glue", "cola")
dataset = dataset["train"]  # Just take the training split for now


tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
tokenized_data = dict(tokenizer(dataset["sentence"], return_tensors="np", padding=True))

labels = np.array(dataset["label"])  # Label is already an array of 0 and 1

# Load and compile our model
model = TFAutoModelForSequenceClassification.from_pretrained("bert-base-cased")
# Lower learning rates are often better for fine-tuning transformers
optimizer, _ = create_optimizer(3e-5, 600, 100, weight_decay_rate=0.3)  # args: init_lr, num_train_steps, num_warmup_steps
model.compile(optimizer=optimizer, loss='binary_crossentropy')

model.fit(tokenized_data, labels)

The traceback:

Traceback (most recent call last):
  File "../test_mirrored.py", line 24, in <module>
    model.fit(tokenized_data, labels)
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 70, in error_handler
    raise e.with_traceback(filtered_tb) from None
  File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/tensorflow/python/eager/execute.py", line 52, in quick_execute
    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
tensorflow.python.framework.errors_impl.UnimplementedError: Graph execution error:

Detected at node 'Cast_1' defined at (most recent call last):
    File "../test_mirrored.py", line 24, in <module>
      model.fit(tokenized_data, labels)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 65, in error_handler
      return fn(*args, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1650, in fit
      tmp_logs = self.train_function(iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1249, in train_function
      return step_function(self, iterator)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1233, in step_function
      outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/engine/training.py", line 1222, in run_step
      outputs = model.train_step(data)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/modeling_tf_utils.py", line 1559, in train_step
      self.optimizer.minimize(loss, self.trainable_variables, tape=tape)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 527, in minimize
      self.apply_gradients(grads_and_vars)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/transformers/optimization_tf.py", line 252, in apply_gradients
      return super(AdamWeightDecay, self).apply_gradients(zip(grads, tvars), name=name, **kwargs)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1140, in apply_gradients
      return super().apply_gradients(grads_and_vars, name=name)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 632, in apply_gradients
      self._apply_weight_decay(trainable_variables)
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1159, in _apply_weight_decay
      tf.__internal__.distribute.interim.maybe_merge_call(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1155, in distributed_apply_weight_decay
      distribution.extended.update(
    File "/home/user/bicleaner-ai-trainings/venv/lib/python3.8/site-packages/keras/optimizers/optimizer_experimental/optimizer.py", line 1151, in weight_decay_fn
      wd = tf.cast(self.weight_decay, variable.dtype)
Node: 'Cast_1'
2 root error(s) found.
  (0) UNIMPLEMENTED:  Cast string to float is not supported
         [[{{node Cast_1}}]]
  (1) CANCELLED:  Function was cancelled before it was started
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_37329]
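
The failing node is the cast of self.weight_decay to the variable dtype (wd = tf.cast(self.weight_decay, variable.dtype)), and "Cast string to float" suggests self.weight_decay somehow ended up holding a string. That exact error can be reproduced in isolation (a minimal sketch, independent of the optimizer):

import tensorflow as tf

# Casting a string tensor to a float dtype is not implemented, which is
# the same UnimplementedError the traceback reports at node 'Cast_1'.
tf.cast(tf.constant("0.3"), tf.float32)  # raises UnimplementedError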

Setting weight decay to 0.0 does not trigger the error, so I imagine it's something in AdamWeightDecay. The TensorFlow 2.11 changelog says:

The tf.keras.optimizers.Optimizer base class now points to the new Keras optimizer, while the old optimizers have been moved to the tf.keras.optimizers.legacy namespace.

and

  • Checkpoint loading failure. The new optimizer handles optimizer state differently from the old optimizer, which simplifies the logic of checkpoint saving/loading, but at the cost of breaking checkpoint backward compatibility in some cases. If you want to keep using an old checkpoint, please change your optimizer to tf.keras.optimizer.legacy.XXX (e.g. tf.keras.optimizer.legacy.Adam).
  • Old optimizer API not found. The new optimizer, tf.keras.optimizers.Optimizer, has a different set of public APIs from the old optimizer. These API changes are mostly related to getting rid of slot variables and TF1 support. Please check the API documentation to find alternatives to the missing API. If you must call the deprecated API, please change your optimizer to the legacy optimizer.

Could it be related to this?
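
A possible stopgap, assuming the legacy namespace the changelog points at (note the actual module is tf.keras.optimizers.legacy, with an "s"), is to compile with the legacy Adam directly. This is only a sketch and not equivalent training, since plain Adam has no decoupled weight decay; model here is the model from the snippet above:

import tensorflow as tf

# Stopgap sketch only: the legacy namespace holds the pre-2.11
# optimizer implementations, but plain Adam drops the decoupled
# weight decay that AdamWeightDecay would apply.
optimizer = tf.keras.optimizers.legacy.Adam(learning_rate=3e-5)
model.compile(optimizer=optimizer, loss="binary_crossentropy")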

Expected behavior

Train successfully.

Rocketknight1 (Member) commented:

Hi @ZJaume, we saw this issue earlier but thought we had fixed it with #20735. I'll investigate now and see if I can reproduce it.

Rocketknight1 (Member) commented:

Reproduced. The cause was a typo that's also present in the TF changelog for 2.11; I'll push a PR now!
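
The likely shape of such a fix (a sketch, not the actual diff in #20848) is a version-gated import that falls back to the legacy namespace on TF >= 2.11; note the changelog quoted above writes tf.keras.optimizer.legacy, while the real namespace is tf.keras.optimizers.legacy:

import tensorflow as tf
from packaging.version import parse

# Sketch only: TF 2.11 made the new Keras optimizer the default and
# moved the old implementations to tf.keras.optimizers.legacy.
if parse(tf.__version__) >= parse("2.11"):
    from tensorflow.keras.optimizers.legacy import Adam
else:
    from tensorflow.keras.optimizers import Adam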

Rocketknight1 (Member) commented:

PR is up at #20848

Rocketknight1 (Member) commented:

@ZJaume Should be fixed now, thanks for the bug report! Let me know if installing the latest version from main doesn't fix your problem.
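
(Installing from main is typically done with pip install git+https://github.com/huggingface/transformers.git.)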

ZJaume (Author) commented Dec 20, 2022

Working. Thank you!
