
chore: allow protobuf 3.20.3 requirement #22759

Merged · 6 commits · May 10, 2023

Conversation

jose-turintech
Contributor

Allow latest bugfix release for protobuf (3.20.3)

What does this PR do?

Change the Python library's requirements so that the latest bugfix release of protobuf (3.20.3) is allowed, instead of capping the dependency at 3.20.2 (<=3.20.2).
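
For context, the change amounts to relaxing a single version pin in the library's dependency list; roughly this kind of one-line change (a sketch only, the exact entry and file layout in transformers' setup.py may differ):

_deps = [
    # before: "protobuf<=3.20.2"  (the 3.20.3 bugfix release was excluded)
    "protobuf<=3.20.3",  # after: the latest 3.20.x bugfix release is allowed
]

The "update auto-generated dependency table" commit below then presumably regenerates the derived dependency table from this pin.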

Allow latest bugfix release for protobuf (3.20.3)
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

update auto-generated dependency table
@amyeroberts
Collaborator

cc @ydshieh

@ydshieh
Collaborator

ydshieh commented Apr 14, 2023

Hi @jose-turintech, thank you for this PR ❤️.

Unfortunately, as you can see in the CI results, the changes cause some errors:

FAILED tests/models/albert/test_tokenization_albert.py::AlbertTokenizationTest::test_pickle_subword_regularization_tokenizer
FAILED tests/models/bert_generation/test_tokenization_bert_generation.py::BertGenerationTokenizationTest::test_pickle_subword_regularization_tokenizer

(Fatal Python error: Segmentation fault)

So we are not able to merge this PR for now. There might be a way to fix these 2 failures, but I am not sure. Let me know if you want to dive into this.

@adriangonz

Hey @jose-turintech ,

Thanks for submitting this PR! The latest tensorflow==2.12 release depends on protobuf >= 3.20.3, so this would unblock installing the latest tensorflow alongside transformers.

After setting up this PR's environment, I just ran this locally and those tests seem to pass. Would it be possible to trigger a re-run @ydshieh? Alternatively, would it be possible to get extra logs on the CI failures?

@jose-turintech
Contributor Author

jose-turintech commented May 8, 2023

Hey @jose-turintech ,

Thanks for submitting this PR! The latest tensorflow==2.12 release depends on protobuf >= 3.20.3, so this would unblock installing the latest tensorflow alongside transformers.

After setting up this PR's environment, I just ran this locally and those tests seem to pass. Would it be possible to trigger a re-run @ydshieh? Alternatively, would it be possible to get extra logs on the CI failures?

Hello @adriangonz, I just updated my branch with the latest changes from origin to check whether the PR checks would be retriggered. It seems they were, so I guess we'll see in a few minutes whether the PR passes all checks.

Thanks for your comment.

@ydshieh ydshieh self-assigned this May 9, 2023
@ydshieh
Collaborator

ydshieh commented May 9, 2023

As you can see in the latest run, the 2 failed tests are still there.

The error (shown at the end below) is that some worker processes crashed, and pytest gives no more informative log.

When I run the two failed tests individually on my local env., they pass.

However, since the latest release of tensorflow-probability broke everything in the CI (we don't support TensorFlow 2.12 yet, and that release requires it), we will take a deeper look to see if we can unblock this situation.

=================================== FAILURES ===================================
______ tests/models/bert_generation/test_tokenization_bert_generation.py _______
[gw0] linux -- Python 3.8.12 /home/circleci/.pyenv/versions/3.8.12/bin/python
worker 'gw0' crashed while running 'tests/models/bert_generation/test_tokenization_bert_generation.py::BertGenerationTokenizationTest::test_pickle_subword_regularization_tokenizer'
_______________ tests/models/albert/test_tokenization_albert.py ________________
[gw6] linux -- Python 3.8.12 /home/circleci/.pyenv/versions/3.8.12/bin/python
worker 'gw6' crashed while running 'tests/models/albert/test_tokenization_albert.py::AlbertTokenizationTest::test_pickle_subword_regularization_tokenizer'

@jose-turintech
Contributor Author

@ydshieh I've merged origin into this branch once again to retrigger the CI checks, just to see if the tests pass after yesterday's Hugging Face downtime. The tests pass now after your modifications :).

The only difference with main is that the tensorflow-probability>0.20 restriction is not applied in this CI build.

Thanks for taking the time to take a look into the issue.

@ydshieh
Collaborator

ydshieh commented May 10, 2023

@jose-turintech I accidentally pushed some experimental changes to this PR branch, I am really sorry. The CI is green only because I removed some third-party packages (tensorflow, for example) that should be kept.

I am still looking into how to resolve the issue. There is a big problem (at least with the desired environment when running inside the CircleCI runner). I will keep you updated soon.

@ydshieh ydshieh marked this pull request as draft May 10, 2023 10:38
@ydshieh
Collaborator

ydshieh commented May 10, 2023

In the meantime, let us convert this PR to a draft so it won't be merged by accident. Thank you for your understanding.

@ydshieh
Collaborator

ydshieh commented May 10, 2023

Issue

(for the hacky fix, see the next reply)

This PR wants to use protobuf==3.20.3 so we can use tensorflow==2.12. However, some tests like test_pickle_subword_regularization_tokenizer fail in this environment. The following describes the issue.

  • First, set up the env. to have tensorflow-cpu==2.12 with protobuf==3.20.3 but without torch installed (see the environment-check sketch after this list).
  • Use a sentencepiece tokenizer with enable_sampling=True.
  • Run the code with pytest --> crash (core dump)
    • (run with a script --> no crash)
    • (run the code with pytest where torch is also there --> no crash)
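
As referenced in the first bullet, a minimal check that the environment matches the failing configuration (a sketch; it only restates the conditions listed above):

import importlib.util

import tensorflow as tf
import google.protobuf

print("tensorflow:", tf.__version__)             # expected: 2.12.x
print("protobuf:", google.protobuf.__version__)  # expected: 3.20.3

# torch must NOT be installed for the crash to reproduce
assert importlib.util.find_spec("torch") is None, "torch is installed; the crash is not expected to reproduce"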

Here are 2 code snippets, one that reproduces the crash and one that doesn't (running under Python 3.8 is more likely to reproduce it):


  • create tests/test_foo.py and run python -m pytest -v tests/test_foo.py --> crash
from transformers import AlbertTokenizer
def test_foo():

    fn = "tests/fixtures/spiece.model"
    text = "This is a test for subword regularization."

    # `encode` works with `False`
    sp_model_kwargs = {"enable_sampling": False}
    tokenizer = AlbertTokenizer(fn, sp_model_kwargs=sp_model_kwargs)

    # test 1 (usage in `transformers`)
    # _ = tokenizer.tokenize(text)

    # test 2 (direct use in sentencepiece)
    pieces = tokenizer.sp_model.encode(text, out_type=str)

    # `encode` fails with `True` if torch is not installed and tf==2.12
    sp_model_kwargs = {"enable_sampling": True}
    tokenizer = AlbertTokenizer(fn, sp_model_kwargs=sp_model_kwargs)

    # test 1 (usage in `transformers`)
    # _ = tokenizer.tokenize(text)

    # This gives `Segmentation fault (core dumped)` under the above mentioned conditions
    # test 2 (direct use in sentencepiece)
    pieces = tokenizer.sp_model.encode(text, out_type=str)
    print(pieces)
  • create foo.py and run python foo.py -> no crash
from transformers import AlbertTokenizer

fn = "tests/fixtures/spiece.model"
text = "This is a test for subword regularization."

# `encode` works with `False`
sp_model_kwargs = {"enable_sampling": False}
tokenizer = AlbertTokenizer(fn, sp_model_kwargs=sp_model_kwargs)

# test 1 (usage in `transformers`)
# _ = tokenizer.tokenize(text)

# test 2 (direct use in sentencepiece)
pieces = tokenizer.sp_model.encode(text, out_type=str)

# `encode` works with `True` if torch is not installed and tf==2.12
sp_model_kwargs = {"enable_sampling": True}
tokenizer = AlbertTokenizer(fn, sp_model_kwargs=sp_model_kwargs)

# test 1 (usage in `transformers`)
# _ = tokenizer.tokenize(text)

# This works
# test 2 (direct use in sentencepiece)
pieces = tokenizer.sp_model.encode(text, out_type=str)
print(pieces)

@ydshieh
Collaborator

ydshieh commented May 10, 2023

(Hacky) Fix

The relevant failing tests are:

  • test_subword_regularization_tokenizer
  • test_pickle_subword_regularization_tokenizer

As mentioned above, those failures only happen when running with pytest. Furthermore, those tests don't actually need protobuf in order to run. However, in the TF CI job, protobuf is pulled in as a dependency of TensorFlow.

It turns out that running those 2 problematic tests in a subprocess avoids the crash. It's still unclear what actually happens, though.

This PR modifies those 2 tests to run in a subprocess, so we can have protobuf==3.20.3 along with tensorflow==2.12.
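
A generic sketch of the subprocess-isolation idea, using only the standard library (the helper and test names below are hypothetical and only illustrate the technique; the actual change lives in tests/test_tokenization_common.py, as the review threads further down show):

import multiprocessing

from transformers import AlbertTokenizer


def _subword_regularization_check(queue):
    # The body that may segfault: a crash here kills only the child process.
    tokenizer = AlbertTokenizer("tests/fixtures/spiece.model", sp_model_kwargs={"enable_sampling": True})
    queue.put(tokenizer.sp_model.encode("This is a test for subword regularization.", out_type=str))


def test_subword_regularization_in_subprocess():
    ctx = multiprocessing.get_context("spawn")  # fresh interpreter, nothing inherited from the pytest worker
    queue = ctx.Queue()
    process = ctx.Process(target=_subword_regularization_check, args=(queue,))
    process.start()
    process.join(timeout=120)

    # A segfault shows up as a non-zero exit code instead of taking down the whole pytest worker.
    assert process.exitcode == 0, f"subprocess crashed with exit code {process.exitcode}"
    assert len(queue.get()) > 0

With this pattern, a core dump in the child is reported to pytest as an ordinary assertion failure rather than crashing the test worker itself.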

@ydshieh
Collaborator

ydshieh commented May 10, 2023

The TF job is successful with the last fix, see this job run

https://app.circleci.com/pipelines/github/huggingface/transformers/64174/workflows/688a1174-8a6f-4599-9479-f51bbc2aacdb/jobs/793536/artifacts

The other jobs should be fine (we will see in 30 minutes) as they already passed before.

@ydshieh ydshieh marked this pull request as ready for review May 10, 2023 14:57

Run 2 tests in a subprocess to avoid problematic core dump

Collaborator

@amyeroberts amyeroberts left a comment


Thanks for fixing and all the work getting the tests to work, including all of the details you provided, @ydshieh!

I just have some comments on the assets and a question about the _test_pickle_subword_regularization_tokenizer implementation.

3 resolved review threads on tests/test_tokenization_common.py (outdated)
ydshieh and others added 2 commits May 10, 2023 18:30
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Collaborator

@sgugger sgugger left a comment


Thanks a ton for taking the time to debug this!

Collaborator

@ydshieh ydshieh left a comment


Thank you again @jose-turintech for this PR, which allows us to use TF 2.12!

@ydshieh ydshieh merged commit 0c65fb7 into huggingface:main May 10, 2023
@jose-turintech
Contributor Author

Thank you again @jose-turintech for this PR, which allows us to use TF 2.12!

Thank you very much for taking the time to fix the issues, you did all the work; really appreciate it.

gojiteji pushed a commit to gojiteji/transformers that referenced this pull request Jun 5, 2023
* chore: allow protobuf 3.20.3

Allow latest bugfix release for protobuf (3.20.3)

* chore: update auto-generated dependency table

update auto-generated dependency table

* run in subprocess

* Apply suggestions from code review

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Apply suggestions

---------

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
novice03 pushed a commit to novice03/transformers that referenced this pull request Jun 23, 2023 (same commit message as above)