Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable clobbering of implementations by default #1237

Merged
merged 1 commit into from
Apr 4, 2023

Conversation

yarikoptic
Copy link
Contributor

I think nothing good can come out from clobbing by default. The necessity to clobber=True in process_entries was not well understood by me, but seems to stem from some artificial situation causing re-import of the fsspec

Follow up to #1227 harmonizing default behavior to have clobber=False in both similar functionalities.

I think nothing good can come out from clobbing by default.
The necessity to clobber=True  in process_entries  was not
well understood by me, but seems to stem from some artificial
situation causing re-import of the fsspec
@martindurant
Copy link
Member

seems to stem from some artificial situation causing re-import of the fsspec

Do you have a reference for this?

@yarikoptic
Copy link
Contributor Author

yarikoptic commented Apr 4, 2023

seems to stem from some artificial situation causing re-import of the fsspec

Do you have a reference for this?

https://github.com/fsspec/filesystem_spec/blob/HEAD/fsspec/tests/test_registry.py#L102 the test_filesystem_warning_arrow_hdfs_deprecated test which forces re-registration of the mocked arrow_hdfs . Not even sure if it would be 'safe' , e.g. would alter global state so if testing run within a process which later uses fsspec - you would end up with a mocked instance there in the global registry. edit: or to introduce side effect as if have some other tests later on needed proper arrow_hdfs implementation

@martindurant
Copy link
Member

The clear_registry fixture should deal with that. So this test would fail if we blanket make clobber False for registering? To be clear, this is not a reimport, but evaluation of an EntryPoint (which can cause import, but doesn't need to).

In short, I think everything is fine, unless you have other concerns.

@pedrorivotti
Copy link

@martindurant, following this change, would it be expected for fsspec.open() to fail when called from multiple threads?

For example:

import concurrent.futures
import fsspec  # 2023.4.0


def fun(path):
    try:
        print(path)
        with fsspec.open(path) as f:
            result = f.read()
    except FileNotFoundError:
        return "OK"  # This error is expected


with concurrent.futures.ThreadPoolExecutor() as executor:
    futures = [
        executor.submit(fun, path="/path/dummy1"),
        executor.submit(fun, path="/path/dummy2"),
    ]
    for future in concurrent.futures.as_completed(futures):
        print(future.result())

Results in the following error:

ValueError: Name (file) already in the registry and clobber is False

While on 2023.3.0 it would print OK twice.

A similar example could be created using, for example, s3fs.

@martindurant
Copy link
Member

This should NOT happen. It seems that register is being called twice, perhaps because two instances are being created simultaneously. I bet if you run fun() once before submitting to the thread pool, it works OK.

@pedrorivotti
Copy link

I bet if you run fun() once before submitting to the thread pool, it works OK.

Correct 🙂

@martindurant
Copy link
Member

Maybe we should guard against this anyway, since doing open simultaneously in threads is a reasonable workflow, especially for dask.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants