Unpin fsspec #5733

albertvillanova · 2023-04-11T08:52:12Z

In fsspec 2023.4.0, the default value of `clobber` when registering an implementation was changed from `True` to `False`. See:

Disable clobbering of implementations by default fsspec/filesystem_spec#1237

This PR restores the previous behavior by explicitly passing `clobber=True` when registering the mock implementations.
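To illustrate what the changed default means for test fixtures, here is a toy, stdlib-only model of a name-to-class registry with a `clobber` flag. It is an assumption-laden sketch, not fsspec's code: `ToyRegistry`, `MockFileSystem` and the error message are hypothetical, and the model simply refuses any re-registration of a taken name when `clobber=False`. In the real fixtures the corresponding change is to pass `clobber=True` to the fsspec registration call (presumably `fsspec.register_implementation`).

```python
# Toy, stdlib-only model of a protocol registry with a `clobber` flag.
# Hypothetical: this is NOT fsspec's implementation; it only illustrates why
# flipping the default from clobber=True to clobber=False breaks fixtures
# that register the same protocol name more than once.


class ToyRegistry:
    def __init__(self) -> None:
        self._impls: dict[str, type] = {}

    def register(self, name: str, cls: type, clobber: bool = False) -> None:
        # New-style default (clobber=False): refuse to overwrite an entry.
        if name in self._impls and not clobber:
            raise ValueError(f"{name!r} is already registered; pass clobber=True")
        # clobber=True (the old default): replace any existing entry.
        self._impls[name] = cls

    def get(self, name: str) -> type:
        return self._impls[name]


class MockFileSystem:  # hypothetical stand-in for a mock filesystem class
    pass


registry = ToyRegistry()

# First test in this model: the fixture registers "mock" for the first time.
registry.register("mock", MockFileSystem)

# Next test: the same fixture registers "mock" again with the new default,
# which this toy registry rejects.
try:
    registry.register("mock", MockFileSystem)
    second_call_failed = False
except ValueError:
    second_call_failed = True

# The fix, modelled here: opt back into overwriting explicitly.
registry.register("mock", MockFileSystem, clobber=True)
print(second_call_failed, registry.get("mock").__name__)  # True MockFileSystem
```

Passing the flag explicitly, rather than relying on the default, keeps the fixtures independent of which default a given fsspec release uses, assuming every release still allowed by the requirements accepts the `clobber` keyword.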

This PR also removes the temporary pin introduced by:

Temporarily pin fsspec #5731

Fix #5734.


HuggingFaceDocBuilderDev · 2023-04-11T08:58:15Z

The documentation is not available anymore as the PR was closed or merged.

lhoestq

Thanks !

github-actions · 2023-04-11T11:11:44Z


PyArrow==8.0.0


Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006240 / 0.011353 (-0.005113)	0.004392 / 0.011008 (-0.006616)	0.097276 / 0.038508 (0.058768)	0.027262 / 0.023109 (0.004153)	0.303203 / 0.275898 (0.027305)	0.331878 / 0.323480 (0.008398)	0.004706 / 0.007986 (-0.003279)	0.004428 / 0.004328 (0.000100)	0.074666 / 0.004250 (0.070416)	0.036154 / 0.037052 (-0.000899)	0.302997 / 0.258489 (0.044508)	0.340350 / 0.293841 (0.046509)	0.031011 / 0.128546 (-0.097535)	0.011616 / 0.075646 (-0.064031)	0.323671 / 0.419271 (-0.095601)	0.042062 / 0.043533 (-0.001471)	0.311381 / 0.255139 (0.056242)	0.324697 / 0.283200 (0.041498)	0.084248 / 0.141683 (-0.057435)	1.471651 / 1.452155 (0.019496)	1.533414 / 1.492716 (0.040697)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.193555 / 0.018006 (0.175549)	0.393452 / 0.000490 (0.392962)	0.002348 / 0.000200 (0.002148)	0.000071 / 0.000054 (0.000016)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.022523 / 0.037411 (-0.014889)	0.096552 / 0.014526 (0.082026)	0.101746 / 0.176557 (-0.074810)	0.163145 / 0.737135 (-0.573990)	0.106417 / 0.296338 (-0.189921)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.448589 / 0.215209 (0.233380)	4.467803 / 2.077655 (2.390148)	2.178745 / 1.504120 (0.674625)	1.983339 / 1.541195 (0.442145)	2.056554 / 1.468490 (0.588064)	0.697571 / 4.584777 (-3.887206)	3.363967 / 3.745712 (-0.381745)	1.872526 / 5.269862 (-3.397336)	1.258245 / 4.565676 (-3.307432)	0.082954 / 0.424275 (-0.341321)	0.012306 / 0.007607 (0.004699)	0.545096 / 0.226044 (0.319052)	5.468706 / 2.268929 (3.199777)	2.645333 / 55.444624 (-52.799292)	2.287659 / 6.876477 (-4.588818)	2.346768 / 2.142072 (0.204696)	0.803730 / 4.805227 (-4.001497)	0.151037 / 6.500664 (-6.349627)	0.066404 / 0.075469 (-0.009065)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.192982 / 1.841788 (-0.648806)	13.631225 / 8.074308 (5.556917)	13.830053 / 10.191392 (3.638661)	0.141901 / 0.680424 (-0.538523)	0.016500 / 0.534201 (-0.517701)	0.373268 / 0.579283 (-0.206015)	0.380123 / 0.434364 (-0.054241)	0.430786 / 0.540337 (-0.109551)	0.512669 / 1.386936 (-0.874267)

PyArrow==latest


Benchmark: benchmark_array_xd.json

metric	read_batch_formatted_as_numpy after write_array2d	read_batch_formatted_as_numpy after write_flattened_sequence	read_batch_formatted_as_numpy after write_nested_sequence	read_batch_unformated after write_array2d	read_batch_unformated after write_flattened_sequence	read_batch_unformated after write_nested_sequence	read_col_formatted_as_numpy after write_array2d	read_col_formatted_as_numpy after write_flattened_sequence	read_col_formatted_as_numpy after write_nested_sequence	read_col_unformated after write_array2d	read_col_unformated after write_flattened_sequence	read_col_unformated after write_nested_sequence	read_formatted_as_numpy after write_array2d	read_formatted_as_numpy after write_flattened_sequence	read_formatted_as_numpy after write_nested_sequence	read_unformated after write_array2d	read_unformated after write_flattened_sequence	read_unformated after write_nested_sequence	write_array2d	write_flattened_sequence	write_nested_sequence
new / old (diff)	0.006161 / 0.011353 (-0.005192)	0.004399 / 0.011008 (-0.006609)	0.076210 / 0.038508 (0.037702)	0.026791 / 0.023109 (0.003681)	0.341523 / 0.275898 (0.065625)	0.370400 / 0.323480 (0.046920)	0.004495 / 0.007986 (-0.003491)	0.003204 / 0.004328 (-0.001125)	0.075444 / 0.004250 (0.071194)	0.035914 / 0.037052 (-0.001138)	0.343806 / 0.258489 (0.085317)	0.384320 / 0.293841 (0.090479)	0.031438 / 0.128546 (-0.097109)	0.011253 / 0.075646 (-0.064393)	0.085364 / 0.419271 (-0.333908)	0.041407 / 0.043533 (-0.002126)	0.338831 / 0.255139 (0.083692)	0.364357 / 0.283200 (0.081158)	0.087417 / 0.141683 (-0.054266)	1.520624 / 1.452155 (0.068470)	1.572432 / 1.492716 (0.079716)

Benchmark: benchmark_getitem_100B.json

metric	get_batch_of_1024_random_rows	get_batch_of_1024_rows	get_first_row	get_last_row
new / old (diff)	0.232403 / 0.018006 (0.214396)	0.388187 / 0.000490 (0.387698)	0.001158 / 0.000200 (0.000958)	0.000069 / 0.000054 (0.000014)

Benchmark: benchmark_indices_mapping.json

metric	select	shard	shuffle	sort	train_test_split
new / old (diff)	0.024596 / 0.037411 (-0.012816)	0.101203 / 0.014526 (0.086677)	0.105243 / 0.176557 (-0.071314)	0.158215 / 0.737135 (-0.578920)	0.110277 / 0.296338 (-0.186061)

Benchmark: benchmark_iterating.json

metric	read 5000	read 50000	read_batch 50000 10	read_batch 50000 100	read_batch 50000 1000	read_formatted numpy 5000	read_formatted pandas 5000	read_formatted tensorflow 5000	read_formatted torch 5000	read_formatted_batch numpy 5000 10	read_formatted_batch numpy 5000 1000	shuffled read 5000	shuffled read 50000	shuffled read_batch 50000 10	shuffled read_batch 50000 100	shuffled read_batch 50000 1000	shuffled read_formatted numpy 5000	shuffled read_formatted_batch numpy 5000 10	shuffled read_formatted_batch numpy 5000 1000
new / old (diff)	0.435661 / 0.215209 (0.220452)	4.350151 / 2.077655 (2.272496)	2.072372 / 1.504120 (0.568252)	1.870675 / 1.541195 (0.329480)	1.910883 / 1.468490 (0.442393)	0.697384 / 4.584777 (-3.887393)	3.399377 / 3.745712 (-0.346335)	2.685008 / 5.269862 (-2.584854)	1.476843 / 4.565676 (-3.088834)	0.083177 / 0.424275 (-0.341098)	0.012413 / 0.007607 (0.004806)	0.542543 / 0.226044 (0.316498)	5.431422 / 2.268929 (3.162494)	2.506419 / 55.444624 (-52.938206)	2.166342 / 6.876477 (-4.710135)	2.164421 / 2.142072 (0.022348)	0.800609 / 4.805227 (-4.004618)	0.150527 / 6.500664 (-6.350137)	0.065780 / 0.075469 (-0.009689)

Benchmark: benchmark_map_filter.json

metric	filter	map fast-tokenizer batched	map identity	map identity batched	map no-op batched	map no-op batched numpy	map no-op batched pandas	map no-op batched pytorch	map no-op batched tensorflow
new / old (diff)	1.293409 / 1.841788 (-0.548379)	13.814898 / 8.074308 (5.740590)	13.940416 / 10.191392 (3.749024)	0.149377 / 0.680424 (-0.531047)	0.016462 / 0.534201 (-0.517739)	0.393748 / 0.579283 (-0.185535)	0.384327 / 0.434364 (-0.050037)	0.489900 / 0.540337 (-0.050437)	0.574608 / 1.386936 (-0.812328)
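The benchmark tables above are flattened into two tab-separated lines per benchmark: a header of metric names and a single `new / old (diff)` row, where diff is new − old to within rounding of the six printed decimals (e.g. 0.006240 − 0.011353 = −0.005113). A minimal stdlib-only sketch for turning such a row back into per-metric values follows; `parse_cell`, `parse_row`, `is_consistent` and `Cell` are hypothetical helpers written for this illustration, not part of the datasets benchmarking scripts.

```python
import re
from typing import Dict, NamedTuple

# Matches cells such as "0.006240 / 0.011353 (-0.005113)".
CELL_RE = re.compile(r"^\s*(-?\d+\.\d+)\s*/\s*(-?\d+\.\d+)\s*\((-?\d+\.\d+)\)\s*$")


class Cell(NamedTuple):
    new: float
    old: float
    diff: float  # reported as new - old


def parse_cell(text: str) -> Cell:
    match = CELL_RE.match(text)
    if match is None:
        raise ValueError(f"not a 'new / old (diff)' cell: {text!r}")
    new, old, diff = (float(group) for group in match.groups())
    return Cell(new, old, diff)


def parse_row(header: str, row: str) -> Dict[str, Cell]:
    # Both lines are tab-separated; the first field is a label
    # ("metric" / "new / old (diff)") and is skipped.
    names = header.split("\t")[1:]
    cells = row.split("\t")[1:]
    if len(names) != len(cells):
        raise ValueError("header and row have different numbers of columns")
    return {name: parse_cell(cell) for name, cell in zip(names, cells)}


def is_consistent(cell: Cell, tol: float = 2e-6) -> bool:
    # Displayed values are rounded to six decimals, so new - old may differ
    # from the printed diff in the last digit.
    return abs((cell.new - cell.old) - cell.diff) <= tol


# Example: two columns of the getitem benchmark (PyArrow==8.0.0 run).
header = "\t".join(["metric", "get_first_row", "get_last_row"])
row = "\t".join([
    "new / old (diff)",
    "0.002348 / 0.000200 (0.002148)",
    "0.000071 / 0.000054 (0.000016)",
])
for name, cell in parse_row(header, row).items():
    print(name, cell.new, cell.old, cell.diff, is_consistent(cell))
```

The sign of diff only says which run produced the larger number; reading it as "faster" or "slower" assumes the metrics are timings.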

albertvillanova added 3 commits April 11, 2023 10:48

Pass clobber True

c97a503

Merge remote-tracking branch 'upstream/main' into fix-fsspec-register-implementation

1b6ddd9

Unpin fsspec

fda7ed3

lhoestq approved these changes Apr 11, 2023


albertvillanova merged commit f260793 into huggingface:main Apr 11, 2023

albertvillanova deleted the fix-fsspec-register-implementation branch April 11, 2023 11:04

albertvillanova mentioned this pull request Apr 12, 2023

Fix CI mock filesystem fixtures #5740

Merged

