Use nosync policy in gather and scatter implementations. #12038

bdice · 2022-11-01T17:15:53Z

Description

This PR uses rmm::exec_policy_nosync in libcudf's gather and scatter functions. These changes are motivated by performance improvements seen previously in #11577.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

codecov · 2022-11-01T20:37:26Z

Codecov Report

Base: 88.09% // Head: 88.12% // Increases project coverage by +0.03% 🎉

Coverage data is based on head (a14ae56) compared to base (03034af).
Patch has no changes to coverable lines.

Additional details and impacted files

@@               Coverage Diff                @@
##           branch-22.12   #12038      +/-   ##
================================================
+ Coverage         88.09%   88.12%   +0.03%     
================================================
  Files               133      133              
  Lines             22003    22003              
================================================
+ Hits              19383    19390       +7     
+ Misses             2620     2613       -7

Impacted Files	Coverage Δ
python/cudf/cudf/core/dataframe.py	`93.67% <0.00%> (+0.04%)`	⬆️
python/cudf/cudf/core/column/string.py	`88.65% <0.00%> (+0.12%)`	⬆️
python/cudf/cudf/core/groupby/groupby.py	`91.51% <0.00%> (+0.20%)`	⬆️
python/cudf/cudf/core/tools/datetimes.py	`84.49% <0.00%> (+0.30%)`	⬆️
python/cudf/cudf/core/column/lists.py	`93.75% <0.00%> (+0.96%)`	⬆️
python/strings_udf/strings_udf/__init__.py	`86.27% <0.00%> (+1.96%)`	⬆️


davidwendt · 2022-11-01T20:54:04Z

cpp/include/cudf/strings/detail/gather.cuh

@@ -318,7 +318,7 @@ std::unique_ptr<cudf::column> gather(

  // check total size is not too large
  size_t const total_bytes = thrust::transform_reduce(
-    rmm::exec_policy(stream),
+    rmm::exec_policy_nosync(stream),


We need the result returned from this call, so it will require a sync inside the reduce.

Thrust handles that for us. If a sync is required for the algorithm's return value (or some other part of its correctness), Thrust is responsible for the sync regardless of the execution policy. nosync really means "avoid syncing if possible."

Reference: https://github.com/NVIDIA/thrust/releases/tag/1.16.0

par_nosync is a hint to the Thrust execution engine that any non-essential internal synchronizations should be skipped and that an explicit synchronization will be performed by the caller before accessing results.

(The return value of a reduction is considered an essential synchronization.)
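To make the distinction between essential and non-essential synchronization concrete, here is a host-only toy model. It is a sketch for illustration, not Thrust or RMM code: `FakeStream`, `Policy`, `negate_all`, `sum_all`, and `run_pipeline` are all hypothetical names, and the queue-then-flush behavior only loosely mimics a CUDA stream. It shows that an algorithm without a host-visible result can skip its trailing sync under a nosync-style policy, while a reduction has to sync under either policy.

```cpp
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream: work is queued and only runs when
// the host synchronizes. This is a teaching model, not Thrust or RMM code.
struct FakeStream {
  std::vector<std::function<void()>> pending;
  int sync_count = 0;

  void enqueue(std::function<void()> work) { pending.push_back(std::move(work)); }

  // Runs all queued work in order and counts the synchronization.
  void synchronize() {
    for (auto& work : pending) { work(); }
    pending.clear();
    ++sync_count;
  }
};

enum class Policy { Sync, NoSync };

// No host-visible return value: the trailing sync is non-essential, so the
// NoSync policy skips it and leaves the work queued on the stream.
void negate_all(Policy policy, FakeStream& stream, std::vector<int>& data) {
  stream.enqueue([&data] {
    for (auto& x : data) { x = -x; }
  });
  if (policy == Policy::Sync) { stream.synchronize(); }
}

// Returns a value to the host: the sync is essential under either policy.
long sum_all(Policy /*policy*/, FakeStream& stream, std::vector<int> const& data) {
  long total = 0;
  stream.enqueue([&data, &total] {
    for (int x : data) { total += x; }
  });
  stream.synchronize();
  return total;
}

// Runs both steps and reports {reduction result, number of host syncs}.
std::pair<long, int> run_pipeline(Policy policy) {
  FakeStream stream;
  std::vector<int> data{1, 2, 3, 4};
  negate_all(policy, stream, data);
  long const total = sum_all(policy, stream, data);
  return {total, stream.sync_count};
}
```

In this model, `run_pipeline(Policy::Sync)` and `run_pipeline(Policy::NoSync)` return the same sum, but the second performs one synchronization instead of two. That is the kind of saving the nosync policy is after.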

I'm thinking a comment may help here?
I'm worried for future generations (including myself) who see this (or forget this).

I think there are a couple of underlying issues here and a couple of approaches to address them. First, par_nosync is pretty new, so not all Thrust developers have begun to use it or know its conventions. I think that once developers know the conventions and recognize that nosync is a safe choice in many cases in libcudf (but not all cases!), this will not be a point of confusion. I plan to make nosync changes across the entire libcudf codebase over time, so I am unsure if a code comment in every location is appropriate. There are around 60 instances of thrust::reduce alone, and quite a few other algorithms are in the same boat of mandating a final synchronization to return a value to the host. Instead, I would propose expanding our developer docs on stream synchronization to explain when nosync is (or is not) appropriate.

Ok. Looking at this code, it certainly appears that an internal sync must be occurring; otherwise the reduce would not return the correct result.

Instead, I would propose expanding our developer docs on stream synchronization to explain when nosync is (or is not) appropriate.

Maybe a better code talk as well? :)
The explanation in the release notes is not very detailed. Does this mean that it will synchronize only when returning results on the host?

Yup! That's a great idea. I have seen a better explanation than in those release notes someplace (perhaps the PR where nosync was introduced?) but I didn't find it last time I looked. I'll sign up for a future Better Code talk.

@vuule November 30. Mark your calendar. 😉


bdice · 2022-11-04T21:49:48Z

Benchmarks

Comparing this PR (a14ae56) to branch-22.12 (2a58ff6). Broadly, both gather and scatter show significant performance improvements, on the order of 40-50% faster for 1024 rows, 5-10% faster for 1M rows, and no change for very large data sizes (the sync penalty is much smaller relative to the kernel runtime).

Benchmark                                                          Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------
Gather/double_coalesce_x/1024/1/manual_time                     -0.4483         -0.1178          7418          4092         23550         20776
Gather/double_coalesce_x/2048/1/manual_time                     -0.4081         -0.1001          7440          4404         23649         21281
Gather/double_coalesce_x/4096/1/manual_time                     -0.3873         -0.1016          7438          4557         23500         21113
Gather/double_coalesce_x/8192/1/manual_time                     -0.4488         -0.1526          8534          4704         24767         20988
Gather/double_coalesce_x/16384/1/manual_time                    -0.4211         -0.1235          8546          4947         24273         21276
Gather/double_coalesce_x/32768/1/manual_time                    -0.4065         -0.1242          8763          5201         24295         21278
Gather/double_coalesce_x/65536/1/manual_time                    -0.3704         -0.1203          9646          6073         24744         21768
Gather/double_coalesce_x/131072/1/manual_time                   -0.3077         -0.1159         11591          8024         25864         22866
Gather/double_coalesce_x/262144/1/manual_time                   -0.2155         -0.1040         15952         12514         29154         26122
Gather/double_coalesce_x/524288/1/manual_time                   -0.1630         -0.0939         22854         19129         35489         32155
Gather/double_coalesce_x/1048576/1/manual_time                  -0.0943         -0.0542         37773         34211         53389         50496
Gather/double_coalesce_x/2097152/1/manual_time                  -0.0511         -0.0338         68450         64951         84185         81340
Gather/double_coalesce_x/4194304/1/manual_time                  -0.0295         -0.0219        130052        126220        146025        142821
Gather/double_coalesce_x/8388608/1/manual_time                  -0.0142         -0.0119        252618        249032        268770        265580
Gather/double_coalesce_x/16777216/1/manual_time                 -0.0080         -0.0064        497949        493946        513916        510628
Gather/double_coalesce_x/33554432/1/manual_time                 -0.0040         -0.0031        988208        984283       1004155       1001034
Gather/double_coalesce_x/67108864/1/manual_time                 -0.0026         -0.0021       1968783       1963754       1984773       1980506
Gather/double_coalesce_x/1024/2/manual_time                     -0.4833         -0.2107         13889          7177         29756         23487
Gather/double_coalesce_x/2048/2/manual_time                     -0.4627         -0.2032         13876          7456         29659         23631
Gather/double_coalesce_x/4096/2/manual_time                     -0.4643         -0.2091         14351          7688         30052         23769
Gather/double_coalesce_x/8192/2/manual_time                     -0.4872         -0.2296         15549          7973         31131         23984
Gather/double_coalesce_x/16384/2/manual_time                    -0.4801         -0.2323         15752          8190         31131         23899
Gather/double_coalesce_x/32768/2/manual_time                    -0.4680         -0.2298         16171          8603         31082         23938
Gather/double_coalesce_x/65536/2/manual_time                    -0.4228         -0.2239         17474         10087         31782         24666
Gather/double_coalesce_x/131072/2/manual_time                   -0.3617         -0.2091         20581         13138         33885         26798
Gather/double_coalesce_x/262144/2/manual_time                   -0.2669         -0.1701         29566         21674         41938         34803
Gather/double_coalesce_x/524288/2/manual_time                   -0.1736         -0.1221         45076         37251         60578         53183
Gather/double_coalesce_x/1048576/2/manual_time                  -0.1031         -0.0808         76171         68314         91850         84429
Gather/double_coalesce_x/2097152/2/manual_time                  -0.0599         -0.0516        137928        129664        153855        145923
Gather/double_coalesce_x/4194304/2/manual_time                  -0.0291         -0.0257        260584        253003        276576        269474
Gather/double_coalesce_x/8388608/2/manual_time                  -0.0161         -0.0146        505886        497717        521883        514252
Gather/double_coalesce_x/16777216/2/manual_time                 -0.0076         -0.0067        996290        988763       1012273       1005524
Gather/double_coalesce_x/33554432/2/manual_time                 -0.0047         -0.0044       1977190       1967968       1993260       1984518
Gather/double_coalesce_x/67108864/2/manual_time                 -0.0029         -0.0021       3939077       3927767       3955071       3946719
Gather/double_coalesce_x/1024/4/manual_time                     -0.5247         -0.3216         26582         12634         42421         28780
Gather/double_coalesce_x/2048/4/manual_time                     -0.5237         -0.3244         26984         12852         42742         28875
Gather/double_coalesce_x/4096/4/manual_time                     -0.5429         -0.3463         29012         13261         44645         29185
Gather/double_coalesce_x/8192/4/manual_time                     -0.5378         -0.3474         29294         13541         44739         29199
Gather/double_coalesce_x/16384/4/manual_time                    -0.5291         -0.3446         29482         13882         44466         29144
Gather/double_coalesce_x/32768/4/manual_time                    -0.5179         -0.3456         30480         14694         44753         29287
Gather/double_coalesce_x/65536/4/manual_time                    -0.4843         -0.3374         32947         16991         46502         30810
Gather/double_coalesce_x/131072/4/manual_time                   -0.4145         -0.2978         38097         22306         50323         35335
Gather/double_coalesce_x/262144/4/manual_time                   -0.2835         -0.2165         57240         41015         72834         57062
Gather/double_coalesce_x/524288/4/manual_time                   -0.1809         -0.1496         90283         73947        105843         90008
Gather/double_coalesce_x/1048576/4/manual_time                  -0.1059         -0.0937        152031        135930        167809        152086
Gather/double_coalesce_x/2097152/4/manual_time                  -0.0624         -0.0573        277173        259869        293250        276456
Gather/double_coalesce_x/4194304/4/manual_time                  -0.0309         -0.0287        521579        505439        537641        522195
Gather/double_coalesce_x/8388608/4/manual_time                  -0.0167         -0.0159       1012186        995273       1028250       1011857
Gather/double_coalesce_x/16777216/4/manual_time                 -0.0090         -0.0086       1993410       1975491       2009394       1992187
Gather/double_coalesce_x/33554432/4/manual_time                 -0.0050         -0.0048       3954970       3935306       3971110       3951949
Gather/double_coalesce_x/67108864/4/manual_time                 -0.0023         -0.0023       7878515       7860230       7894641       7876672
Gather/double_coalesce_x/1024/8/manual_time                     -0.5563         -0.4245         52355         23229         68164         39227
Gather/double_coalesce_x/2048/8/manual_time                     -0.5797         -0.4492         55478         23320         71152         39193
Gather/double_coalesce_x/4096/8/manual_time                     -0.5741         -0.4479         56278         23969         71733         39604
Gather/double_coalesce_x/8192/8/manual_time                     -0.5631         -0.4396         56283         24593         71288         39950
Gather/double_coalesce_x/16384/8/manual_time                    -0.5606         -0.4478         57409         25225         71617         39544
Gather/double_coalesce_x/32768/8/manual_time                    -0.5522         -0.4434         59379         26589         72589         40404
Gather/double_coalesce_x/65536/8/manual_time                    -0.5155         -0.4245         64061         31039         76419         43976
Gather/double_coalesce_x/131072/8/manual_time                   -0.4309         -0.3484         74711         42515         90756         59133
Gather/double_coalesce_x/262144/8/manual_time                   -0.2965         -0.2563        113065         79542        128735         95737
Gather/double_coalesce_x/524288/8/manual_time                   -0.1876         -0.1703        180486        146625        196050        162662
Gather/double_coalesce_x/1048576/8/manual_time                  -0.1092         -0.1024        304861        271562        320625        287782
Gather/double_coalesce_x/2097152/8/manual_time                  -0.0612         -0.0587        554862        520898        570949        537418
Gather/double_coalesce_x/4194304/8/manual_time                  -0.0323         -0.0312       1043052       1009350       1059121       1026098
Gather/double_coalesce_x/8388608/8/manual_time                  -0.0172         -0.0168       2024565       1989707       2040646       2006293
Gather/double_coalesce_x/16777216/8/manual_time                 -0.0091         -0.0083       3986598       3950183       4002591       3969359
Gather/double_coalesce_x/33554432/8/manual_time                 -0.0044         -0.0043       7910094       7875344       7926103       7891889
Gather/double_coalesce_x/67108864/8/manual_time                 -0.0029         -0.0028      15756670      15711704      15772637      15727737
Gather/double_coalesce_o/1024/1/manual_time                     -0.4165         -0.1181          7540          4400         23582         20797
Gather/double_coalesce_o/2048/1/manual_time                     -0.4157         -0.1069          7564          4419         23582         21061
Gather/double_coalesce_o/4096/1/manual_time                     -0.3787         -0.1084          7625          4737         23566         21011
Gather/double_coalesce_o/8192/1/manual_time                     -0.4027         -0.1109          8143          4864         23999         21337
Gather/double_coalesce_o/16384/1/manual_time                    -0.4183         -0.1317          8711          5067         24510         21281
Gather/double_coalesce_o/32768/1/manual_time                    -0.3881         -0.1205          9015          5516         24384         21445
Gather/double_coalesce_o/65536/1/manual_time                    -0.3650         -0.1212          9990          6343         24875         21860
Gather/double_coalesce_o/131072/1/manual_time                   -0.2953         -0.1179         12117          8539         26154         23071
Gather/double_coalesce_o/262144/1/manual_time                   -0.2146         -0.1073         16378         12863         29651         26469
Gather/double_coalesce_o/524288/1/manual_time                   -0.1196         -0.0778         31047         27334         43873         40459
Gather/double_coalesce_o/1048576/1/manual_time                  -0.0515         -0.0377         71605         67916         85183         81968
Gather/double_coalesce_o/2097152/1/manual_time                  -0.0184         -0.0146        222413        218314        235646        232200
Gather/double_coalesce_o/4194304/1/manual_time                  -0.0064         -0.0050        554501        550934        567737        564900
Gather/double_coalesce_o/8388608/1/manual_time                  -0.0037         -0.0031       1233062       1228536       1246313       1242444
Gather/double_coalesce_o/16777216/1/manual_time                 -0.0011         -0.0008       2592989       2590183       2606156       2604187
Gather/double_coalesce_o/33554432/1/manual_time                 -0.0003         -0.0002       5317767       5316239       5331297       5330159
Gather/double_coalesce_o/67108864/1/manual_time                 -0.0008         -0.0007      10780717      10772467      10794100      10786236
Gather/double_coalesce_o/1024/2/manual_time                     -0.4749         -0.2129         14168          7440         30119         23706
Gather/double_coalesce_o/2048/2/manual_time                     -0.4584         -0.2056         14264          7725         30111         23921
Gather/double_coalesce_o/4096/2/manual_time                     -0.4588         -0.2107         14834          8027         30553         24116
Gather/double_coalesce_o/8192/2/manual_time                     -0.4802         -0.2311         15845          8236         31406         24147
Gather/double_coalesce_o/16384/2/manual_time                    -0.4684         -0.2320         16089          8552         31387         24105
Gather/double_coalesce_o/32768/2/manual_time                    -0.4537         -0.2310         16852          9206         31695         24372
Gather/double_coalesce_o/65536/2/manual_time                    -0.4170         -0.2272         18088         10545         32338         24990
Gather/double_coalesce_o/131072/2/manual_time                   -0.3573         -0.2078         21412         13762         34674         27471
Gather/double_coalesce_o/262144/2/manual_time                   -0.2510         -0.1602         30997         23218         43724         36718
Gather/double_coalesce_o/524288/2/manual_time                   -0.1337         -0.1021         58296         50503         72945         65499
Gather/double_coalesce_o/1048576/2/manual_time                  -0.0567         -0.0497        138199        130364        151953        144401
Gather/double_coalesce_o/2097152/2/manual_time                  -0.0187         -0.0172        440132        431901        453150        445365
Gather/double_coalesce_o/4194304/2/manual_time                  -0.0077         -0.0069       1105435       1096900       1118569       1110799
Gather/double_coalesce_o/8388608/2/manual_time                  -0.0038         -0.0035       2461556       2452208       2474559       2465897
Gather/double_coalesce_o/16777216/2/manual_time                 -0.0014         -0.0012       5181658       5174284       5194633       5188338
Gather/double_coalesce_o/33554432/2/manual_time                 -0.0001         -0.0000      10631856      10631187      10644908      10644775
Gather/double_coalesce_o/67108864/2/manual_time                 -0.0009         -0.0008      21559387      21540942      21571877      21554988
Gather/double_coalesce_o/1024/4/manual_time                     -0.5094         -0.3165         27187         13337         43114         29468
Gather/double_coalesce_o/2048/4/manual_time                     -0.5103         -0.3198         27694         13563         43438         29544
Gather/double_coalesce_o/4096/4/manual_time                     -0.5332         -0.3453         29825         13922         45394         29718
Gather/double_coalesce_o/8192/4/manual_time                     -0.5287         -0.3458         29984         14131         45284         29627
Gather/double_coalesce_o/16384/4/manual_time                    -0.5188         -0.3441         30227         14546         45181         29634
Gather/double_coalesce_o/32768/4/manual_time                    -0.5019         -0.3420         31817         15847         46025         30285
Gather/double_coalesce_o/65536/4/manual_time                    -0.4734         -0.3342         34132         17974         47690         31750
Gather/double_coalesce_o/131072/4/manual_time                   -0.4071         -0.2975         40209         23841         52946         37196
Gather/double_coalesce_o/262144/4/manual_time                   -0.2803         -0.2166         60175         43305         76024         59556
Gather/double_coalesce_o/524288/4/manual_time                   -0.1433         -0.1231        114081         97731        128731        112884
Gather/double_coalesce_o/1048576/4/manual_time                  -0.0622         -0.0578        271385        254505        285072        268596
Gather/double_coalesce_o/2097152/4/manual_time                  -0.0197         -0.0188        875284        858062        888525        871859
Gather/double_coalesce_o/4194304/4/manual_time                  -0.0082         -0.0079       2206771       2188750       2220135       2202596
Gather/double_coalesce_o/8388608/4/manual_time                  -0.0031         -0.0029       4916787       4901458       4929829       4915512
Gather/double_coalesce_o/16777216/4/manual_time                 -0.0015         -0.0014      10358980      10343619      10372080      10357498
Gather/double_coalesce_o/33554432/4/manual_time                 -0.0006         -0.0005      21260281      21247606      21273146      21261792
Gather/double_coalesce_o/67108864/4/manual_time                 -0.0006         -0.0006      43102736      43077674      43116256      43090011
Gather/double_coalesce_o/1024/8/manual_time                     -0.5440         -0.4166         53570         24426         69289         40421
Gather/double_coalesce_o/2048/8/manual_time                     -0.5672         -0.4421         56986         24664         72585         40498
Gather/double_coalesce_o/4096/8/manual_time                     -0.5631         -0.4421         57806         25256         73118         40796
Gather/double_coalesce_o/8192/8/manual_time                     -0.5541         -0.4371         57714         25737         72637         40884
Gather/double_coalesce_o/16384/8/manual_time                    -0.5549         -0.4453         59129         26321         73327         40675
Gather/double_coalesce_o/32768/8/manual_time                    -0.5363         -0.4328         61692         28607         74865         42466
Gather/double_coalesce_o/65536/8/manual_time                    -0.4975         -0.4108         66501         33416         79004         46546
Gather/double_coalesce_o/131072/8/manual_time                   -0.4267         -0.3492         77547         44460         93369         60766
Gather/double_coalesce_o/262144/8/manual_time                   -0.2904         -0.2529        119793         85000        135576        101284
Gather/double_coalesce_o/524288/8/manual_time                   -0.1536         -0.1421        225948        191244        240621        206418
Gather/double_coalesce_o/1048576/8/manual_time                  -0.0633         -0.0611        538118        504041        551829        518112
Gather/double_coalesce_o/2097152/8/manual_time                  -0.0205         -0.0200       1746283       1710438       1759218       1723951
Gather/double_coalesce_o/4194304/8/manual_time                  -0.0083         -0.0081       4408500       4372009       4421678       4385800
Gather/double_coalesce_o/8388608/8/manual_time                  -0.0039         -0.0038       9831244       9793360       9843922       9806761
Gather/double_coalesce_o/16777216/8/manual_time                 -0.0011         -0.0010      20711367      20688439      20723982      20702511
Gather/double_coalesce_o/33554432/8/manual_time                 -0.0007         -0.0006      42514063      42485748      42525224      42499184
Gather/double_coalesce_o/67108864/8/manual_time                 -0.0005         -0.0005      86172825      86129119      86183654      86143926
OVERALL_GEOMEAN                                                 -0.2887         -0.1687             0             0             0             0
Scatter/double_coalesce_x/1024/1/manual_time                    -0.3838         -0.1485         10042          6188         26177         22289
Scatter/double_coalesce_x/2048/1/manual_time                    -0.3811         -0.1360         10074          6235         26110         22560
Scatter/double_coalesce_x/4096/1/manual_time                    -0.3353         -0.1214         10202          6781         26198         23017
Scatter/double_coalesce_x/8192/1/manual_time                    -0.3302         -0.1237         10297          6898         26215         22971
Scatter/double_coalesce_x/16384/1/manual_time                   -0.3102         -0.1150         10295          7102         25968         22983
Scatter/double_coalesce_x/32768/1/manual_time                   -0.3113         -0.1283         11519          7933         26834         23392
Scatter/double_coalesce_x/65536/1/manual_time                   -0.2624         -0.1097         12768          9418         27576         24551
Scatter/double_coalesce_x/131072/1/manual_time                  -0.2079         -0.1021         15782         12501         29822         26778
Scatter/double_coalesce_x/262144/1/manual_time                  -0.1559         -0.0904         23036         19445         35794         32559
Scatter/double_coalesce_x/524288/1/manual_time                  -0.1028         -0.0656         34802         31224         47860         44719
Scatter/double_coalesce_x/1048576/1/manual_time                 -0.0551         +0.0444         62652         59199         75412         78763
Scatter/double_coalesce_x/2097152/1/manual_time                 -0.0296         -0.0245        117794        114311        130760        127563
Scatter/double_coalesce_x/4194304/1/manual_time                 -0.0177         -0.0171        228893        224846        242046        237915
Scatter/double_coalesce_x/8388608/1/manual_time                 -0.0099         -0.0090        450446        445975        463623        459460
Scatter/double_coalesce_x/16777216/1/manual_time                -0.0047         -0.0046        891826        887601        905062        900877
Scatter/double_coalesce_x/33554432/1/manual_time                -0.0030         -0.0030       1776250       1770862       1789841       1784457
Scatter/double_coalesce_x/1024/2/manual_time                    -0.4425         -0.2369         19021         10604         35057         26752
Scatter/double_coalesce_x/2048/2/manual_time                    -0.4415         -0.2367         19078         10655         35033         26742
Scatter/double_coalesce_x/4096/2/manual_time                    -0.4110         -0.2235         19333         11388         35205         27337
Scatter/double_coalesce_x/8192/2/manual_time                    -0.4175         -0.2272         19376         11286         35014         27059
Scatter/double_coalesce_x/16384/2/manual_time                   -0.3896         -0.2199         20032         12228         35413         27626
Scatter/double_coalesce_x/32768/2/manual_time                   -0.3752         -0.2141         20996         13119         35942         28245
Scatter/double_coalesce_x/65536/2/manual_time                   -0.3382         -0.2036         23363         15461         37602         29945
Scatter/double_coalesce_x/131072/2/manual_time                  -0.2725         -0.1817         29105         21173         42244         34569
Scatter/double_coalesce_x/262144/2/manual_time                  -0.1893         -0.1403         43481         35248         56508         48582
Scatter/double_coalesce_x/524288/2/manual_time                  -0.1194         -0.0947         68737         60527         83533         75621
Scatter/double_coalesce_x/1048576/2/manual_time                 -0.0667         -0.0564        124236        115955        140040        132140
Scatter/double_coalesce_x/2097152/2/manual_time                 -0.0345         -0.0317        234902        226807        250954        243003
Scatter/double_coalesce_x/4194304/2/manual_time                 -0.0183         -0.0170        456747        448393        472945        464926
Scatter/double_coalesce_x/8388608/2/manual_time                 -0.0100         -0.0098        899295        890336        915949        906993
Scatter/double_coalesce_x/16777216/2/manual_time                -0.0056         -0.0052       1783889       1773917       1800084       1790672
Scatter/double_coalesce_x/33554432/2/manual_time                -0.0034         -0.0032       3548999       3536910       3565194       3553715
Scatter/double_coalesce_x/1024/4/manual_time                    -0.4736         -0.3299         37032         19494         53049         35546
Scatter/double_coalesce_x/2048/4/manual_time                    -0.4604         -0.3209         37138         20039         53007         35997
Scatter/double_coalesce_x/4096/4/manual_time                    -0.4626         -0.3237         37154         19965         52816         35718
Scatter/double_coalesce_x/8192/4/manual_time                    -0.4445         -0.3157         37999         21108         53330         36495
Scatter/double_coalesce_x/16384/4/manual_time                   -0.4350         -0.3129         38220         21595         53109         36494
Scatter/double_coalesce_x/32768/4/manual_time                   -0.4171         -0.3062         40558         23643         54744         37980
Scatter/double_coalesce_x/65536/4/manual_time                   -0.3655         -0.2833         45002         28553         57830         41449
Scatter/double_coalesce_x/131072/4/manual_time                  -0.3040         -0.2448         55415         38570         68108         51437
Scatter/double_coalesce_x/262144/4/manual_time                  -0.2088         -0.1738         85274         67468        100752         83246
Scatter/double_coalesce_x/524288/4/manual_time                  -0.1212         -0.1055        137416        120766        153550        137348
Scatter/double_coalesce_x/1048576/4/manual_time                 -0.0666         -0.0575        247621        231128        263435        248296
Scatter/double_coalesce_x/2097152/4/manual_time                 -0.0351         -0.0331        469618        453150        485571        469491
Scatter/double_coalesce_x/4194304/4/manual_time                 -0.0184         -0.0174        912268        895522        928367        912188
Scatter/double_coalesce_x/8388608/4/manual_time                 -0.0094         -0.0090       1797611       1780800       1813978       1797634
Scatter/double_coalesce_x/16777216/4/manual_time                -0.0049         -0.0047       3562340       3544916       3578551       3561686
Scatter/double_coalesce_x/33554432/4/manual_time                -0.0026         -0.0024       7094395       7076175       7110481       7093091
Scatter/double_coalesce_x/1024/8/manual_time                    -0.4827         -0.3939         72794         37658         88680         53753
Scatter/double_coalesce_x/2048/8/manual_time                    -0.4856         -0.3979         72296         37192         87983         52970
Scatter/double_coalesce_x/4096/8/manual_time                    -0.4745         -0.3921         73686         38723         89008         54111
Scatter/double_coalesce_x/8192/8/manual_time                    -0.4716         -0.3926         74061         39137         88975         54046
Scatter/double_coalesce_x/16384/8/manual_time                   -0.4560         -0.3840         75374         41003         89420         55085
Scatter/double_coalesce_x/32768/8/manual_time                   -0.4334         -0.3725         78882         44694         91516         57428
Scatter/double_coalesce_x/65536/8/manual_time                   -0.3879         -0.3368         87838         53762        100749         66812
Scatter/double_coalesce_x/131072/8/manual_time                  -0.3227         -0.2793        107583         72868        123063         88687
Scatter/double_coalesce_x/262144/8/manual_time                  -0.2134         -0.1937        168410        132470        183788        148195
Scatter/double_coalesce_x/524288/8/manual_time                  -0.1234         -0.1151        275481        241490        291641        258085
Scatter/double_coalesce_x/1048576/8/manual_time                 -0.0695         -0.0667        496583        462059        512488        478302
Scatter/double_coalesce_x/2097152/8/manual_time                 -0.0368         -0.0356        939902        905358        955791        921734
Scatter/double_coalesce_x/4194304/8/manual_time                 -0.0188         -0.0182       1826112       1791842       1842042       1808493
Scatter/double_coalesce_x/8388608/8/manual_time                 -0.0096         -0.0094       3593040       3558513       3609308       3575408
Scatter/double_coalesce_x/16777216/8/manual_time                -0.0051         -0.0049       7127958       7091496       7143824       7108557
Scatter/double_coalesce_x/33554432/8/manual_time                -0.0023         -0.0023      14187034      14154471      14203256      14171233
Scatter/double_coalesce_o/1024/1/manual_time                    -0.3635         -0.1423         10277          6541         26380         22626
Scatter/double_coalesce_o/2048/1/manual_time                    -0.3563         -0.1339         10374          6677         26444         22904
Scatter/double_coalesce_o/4096/1/manual_time                    -0.3149         -0.1162         10549          7228         26504         23424
Scatter/double_coalesce_o/8192/1/manual_time                    -0.3104         -0.1163         10642          7339         26506         23423
Scatter/double_coalesce_o/16384/1/manual_time                   -0.2882         -0.1103         10688          7608         26212         23321
Scatter/double_coalesce_o/32768/1/manual_time                   -0.2954         -0.1266         12070          8504         27386         23918
Scatter/double_coalesce_o/65536/1/manual_time                   -0.2383         -0.1072         13593         10354         28434         25387
Scatter/double_coalesce_o/131072/1/manual_time                  -0.1763         -0.0910         17908         14751         31945         29039
Scatter/double_coalesce_o/262144/1/manual_time                  -0.1304         -0.0777         25980         22591         38787         35773
Scatter/double_coalesce_o/524288/1/manual_time                  -0.0799         -0.0545         44187         40657         57285         54161
Scatter/double_coalesce_o/1048576/1/manual_time                 -0.0266         -0.0213        125591        122247        139190        136220
Scatter/double_coalesce_o/2097152/1/manual_time                 -0.0139         -0.0123        304753        300531        319083        315157
Scatter/double_coalesce_o/4194304/1/manual_time                 -0.0104         -0.0096        713774        706384        728560        721548
Scatter/double_coalesce_o/8388608/1/manual_time                 +0.0070         +0.0071       1603736       1614893       1618966       1630511
Scatter/double_coalesce_o/16777216/1/manual_time                -0.0001         -0.0000       3436662       3436428       3452183       3452181
Scatter/double_coalesce_o/33554432/1/manual_time                -0.0001         -0.0000       7241444       7240878       7257009       7256990
Scatter/double_coalesce_o/1024/2/manual_time                    -0.4285         -0.2324         19646         11227         35689         27396
Scatter/double_coalesce_o/2048/2/manual_time                    -0.4252         -0.2317         19840         11404         35798         27502
Scatter/double_coalesce_o/4096/2/manual_time                    -0.4011         -0.2214         20074         12022         35939         27983
Scatter/double_coalesce_o/8192/2/manual_time                    -0.4002         -0.2239         20059         12031         35613         27640
Scatter/double_coalesce_o/16384/2/manual_time                   -0.3767         -0.2160         20668         12883         36022         28239
Scatter/double_coalesce_o/32768/2/manual_time                   -0.3591         -0.2076         21931         14055         36874         29218
Scatter/double_coalesce_o/65536/2/manual_time                   -0.3110         -0.1911         25217         17374         39503         31955
Scatter/double_coalesce_o/131072/2/manual_time                  -0.2322         -0.1605         33391         25639         46434         38982
Scatter/double_coalesce_o/262144/2/manual_time                  -0.1678         -0.1273         49005         40780         62038         54139
Scatter/double_coalesce_o/524288/2/manual_time                  -0.0941         -0.0766         86123         78018        100215         92542
Scatter/double_coalesce_o/1048576/2/manual_time                 -0.0315         -0.0285        249456        241602        263987        256450
Scatter/double_coalesce_o/2097152/2/manual_time                 -0.0136         -0.0127        607567        599285        622242        614314
Scatter/double_coalesce_o/4194304/2/manual_time                 -0.0110         -0.0106       1425745       1410075       1440721       1425509
Scatter/double_coalesce_o/8388608/2/manual_time                 +0.0036         +0.0037       3186849       3198233       3202164       3214010
Scatter/double_coalesce_o/16777216/2/manual_time                -0.0009         -0.0008       6874749       6868450       6890080       6884458
Scatter/double_coalesce_o/33554432/2/manual_time                +0.0004         +0.0004      14473319      14478570      14488919      14494355
Scatter/double_coalesce_o/1024/4/manual_time                    -0.4579         -0.3201         37920         20557         53893         36640
Scatter/double_coalesce_o/2048/4/manual_time                    -0.4436         -0.3130         38311         21315         54221         37252
Scatter/double_coalesce_o/4096/4/manual_time                    -0.4467         -0.3156         38506         21306         54058         36999
Scatter/double_coalesce_o/8192/4/manual_time                    -0.4328         -0.3105         39138         22199         54437         37535
Scatter/double_coalesce_o/16384/4/manual_time                   -0.4257         -0.3088         39370         22609         54304         37535
Scatter/double_coalesce_o/32768/4/manual_time                   -0.4049         -0.3005         42358         25207         56569         39571
Scatter/double_coalesce_o/65536/4/manual_time                   -0.3384         -0.2686         49394         32678         62283         45556
Scatter/double_coalesce_o/131072/4/manual_time                  -0.2683         -0.2208         64017         46838         76974         59981
Scatter/double_coalesce_o/262144/4/manual_time                  -0.1842         -0.1551         95546         77945        111007         93795
Scatter/double_coalesce_o/524288/4/manual_time                  -0.0981         -0.0884        170826        154072        185323        168941
Scatter/double_coalesce_o/1048576/4/manual_time                 -0.0339         -0.0321        497363        480508        511831        495414
Scatter/double_coalesce_o/2097152/4/manual_time                 -0.0154         -0.0149       1214567       1195859       1229218       1210892
Scatter/double_coalesce_o/4194304/4/manual_time                 -0.0066         -0.0064       2841719       2822870       2856734       2838398
Scatter/double_coalesce_o/8388608/4/manual_time                 -0.0019         -0.0019       6406513       6394104       6421834       6409774
Scatter/double_coalesce_o/16777216/4/manual_time                -0.0013         -0.0012      13733606      13715898      13748977      13732270
Scatter/double_coalesce_o/33554432/4/manual_time                -0.0000         -0.0000      28953036      28952328      28968055      28967750
Scatter/double_coalesce_o/1024/8/manual_time                    -0.4689         -0.3856         74431         39530         90313         55484
Scatter/double_coalesce_o/2048/8/manual_time                    -0.4695         -0.3879         74638         39596         90253         55244
Scatter/double_coalesce_o/4096/8/manual_time                    -0.4614         -0.3833         75842         40846         91151         56209
Scatter/double_coalesce_o/8192/8/manual_time                    -0.4590         -0.3838         75933         41078         90867         55994
Scatter/double_coalesce_o/16384/8/manual_time                   -0.4451         -0.3760         77529         43018         91600         57159
Scatter/double_coalesce_o/32768/8/manual_time                   -0.4189         -0.3627         82974         48220         95689         60979
Scatter/double_coalesce_o/65536/8/manual_time                   -0.3580         -0.3134         96314         61834        109288         75037
Scatter/double_coalesce_o/131072/8/manual_time                  -0.2852         -0.2513        123979         88622        139381        104354
Scatter/double_coalesce_o/262144/8/manual_time                  -0.1938         -0.1779        189537        152807        205038        168555
Scatter/double_coalesce_o/524288/8/manual_time                  -0.1001         -0.0950        340351        306268        354838        321131
Scatter/double_coalesce_o/1048576/8/manual_time                 -0.0343         -0.0334        993403        959309       1007858        974194
Scatter/double_coalesce_o/2097152/8/manual_time                 -0.0145         -0.0142       2425869       2390621       2440482       2405826
Scatter/double_coalesce_o/4194304/8/manual_time                 -0.0071         -0.0071       5683025       5642398       5698112       5657910
Scatter/double_coalesce_o/8388608/8/manual_time                 -0.0027         -0.0026      12796976      12762034      12811955      12778096
Scatter/double_coalesce_o/16777216/8/manual_time                -0.0015         -0.0015      27492759      27452776      27508479      27467821
Scatter/double_coalesce_o/33554432/8/manual_time                -0.0003         -0.0003      57897862      57878410      57910941      57893818
OVERALL_GEOMEAN                                                 -0.2354         -0.1542             0             0             0             0
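
For readers checking the table by hand: the first two numeric columns appear to be relative changes in wall time and CPU time, computed as (new - old) / old from the four raw columns that follow, so negative values mean the new build is faster. This assumes the table uses the usual Google Benchmark comparison layout (Time, CPU, Time Old, Time New, CPU Old, CPU New); the header row is not shown in this part of the output. The sketch below is only an illustration, and `relative_change` and `verify_sample_rows` are hypothetical helper names, not part of libcudf or the benchmark tooling.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: relative change between an old and a new measurement.
// Assumption: this matches how the first two columns of the table are derived.
double relative_change(double old_value, double new_value)
{
  return (new_value - old_value) / old_value;
}

// Checks two rows copied from the table against their reported deltas.
// The table rounds to four decimals, so compare with a small tolerance.
void verify_sample_rows()
{
  double const tol = 1e-4;

  // Scatter/double_coalesce_o/1024/1: time 10277 -> 6541, CPU 26380 -> 22626
  assert(std::fabs(relative_change(10277, 6541) - (-0.3635)) < tol);
  assert(std::fabs(relative_change(26380, 22626) - (-0.1423)) < tol);

  // Scatter/double_coalesce_o/8388608/1: a slight slowdown at a large size
  assert(std::fabs(relative_change(1603736, 1614893) - 0.0070) < tol);
  assert(std::fabs(relative_change(1618966, 1630511) - 0.0071) < tol);
}
```

The two sample rows show the pattern visible throughout the table: the relative improvement is largest at small sizes and shrinks toward zero as the row count grows.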

vuule · 2022-11-07T23:32:27Z

Is there an issue for this?

bdice · 2022-11-07T23:33:51Z

> Is there an issue for this?

I am preparing #12086 with more details and context (still writing/editing heavily, not in its final form yet). This will be one of several PRs that fall under that issue.

ttnghia

This is great 👍

bdice · 2022-11-07T23:59:13Z

@gpucibot merge

github-actions bot added the libcudf label (Affects libcudf (C++/CUDA) code.) Nov 1, 2022

Use nosync policy in gather and scatter implementations.

c91d0d0

bdice force-pushed the gather-scatter-nosync branch from c2f5d27 to c91d0d0 Compare November 1, 2022 17:25

davidwendt reviewed Nov 1, 2022

bdice self-assigned this Nov 1, 2022

Merge remote-tracking branch 'upstream/branch-22.12' into gather-scatter-nosync

a14ae56

bdice marked this pull request as ready for review November 4, 2022 21:49

bdice requested a review from a team as a code owner November 4, 2022 21:49

bdice requested review from cwharris and vuule November 4, 2022 21:49

bdice requested a review from davidwendt November 4, 2022 21:50

bdice added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Nov 4, 2022

davidwendt approved these changes Nov 7, 2022

bdice mentioned this pull request Nov 7, 2022

[FEA] Migrate to Thrust nosync stream policy for performance. #12086

Open

12 tasks

vuule approved these changes Nov 7, 2022

ttnghia approved these changes Nov 7, 2022

rapids-bot bot merged commit 2ced214 into rapidsai:branch-22.12 Nov 7, 2022
