Add microkernels for fixed-width and fixed-width dictionary in Parquet decode #15159
Conversation
Force-pushed from d6e2b02 to d66835d (Signed-off-by: Alessandro Bellina <abellina@nvidia.com>).

I force pushed a signed version of my commit, fyi.
Went over everything, except the `rle_stream` changes; those will take a bit more time. The fixed width kernels look great! Posted some questions and nitpicks.
```cuda
__global__ void __launch_bounds__(decode_block_size) gpuDecodePageDataFixed(
  PageInfo* pages, device_span<ColumnChunkDesc const> chunks, size_t min_row, size_t num_rows)
{
  __shared__ __align__(16) page_state_s state_g;
```
Could we gain anything by splitting `page_state_s` so that only the relevant members are included in the specialized kernels? I assume its size is small compared to the `rle_run` objects, so it might not move the needle WRT shared memory use, but asking just in case.
Yes, I think so. We should add this to another follow-on for us to tackle in the future. Will file. @nvdbaranec fyi.
Filed this: #15267
Looks great so far. Just some random comments.
cpp/src/io/parquet/decode_fixed.cu
Outdated
```cuda
__shared__ rle_run<level_t> def_runs[rle_run_buffer_size];
rle_stream<level_t, decode_block_size, rolling_buf_size> def_decoder{def_runs};

bool const nullable = s->col.max_level[level_type::DEFINITION] > 0;
```
Do we know null counts by the time we get here? Maybe via metadata (v2 headers, page indexes)? It would be nice to skip a RLE decode of a stream of all 1's (esp. given how spark likes to make everything nullable).
Followup: I did a quick check with some data that I know has no nulls, but the schema says `OPTIONAL`, so the max def level is 1 (but always 1). Setting `nullable` to false reduces the dict decode time to 6.7ms from 9.8ms for this data set. Total times to decode integer data were 12.6ms for monolithic, 10.7ms for this PR (some flat, some dict), and 7.7ms with `nullable = false`.

For non-nullable data we also don't need the `nz_idx` array, which saves 1000 bytes of shared mem, but the expense of populating it is pretty negligible. Just food for thought if we want to go with a kernel exclusively for flat, non-nullable (or no nulls present) data.

I was also thinking we could reason about the presence of nulls by examining the head of the def levels... if the first run has a size equal to `num_values`, then we know it's either all nulls or all non-nulls, depending on the encoded run value.
```cuda
// should the size be 1/2 (128?)
__shared__ rle_run<uint32_t> dict_runs[rle_run_buffer_size];  // should be array of 6
rle_stream<uint32_t, decode_block_size, rolling_buf_size> dict_stream{dict_runs};
```
Could really go crazy and parameterize this on the number of bits used for the dictionary keys. I'd bet that in the large majority of cases the keys would fit in a `uint16_t`.
Yes! @nvdbaranec has plans for a "grand templatization", so I think this fits right in. Let's do as a follow-on.
Just to beat the no nulls case to death, I modified the definition of `nullable`:

```cuda
bool const no_nulls =
  s->page.num_input_values == (s->initial_rle_run[level_type::DEFINITION] >> 1) &&
  s->initial_rle_value[level_type::DEFINITION] == s->col.max_level[level_type::DEFINITION];
bool const nullable = s->col.max_level[level_type::DEFINITION] > 0 and not no_nulls;
```

Then I modified the null probability in the parquet_read_decode benchmark to be 0 and compared monolithic, this PR, and my modification.
Moved out of draft. I've tested this with the tests we have on hand (Spark integration tests, NDS, etc.), and I am not seeing differences. I'll work on incorporating feedback and retesting.
…move extra nz_idx
Approving CMake changes
Interesting idea. So this would catch one common case where we have one giant repeated run of "not null". It's a little wonky, but I can't think of a failure case. If some writer somehow managed to encode an all-valid set of data using more than one run, this would still work - it would just traverse the data the normal way.
Exactly... can't guard against sub-optimal writers :) But we could also augment this test with either V2 header stats or page indexes (when they're available for use... shameless #14973 plug 🤣).
/ok to test
Looking good. A few more nits that can be ignored. Thanks!
Not for this PR, but it would be nice to go back and change all of the various RLE decoders to use the new `is_literal_run()` and `is_repeated_run()`.
```cuda
int const valid_map_offset      = ni.valid_map_offset;
int const row_index_lower_bound = s->row_index_lower_bound;

__syncthreads();
```
I agree that this sync is needed. Might want to add a comment to remind future us why it's here.
/ok to test
```cuda
if (!is_string_col(chunk) && !is_nested(chunk) && !is_byte_array(chunk) && !is_boolean(chunk)) {
  if (page.encoding == Encoding::PLAIN) {
    return decode_kernel_mask::FIXED_WIDTH_NO_DICT;
  } else if (page.encoding == Encoding::PLAIN_DICTIONARY) {
    return decode_kernel_mask::FIXED_WIDTH_DICT;
  }
}
```
If this moves to the bottom, the `!is_string_col()` check can go away. Should also check for `Encoding::RLE_DICTIONARY`, since `PLAIN_DICTIONARY` is deprecated for V2 headers.
I just have a few questions (and a headache).
In `struct rle_stream`:

```cuda
static constexpr int run_buffer_size = rle_stream_required_run_buffer_size<decode_threads>();

int level_bits;
```
Thoughts on having prefixes/suffixes to help tell data members apart from local variables? Not too relevant to this PR.
In the past I've used `m_` for member variables. Not sure I really loved it, though.
/ok to test

/ok to test

Added do-not-merge so I can finish looking at the odd crash in the python tests.

/ok to test

/merge
See #15297. The Parquet string decoder can become a bottleneck in the presence of strings of widely varying sizes. This PR is an attempt to address this, at least as a stop gap solution. A more complete solution may be to rework the string decoder to work in a block-wide fashion, such as the new micro-kernels added in #15159. Authors: - Ed Seidl (https://github.com/etseidl) - Nghia Truong (https://github.com/ttnghia) Approvers: - Nghia Truong (https://github.com/ttnghia) - Vukasin Milovanovic (https://github.com/vuule) URL: #15304
Closes #15226. Part of #13501. Adds support for reading and writing `BYTE_STREAM_SPLIT` encoded Parquet data. Includes a "microkernel" version like those introduced by #15159. Authors: - Ed Seidl (https://github.com/etseidl) - Vukasin Milovanovic (https://github.com/vuule) Approvers: - Muhammad Haseeb (https://github.com/mhaseeb123) - Vukasin Milovanovic (https://github.com/vuule) URL: #15311
This PR adds Parquet decode fixed-width and fixed-width dictionary micro kernels, based on #13622.

We change `rle_stream` to allow us to process dictionary pages, and so are able to use it in `gpuDecodePageDataFixedDict`, allowing it to take a "count" to be decoded, which we provide as the number of valid (non-null) elements that were found in the definition stream. Most of the intrusive changes are in `rle_stream` so that it can work with this argument. One big change is that prior to this we used to "spill" runs that would not fit in the current iteration. We have changed it so that we don't spill anymore, and we could have in the `runs` array a large run that won't be decoded until several calls to `decode` later. This opens the possibility for us to throw more decode threads at the accumulated run fairly easily, and that may be worked on in this PR or shortly after (load balancing a large run).

The code here is really mostly @nvdbaranec's and makes use of @etseidl's great work on `rle_stream`. It is marked as draft because it's not entirely done (not all testing has been performed). That said, NDS, nvbench and `PARQUET_TEST` pass. In order to use it, please set `USE_FIXED_OP=2`, which enables both the `gpuDecodePageDataFixed` and `gpuDecodePageDataFixedDict` kernels.

Here are `USE_FIXED_OP=2` nvbench results against 24.04 on my RTX6000:

Checklist