[BUG] Can't name subset of columns with read_csv #8973

Closed
efajardo-nv opened this issue Aug 5, 2021 · 6 comments · Fixed by #12018
Labels: bug (Something isn't working), cuIO (cuIO issue), libcudf (Affects libcudf (C++/CUDA) code)


Describe the bug
Our framework uses a wrapper around cuDF read_csv to read CSV file input into our data pipelines. Ideally, we should be able to read in any CSV with a single call to read_csv and the appropriate arguments. We have run into an issue where read_csv raises a RuntimeError when names is used to name only the columns selected via the usecols argument. The equivalent pandas call works.

Steps/Code to reproduce bug

import cudf
import pandas as pd

filename = 'foo.csv'
lines = [
  "num1,datetime,text",
  "123,2018-11-13T12:00:00,abc",
  "456,2018-11-14T12:35:01,def",
  "789,2018-11-15T18:02:59,ghi"
]

with open(filename, 'w') as fp:
    fp.write('\n'.join(lines)+'\n')

cuDF:

>>> cudf.read_csv(filename, skiprows=1, header=None, usecols=[2], names=['renamed_text_col'])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/envs/rapids/lib/python3.8/contextlib.py", line 75, in inner
    return func(*args, **kwds)
  File "/opt/conda/envs/rapids/lib/python3.8/site-packages/cudf/io/csv.py", line 70, in read_csv
    return libcudf.csv.read_csv(
  File "cudf/_lib/csv.pyx", line 393, in cudf._lib.csv.read_csv
RuntimeError: basic_string::_M_construct null not valid

pandas:

>>> pd.read_csv(filename, skiprows=1, header=None, usecols=[2], names=['renamed_text_col'])
  renamed_text_col
0              abc
1              def
2              ghi

Expected behavior
cuDF should return the same result as pandas: a single column named renamed_text_col containing abc, def and ghi.
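
To pin down the expected semantics without depending on either library, here is a minimal standard-library sketch. It is not how cuDF or pandas implement read_csv; the helper name read_selected_columns is hypothetical, and it assumes that names are assigned to the selected columns in ascending column-position order.

```python
import csv


def read_selected_columns(path, usecols, names, skiprows=0):
    """Hypothetical helper: return {name: [values]} for the columns at the
    zero-based positions in `usecols`, labelled with `names`.

    Assumption: names are matched to the selected columns in ascending
    column-position order, one name per selected column.
    """
    if len(usecols) != len(names):
        raise ValueError("names must have one entry per selected column")

    positions = sorted(usecols)
    columns = {name: [] for name in names}

    with open(path, newline="") as fp:
        for row_number, row in enumerate(csv.reader(fp)):
            if row_number < skiprows:
                continue  # skip the original header line(s)
            if not row:
                continue  # ignore blank lines
            for name, pos in zip(names, positions):
                columns[name].append(row[pos])
    return columns
```

Called on the foo.csv from the reproducer with skiprows=1, usecols=[2] and names=['renamed_text_col'], this yields the single renamed column holding the three text values, which is the shape of result the cuDF call is expected to produce.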

Environment overview

  • Environment location: Docker
  • Method of cuDF install: Docker
    • docker pull rapidsai/rapidsai-nightly:21.08-cuda11.0-runtime-ubuntu18.04-py3.8

Environment details

 ***git***
 commit 29b5f9ac6d24c64163349f1a5b2b5b5ef049769e (HEAD -> branch-21.10, origin/branch-21.10, origin/HEAD)
 Author: David Wendt <45795991+davidwendt@users.noreply.github.com>
 Date:   Wed Aug 4 19:11:03 2021 -0400
 
 Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914)
 
 The `semi_join.cu` takes about 6 minutes to compile on my Linux 18.04 desktop when doing a full build of libcudf. The `join_kind` template parameter used internally in `cudf::detail::left_semi_anti_join` for `left_semi_join` and `left_anti_join` APIs is not used in a `constexpr` or to pass to any other templated function. This PR moves the template parameter to a runtime parameter on the detail functions reducing the compile time for `semi_join.cu` by ~2x.
 
 Another improvement includes un-inlining the `is_trivial_join` utility function to reduce the compile time for files that include `join_common_utils.hpp`.
 
 Finally, the device vector used as a gather map in `detail::left_semi_anti_join` was wrapped with a `column_view`  in order to call `detail::gather` without iterators. This allowed not including the heavy `gather.cuh`. This improved the compile time about 10% and reduced the object file `semi_join.cu.o` size by 2x.
 
 Authors:
 - David Wendt (https://github.com/davidwendt)
 
 Approvers:
 - Mike Wilson (https://github.com/hyperbolic2346)
 - Mark Harris (https://github.com/harrism)
 
 URL: https://github.com/rapidsai/cudf/pull/8914
 ***git submodules***
 
 ***OS Information***
 DISTRIB_ID=Ubuntu
 DISTRIB_RELEASE=18.04
 DISTRIB_CODENAME=bionic
 DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
 NAME="Ubuntu"
 VERSION="18.04.5 LTS (Bionic Beaver)"
 ID=ubuntu
 ID_LIKE=debian
 PRETTY_NAME="Ubuntu 18.04.5 LTS"
 VERSION_ID="18.04"
 HOME_URL="https://www.ubuntu.com/"
 SUPPORT_URL="https://help.ubuntu.com/"
 BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
 PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
 VERSION_CODENAME=bionic
 UBUNTU_CODENAME=bionic
 Linux EFAJARDO-DT 5.4.0-77-generic #86~18.04.1-Ubuntu SMP Fri Jun 18 01:23:22 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
 
 ***GPU Information***
 Thu Aug  5 17:48:05 2021
 +-----------------------------------------------------------------------------+
 | NVIDIA-SMI 465.19.01    Driver Version: 465.19.01    CUDA Version: 11.3     |
 |-------------------------------+----------------------+----------------------+
 | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
 | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
 |                               |                      |               MIG M. |
 |===============================+======================+======================|
 |   0  NVIDIA Quadro R...  On   | 00000000:15:00.0 Off |                  Off |
 | 33%   38C    P8    32W / 260W |   8114MiB / 48601MiB |      0%      Default |
 |                               |                      |                  N/A |
 +-------------------------------+----------------------+----------------------+
 
 +-----------------------------------------------------------------------------+
 | Processes:                                                                  |
 |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
 |        ID   ID                                                   Usage      |
 |=============================================================================|
 +-----------------------------------------------------------------------------+
 
 ***CPU***
 Architecture:        x86_64
 CPU op-mode(s):      32-bit, 64-bit
 Byte Order:          Little Endian
 CPU(s):              12
 On-line CPU(s) list: 0-11
 Thread(s) per core:  2
 Core(s) per socket:  6
 Socket(s):           1
 NUMA node(s):        1
 Vendor ID:           GenuineIntel
 CPU family:          6
 Model:               85
 Model name:          Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
 Stepping:            4
 CPU MHz:             1786.574
 CPU max MHz:         3700.0000
 CPU min MHz:         1200.0000
 BogoMIPS:            6800.00
 Virtualization:      VT-x
 L1d cache:           32K
 L1i cache:           32K
 L2 cache:            1024K
 L3 cache:            19712K
 NUMA node0 CPU(s):   0-11
 Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d
 
 ***CMake***
 
 ***g++***
 
 ***nvcc***
 
 ***Python***
 /opt/conda/envs/rapids/bin/python
 Python 3.8.10
 
 ***Environment Variables***
 PATH                            : /opt/conda/envs/rapids/bin:/opt/conda/condabin:/opt/conda/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
 LD_LIBRARY_PATH                 : /usr/local/nvidia/lib:/usr/local/nvidia/lib64
 NUMBAPRO_NVVM                   :
 NUMBAPRO_LIBDEVICE              :
 CONDA_PREFIX                    : /opt/conda/envs/rapids
 PYTHON_PATH                     :
 
 ***conda packages***
 /opt/conda/condabin/conda
 # packages in environment at /opt/conda/envs/rapids:
 #
 # Name                    Version                   Build  Channel
 _libgcc_mutex             0.1                 conda_forge    conda-forge
 _openmp_mutex             4.5                      1_llvm    conda-forge
 abseil-cpp                20210324.2           h9c3ff4c_0    conda-forge
 aiobotocore               1.3.3              pyhd8ed1ab_0    conda-forge
 aiohttp                   3.7.4.post0      py38h497a2fe_0    conda-forge
 aioitertools              0.7.1              pyhd8ed1ab_0    conda-forge
 anyio                     3.3.0            py38h578d9bd_0    conda-forge
 appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
 argon2-cffi               20.1.0           py38h497a2fe_2    conda-forge
 arrow-cpp                 4.0.1           py38h9c16596_6_cuda    conda-forge
 arrow-cpp-proc            3.0.0                      cuda    conda-forge
 async-timeout             3.0.1                   py_1000    conda-forge
 async_generator           1.10                       py_0    conda-forge
 attrs                     21.2.0             pyhd8ed1ab_0    conda-forge
 aws-c-cal                 0.5.11               h95a6274_0    conda-forge
 aws-c-common              0.6.2                h7f98852_0    conda-forge
 aws-c-event-stream        0.2.7               h3541f99_13    conda-forge
 aws-c-io                  0.10.5               hfb6a706_0    conda-forge
 aws-checksums             0.1.11               ha31a3da_7    conda-forge
 aws-sdk-cpp               1.8.186              hb4091e7_3    conda-forge
 babel                     2.9.1              pyh44b312d_0    conda-forge
 backcall                  0.2.0              pyh9f0ad1d_0    conda-forge
 backports                 1.0                        py_2    conda-forge
 backports.functools_lru_cache 1.6.4              pyhd8ed1ab_0    conda-forge
 blas                      2.110                  openblas    conda-forge
 blas-devel                3.9.0               10_openblas    conda-forge
 blazingsql                21.8.0a0                 pypi_0    pypi
 bleach                    3.3.1              pyhd8ed1ab_0    conda-forge
 blosc                     1.21.0               h9c3ff4c_0    conda-forge
 bokeh                     2.3.3            py38h578d9bd_0    conda-forge
 boost                     1.72.0           py38h1e42940_1    conda-forge
 boost-cpp                 1.72.0               h312852a_5    conda-forge
 botocore                  1.20.106           pyhd8ed1ab_0    conda-forge
 brotli                    1.0.9                h7f98852_5    conda-forge
 brotli-bin                1.0.9                h7f98852_5    conda-forge
 brotlipy                  0.7.0           py38h497a2fe_1001    conda-forge
 brunsli                   0.1                  h9c3ff4c_0    conda-forge
 bzip2                     1.0.8                h7f98852_4    conda-forge
 c-ares                    1.17.1               h7f98852_1    conda-forge
 ca-certificates           2021.5.30            ha878542_0    conda-forge
 cachetools                4.2.2              pyhd8ed1ab_0    conda-forge
 cairo                     1.16.0            h6cf1ce9_1008    conda-forge
 certifi                   2021.5.30        py38h578d9bd_0    conda-forge
 cffi                      1.14.6           py38ha65f79e_0    conda-forge
 cfitsio                   3.470                hb418390_7    conda-forge
 chardet                   4.0.0            py38h578d9bd_1    conda-forge
 charls                    2.2.0                h9c3ff4c_0    conda-forge
 charset-normalizer        2.0.0              pyhd8ed1ab_0    conda-forge
 click                     7.1.2              pyh9f0ad1d_0    conda-forge
 click-plugins             1.1.1                      py_0    conda-forge
 cligj                     0.7.2              pyhd8ed1ab_0    conda-forge
 cloudpickle               1.6.0                      py_0    conda-forge
 colorama                  0.4.4              pyh9f0ad1d_0    conda-forge
 colorcet                  2.0.6              pyhd8ed1ab_0    conda-forge
 cryptography              3.4.7            py38ha5dfef3_0    conda-forge
 cudatoolkit               11.0.221             h6bb024c_0    nvidia
 cudf                      21.08.00a210727 cuda_11.0_py38_g483fed3502_327    rapidsai-nightly
 cudf_kafka                21.08.00a210727 py38_g483fed3502_327    rapidsai-nightly
 cugraph                   21.08.00a210728 cuda11.0_py38_ge5b35997_89    rapidsai-nightly
 cuml                      21.08.00a210803 cuda11.0_py38_g366e71fe2_139    rapidsai-nightly
 cupy                      9.0.0            py38hc350bd8_0    conda-forge
 curl                      7.78.0               hea6ffbf_0    conda-forge
 cusignal                  21.08.00a210804 py38_gb197d6f_24    rapidsai-nightly
 cuspatial                 21.08.00a210803 py38_g2344dcd_24    rapidsai-nightly
 custreamz                 21.08.00a210727 py38_ga69a8a43b5_324    rapidsai-nightly
 cuxfilter                 21.08.00a210803 py38_gc51a660_21    rapidsai-nightly
 cycler                    0.10.0                     py_2    conda-forge
 cyrus-sasl                2.1.27               h230043b_2    conda-forge
 cython                    0.29.24          py38h709712a_0    conda-forge
 cytoolz                   0.11.0           py38h497a2fe_3    conda-forge
 dask                      2021.7.2           pyhd8ed1ab_0    conda-forge
 dask-core                 2021.7.2           pyhd8ed1ab_0    conda-forge
 dask-cuda                 21.08.00a210727         py38_40    rapidsai-nightly
 dask-cudf                 21.08.00a210727 py38_ga69a8a43b5_324    rapidsai-nightly
 dask-glm                  0.2.0                      py_1    conda-forge
 dask-labextension         5.1.0              pyhd8ed1ab_0    conda-forge
 dask-ml                   1.9.0              pyhd8ed1ab_0    conda-forge
 datashader                0.11.1             pyh9f0ad1d_0    conda-forge
 datashape                 0.5.4                      py_1    conda-forge
 decorator                 4.4.2                      py_0    conda-forge
 defusedxml                0.7.1              pyhd8ed1ab_0    conda-forge
 distributed               2021.7.2         py38h578d9bd_0    conda-forge
 dlpack                    0.5                  h9c3ff4c_0    conda-forge
 entrypoints               0.3             pyhd8ed1ab_1003    conda-forge
 expat                     2.4.1                h9c3ff4c_0    conda-forge
 fa2                       0.3.5            py38h1e0a361_0    conda-forge
 faiss-proc                1.0.0                      cuda    rapidsai
 fastavro                  1.4.4            py38h497a2fe_0    conda-forge
 fastrlock                 0.6              py38h709712a_1    conda-forge
 filterpy                  1.4.5                      py_1    conda-forge
 fiona                     1.8.20           py38hdb5a769_0    conda-forge
 fontconfig                2.13.1            hba837de_1005    conda-forge
 freetype                  2.10.4               h0708190_1    conda-forge
 freexl                    1.0.6                h7f98852_0    conda-forge
 fsspec                    2021.7.0           pyhd8ed1ab_0    conda-forge
 future                    0.18.2           py38h578d9bd_3    conda-forge
 gdal                      3.2.2            py38h507a4fd_7    conda-forge
 gdk-pixbuf                2.42.6               h04a7f16_0    conda-forge
 geopandas                 0.9.0              pyhd8ed1ab_1    conda-forge
 geopandas-base            0.9.0              pyhd8ed1ab_1    conda-forge
 geos                      3.9.1                h9c3ff4c_2    conda-forge
 geotiff                   1.6.0                h4f31c25_6    conda-forge
 gettext                   0.19.8.1          h0b5b191_1005    conda-forge
 gflags                    2.2.2             he1b5a44_1004    conda-forge
 giflib                    5.2.1                h36c2ea0_2    conda-forge
 git                       2.32.0          pl5321hc30692c_0    conda-forge
 glib                      2.68.3               h9c3ff4c_0    conda-forge
 glib-tools                2.68.3               h9c3ff4c_0    conda-forge
 glog                      0.5.0                h48cff8f_0    conda-forge
 google-cloud-cpp          1.29.0               ha08a4db_0    conda-forge
 gpuci-tools               0.3.1                         8    gpuci
 greenlet                  1.1.0            py38h709712a_0    conda-forge
 grpc-cpp                  1.38.1               h36ce80c_0    conda-forge
 hdf4                      4.2.15               h10796ff_3    conda-forge
 hdf5                      1.10.6          nompi_h6a2412b_1114    conda-forge
 heapdict                  1.0.1                      py_0    conda-forge
 holoviews                 1.14.5             pyhd8ed1ab_0    conda-forge
 icu                       68.1                 h58526e2_0    conda-forge
 idna                      3.1                pyhd3deb0d_0    conda-forge
 imagecodecs               2021.7.30        py38hb5ce8f7_0    conda-forge
 imageio                   2.9.0                      py_0    conda-forge
 importlib-metadata        4.6.3            py38h578d9bd_0    conda-forge
 iniconfig                 1.1.1              pyh9f0ad1d_0    conda-forge
 ipykernel                 5.5.5            py38hd0cf306_0    conda-forge
 ipython                   7.15.0           py38h32f6830_0    conda-forge
 ipython_genutils          0.2.0                      py_1    conda-forge
 ipywidgets                7.6.3              pyhd3deb0d_0    conda-forge
 jedi                      0.17.2           py38h578d9bd_1    conda-forge
 jinja2                    3.0.1              pyhd8ed1ab_0    conda-forge
 jmespath                  0.10.0             pyh9f0ad1d_0    conda-forge
 joblib                    1.0.1              pyhd8ed1ab_0    conda-forge
 jpeg                      9d                   h36c2ea0_0    conda-forge
 jpype1                    1.3.0            py38h1fd1430_0    conda-forge
 json-c                    0.15                 h98cffda_0    conda-forge
 json5                     0.9.5              pyh9f0ad1d_0    conda-forge
 jsonschema                3.2.0              pyhd8ed1ab_3    conda-forge
 jupyter-packaging         0.7.12             pyhd8ed1ab_0    conda-forge
 jupyter-server-proxy      3.1.0              pyhd8ed1ab_0    conda-forge
 jupyter_client            6.1.12             pyhd8ed1ab_0    conda-forge
 jupyter_core              4.7.1            py38h578d9bd_0    conda-forge
 jupyter_server            1.10.2             pyhd8ed1ab_0    conda-forge
 jupyterlab                3.1.1              pyhd8ed1ab_0    conda-forge
 jupyterlab-nvdashboard    0.7.0a210727               py_4    rapidsai-nightly
 jupyterlab_pygments       0.1.2              pyh9f0ad1d_0    conda-forge
 jupyterlab_server         2.6.1              pyhd8ed1ab_0    conda-forge
 jupyterlab_widgets        1.0.0              pyhd8ed1ab_1    conda-forge
 jxrlib                    1.1                  h7f98852_2    conda-forge
 kealib                    1.4.14               hcc255d8_2    conda-forge
 kiwisolver                1.3.1            py38h1fd1430_1    conda-forge
 krb5                      1.19.2               hcc1bbae_0    conda-forge
 lcms2                     2.12                 hddcbb42_0    conda-forge
 ld_impl_linux-64          2.36.1               hea4e1c9_2    conda-forge
 lerc                      2.2.1                h9c3ff4c_0    conda-forge
 libaec                    1.0.5                h9c3ff4c_0    conda-forge
 libblas                   3.9.0               10_openblas    conda-forge
 libbrotlicommon           1.0.9                h7f98852_5    conda-forge
 libbrotlidec              1.0.9                h7f98852_5    conda-forge
 libbrotlienc              1.0.9                h7f98852_5    conda-forge
 libcblas                  3.9.0               10_openblas    conda-forge
 libcrc32c                 1.1.1                h9c3ff4c_2    conda-forge
 libcudf                   21.08.00a210803 cuda11.0_ga70cc8c4ab_334    rapidsai-nightly
 libcudf_kafka             21.08.00a210727 g483fed3502_327    rapidsai-nightly
 libcugraph                21.08.00a210728 cuda11.0_g7c603dd1_92    rapidsai-nightly
 libcuml                   21.08.00a210803 cuda11.0_g366e71fe2_139    rapidsai-nightly
 libcumlprims              21.08.00a210715 cuda11.0_g4db0971_5    rapidsai-nightly
 libcurl                   7.78.0               h2574ce0_0    conda-forge
 libcuspatial              21.08.00a210803 cuda11.0_g2344dcd_24    rapidsai-nightly
 libdap4                   3.20.6               hd7c4107_2    conda-forge
 libdeflate                1.8                  h7f98852_0    conda-forge
 libedit                   3.1.20191231         he28a2e2_2    conda-forge
 libev                     4.33                 h516909a_1    conda-forge
 libevent                  2.1.10               hcdb4288_3    conda-forge
 libfaiss                  1.7.0           cuda110h8045045_8_cuda    conda-forge
 libffi                    3.3                  h58526e2_2    conda-forge
 libgcc-ng                 9.3.0               h2828fa1_19    conda-forge
 libgcrypt                 1.9.3                h7f98852_1    conda-forge
 libgdal                   3.2.2                h8f005ca_7    conda-forge
 libgfortran-ng            11.1.0               h69a702a_6    conda-forge
 libgfortran5              11.1.0               h6c583b3_6    conda-forge
 libglib                   2.68.3               h3e27bee_0    conda-forge
 libgpg-error              1.42                 h9c3ff4c_0    conda-forge
 libgsasl                  1.8.0                         2    conda-forge
 libhwloc                  2.3.0                h5e5b7d1_1    conda-forge
 libiconv                  1.16                 h516909a_0    conda-forge
 libkml                    1.3.0             hd79254b_1012    conda-forge
 liblapack                 3.9.0               10_openblas    conda-forge
 liblapacke                3.9.0               10_openblas    conda-forge
 libllvm10                 10.0.1               he513fc3_3    conda-forge
 libnetcdf                 4.8.0           nompi_hcd642e3_103    conda-forge
 libnghttp2                1.43.0               h812cca2_0    conda-forge
 libntlm                   1.4               h7f98852_1002    conda-forge
 libopenblas               0.3.17          pthreads_h8fe5266_1    conda-forge
 libpng                    1.6.37               h21135ba_2    conda-forge
 libpq                     13.3                 hd57d9b9_0    conda-forge
 libprotobuf               3.16.0               h780b84a_0    conda-forge
 librdkafka                1.6.1                hc49e61c_1    conda-forge
 librmm                    21.08.00a210804 cuda11.0_gb79fdae_41    rapidsai-nightly
 librttopo                 1.1.0                h1185371_6    conda-forge
 libsodium                 1.0.18               h36c2ea0_1    conda-forge
 libspatialindex           1.9.3                h9c3ff4c_4    conda-forge
 libspatialite             5.0.1                h8694cbe_5    conda-forge
 libssh2                   1.9.0                ha56f1ee_6    conda-forge
 libstdcxx-ng              9.3.0               h6de172a_19    conda-forge
 libthrift                 0.14.2               he6d91bd_1    conda-forge
 libtiff                   4.3.0                hf544144_0    conda-forge
 libutf8proc               2.6.1                h7f98852_0    conda-forge
 libuuid                   2.32.1            h7f98852_1000    conda-forge
 libuv                     1.42.0               h7f98852_0    conda-forge
 libwebp                   1.2.0                h3452ae3_0    conda-forge
 libwebp-base              1.2.0                h7f98852_2    conda-forge
 libxcb                    1.13              h7f98852_1003    conda-forge
 libxgboost                1.4.2dev.rapidsai21.08      cuda11.0_0    rapidsai-nightly
 libxml2                   2.9.12               h72842e0_0    conda-forge
 libzip                    1.8.0                h4de3113_0    conda-forge
 libzopfli                 1.0.3                h9c3ff4c_0    conda-forge
 llvm-openmp               12.0.1               h4bd325d_1    conda-forge
 llvmlite                  0.36.0           py38h4630a5e_0    conda-forge
 locket                    0.2.0                      py_2    conda-forge
 lz4-c                     1.9.3                h9c3ff4c_1    conda-forge
 mapclassify               2.4.3              pyhd8ed1ab_0    conda-forge
 markdown                  3.3.4              pyhd8ed1ab_0    conda-forge
 markupsafe                2.0.1            py38h497a2fe_0    conda-forge
 matplotlib-base           3.4.2            py38hcc49a3a_0    conda-forge
 mistune                   0.8.4           py38h497a2fe_1004    conda-forge
 more-itertools            8.8.0              pyhd8ed1ab_0    conda-forge
 msgpack-python            1.0.2            py38h1fd1430_1    conda-forge
 multidict                 5.1.0            py38h497a2fe_1    conda-forge
 multipledispatch          0.6.0                      py_0    conda-forge
 munch                     2.5.0                      py_0    conda-forge
 nbclassic                 0.3.1              pyhd8ed1ab_1    conda-forge
 nbclient                  0.5.3              pyhd8ed1ab_0    conda-forge
 nbconvert                 6.1.0            py38h578d9bd_0    conda-forge
 nbformat                  5.1.3              pyhd8ed1ab_0    conda-forge
 nccl                      2.10.3.1             h96e36e3_0    conda-forge
 ncurses                   6.2                  h58526e2_4    conda-forge
 nest-asyncio              1.5.1              pyhd8ed1ab_0    conda-forge
 netifaces                 0.10.9          py38h497a2fe_1003    conda-forge
 networkx                  2.6.2              pyhd8ed1ab_0    conda-forge
 nlohmann_json             3.9.1                h9c3ff4c_1    conda-forge
 nodejs                    14.17.4              h92b4a50_0    conda-forge
 notebook                  6.4.0              pyha770c72_0    conda-forge
 numba                     0.53.1           py38h8b71fd7_1    conda-forge
 numpy                     1.21.1           py38h9894fe3_0    conda-forge
 nvtx                      0.2.3            py38h497a2fe_0    conda-forge
 olefile                   0.46               pyh9f0ad1d_1    conda-forge
 openblas                  0.3.17          pthreads_h4748800_1    conda-forge
 openjdk                   8.0.282              h7f98852_0    conda-forge
 openjpeg                  2.4.0                hb52868f_1    conda-forge
 openslide                 3.4.1                h978ee9a_4    conda-forge
 openssl                   1.1.1k               h7f98852_0    conda-forge
 orc                       1.6.9                h58a87f1_0    conda-forge
 packaging                 21.0               pyhd8ed1ab_0    conda-forge
 pandas                    1.2.5            py38h1abd341_0    conda-forge
 pandoc                    2.14.1               h7f98852_0    conda-forge
 pandocfilters             1.4.2                      py_1    conda-forge
 panel                     0.12.0             pyhd8ed1ab_0    conda-forge
 param                     1.11.1             pyh6c4a22f_0    conda-forge
 parquet-cpp               1.5.1                         2    conda-forge
 parso                     0.7.1              pyh9f0ad1d_0    conda-forge
 partd                     1.2.0              pyhd8ed1ab_0    conda-forge
 patsy                     0.5.1                      py_0    conda-forge
 pcre                      8.45                 h9c3ff4c_0    conda-forge
 pcre2                     10.37                h032f7d1_0    conda-forge
 perl                      5.32.1          0_h7f98852_perl5    conda-forge
 pexpect                   4.8.0              pyh9f0ad1d_2    conda-forge
 pickleshare               0.7.5                   py_1003    conda-forge
 pillow                    8.3.1            py38h8e6f84c_0    conda-forge
 pip                       21.2.2             pyhd8ed1ab_0    conda-forge
 pixman                    0.40.0               h36c2ea0_0    conda-forge
 pluggy                    0.13.1           py38h578d9bd_4    conda-forge
 pooch                     1.4.0              pyhd8ed1ab_0    conda-forge
 poppler                   21.03.0              h93df280_0    conda-forge
 poppler-data              0.4.10                        0    conda-forge
 postgresql                13.3                 h2510834_0    conda-forge
 proj                      8.0.1                h277dcde_0    conda-forge
 prometheus_client         0.11.0             pyhd8ed1ab_0    conda-forge
 prompt-toolkit            3.0.19             pyha770c72_0    conda-forge
 protobuf                  3.16.0           py38h709712a_0    conda-forge
 psutil                    5.8.0            py38h497a2fe_1    conda-forge
 pthread-stubs             0.4               h36c2ea0_1001    conda-forge
 ptyprocess                0.7.0              pyhd3deb0d_0    conda-forge
 py                        1.10.0             pyhd3deb0d_0    conda-forge
 py-xgboost                1.4.2dev.rapidsai21.08  cuda11.0py38_0    rapidsai-nightly
 pyarrow                   4.0.1           py38hdd2221d_6_cuda    conda-forge
 pycparser                 2.20               pyh9f0ad1d_2    conda-forge
 pyct                      0.4.6                      py_0    conda-forge
 pyct-core                 0.4.6                      py_0    conda-forge
 pydeck                    0.5.0              pyh9f0ad1d_0    conda-forge
 pygments                  2.9.0              pyhd8ed1ab_0    conda-forge
 pyhive                    0.6.4              pyhd8ed1ab_0    conda-forge
 pynndescent               0.5.4              pyh6c4a22f_0    conda-forge
 pynvml                    11.0.0             pyhd8ed1ab_0    conda-forge
 pyopenssl                 20.0.1             pyhd8ed1ab_0    conda-forge
 pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
 pyproj                    3.1.0            py38h03a1999_3    conda-forge
 pyrsistent                0.17.3           py38h497a2fe_2    conda-forge
 pysocks                   1.7.1            py38h578d9bd_3    conda-forge
 pytest                    6.2.4            py38h578d9bd_0    conda-forge
 python                    3.8.10          h49503c6_1_cpython    conda-forge
 python-confluent-kafka    1.6.0            py38h497a2fe_1    conda-forge
 python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
 python_abi                3.8                      2_cp38    conda-forge
 pytz                      2021.1             pyhd8ed1ab_0    conda-forge
 pyviz_comms               2.1.0              pyhd8ed1ab_0    conda-forge
 pywavelets                1.1.1            py38h5c078b8_3    conda-forge
 pyyaml                    5.4.1            py38h497a2fe_0    conda-forge
 pyzmq                     22.1.0           py38h2035c66_0    conda-forge
 rapids                    21.08.00a210702 cuda11.0_py38_g2d7ee9d_10    rapidsai-nightly
 rapids-blazing            21.08.00a210804 cuda11.0_py38_gc65d08b_55    rapidsai-nightly
 rapids-xgboost            21.08.00a210804 cuda11.0_py38_gc65d08b_55    rapidsai-nightly
 re2                       2021.06.01           h9c3ff4c_0    conda-forge
 readline                  8.1                  h46c0cb4_0    conda-forge
 requests                  2.26.0             pyhd8ed1ab_0    conda-forge
 requests-unixsocket       0.2.0                      py_0    conda-forge
 rmm                       21.08.00a210802 cuda_11.0_py38_gb79fdae_41    rapidsai-nightly
 rtree                     0.9.7            py38h02d302b_2    conda-forge
 s2n                       1.0.10               h9b69904_0    conda-forge
 s3fs                      2021.7.0           pyhd8ed1ab_0    conda-forge
 sasl                      0.3.1            py38h709712a_0    conda-forge
 scikit-image              0.18.2           py38h1abd341_0    conda-forge
 scikit-learn              0.23.1           py38h3a94b23_0    conda-forge
 scipy                     1.6.0            py38hb2138dd_0    conda-forge
 seaborn                   0.11.1               hd8ed1ab_1    conda-forge
 seaborn-base              0.11.1             pyhd8ed1ab_1    conda-forge
 send2trash                1.7.1              pyhd8ed1ab_0    conda-forge
 setuptools                49.6.0           py38h578d9bd_3    conda-forge
 shapely                   1.7.1            py38haeee4fe_5    conda-forge
 simpervisor               0.4                pyhd8ed1ab_0    conda-forge
 six                       1.16.0             pyh6c4a22f_0    conda-forge
 snappy                    1.1.8                he1b5a44_3    conda-forge
 sniffio                   1.2.0            py38h578d9bd_1    conda-forge
 sortedcontainers          2.4.0              pyhd8ed1ab_0    conda-forge
 spdlog                    1.8.5                h4bd325d_0    conda-forge
 sqlalchemy                1.4.22           py38h497a2fe_0    conda-forge
 sqlite                    3.36.0               h9cd32fc_0    conda-forge
 statsmodels               0.12.2           py38h5c078b8_0    conda-forge
 streamz                   0.6.2              pyh44b312d_0    conda-forge
 tbb                       2020.2               h4bd325d_4    conda-forge
 tblib                     1.7.0              pyhd8ed1ab_0    conda-forge
 terminado                 0.10.1           py38h578d9bd_0    conda-forge
 testpath                  0.5.0              pyhd8ed1ab_0    conda-forge
 threadpoolctl             2.2.0              pyh8a188c0_0    conda-forge
 thrift                    0.13.0           py38h709712a_2    conda-forge
 thrift_sasl               0.3.0           py38h1e0a361_1002    conda-forge
 tifffile                  2021.7.30          pyhd8ed1ab_0    conda-forge
 tiledb                    2.3.2                he87e0bf_0    conda-forge
 tk                        8.6.10               h21135ba_1    conda-forge
 toml                      0.10.2             pyhd8ed1ab_0    conda-forge
 toolz                     0.11.1                     py_0    conda-forge
 tornado                   6.1              py38h497a2fe_1    conda-forge
 tqdm                      4.62.0             pyhd8ed1ab_0    conda-forge
 traitlets                 5.0.5                      py_0    conda-forge
 treelite                  2.0.0            py38hc9ad5e7_0    conda-forge
 treelite-runtime          2.0.0                    pypi_0    pypi
 typing-extensions         3.10.0.0             hd8ed1ab_0    conda-forge
 typing_extensions         3.10.0.0           pyha770c72_0    conda-forge
 tzcode                    2021a                h7f98852_2    conda-forge
 tzdata                    2021a                he74cb21_1    conda-forge
 ucx                       1.9.0+gcd9efd3       cuda11.0_0    rapidsai-nightly
 ucx-proc                  1.0.0                       gpu    rapidsai-nightly
 ucx-py                    0.21.0a210803   py38_gcd9efd3_37    rapidsai-nightly
 umap-learn                0.5.1            py38h578d9bd_1    conda-forge
 urllib3                   1.26.6             pyhd8ed1ab_0    conda-forge
 wcwidth                   0.2.5              pyh9f0ad1d_2    conda-forge
 webencodings              0.5.1                      py_1    conda-forge
 websocket-client          0.57.0           py38h578d9bd_4    conda-forge
 wheel                     0.36.2             pyhd3deb0d_0    conda-forge
 widgetsnbextension        3.5.1            py38h578d9bd_4    conda-forge
 wrapt                     1.12.1           py38h497a2fe_3    conda-forge
 xarray                    0.19.0             pyhd8ed1ab_1    conda-forge
 xerces-c                  3.2.3                h9d8b166_2    conda-forge
 xgboost                   1.4.2dev.rapidsai21.08  cuda11.0py38_0    rapidsai-nightly
 xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
 xorg-libice               1.0.10               h7f98852_0    conda-forge
 xorg-libsm                1.2.3             hd9c2040_1000    conda-forge
 xorg-libx11               1.7.2                h7f98852_0    conda-forge
 xorg-libxau               1.0.9                h7f98852_0    conda-forge
 xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
 xorg-libxext              1.3.4                h7f98852_1    conda-forge
 xorg-libxrender           0.9.10            h7f98852_1003    conda-forge
 xorg-renderproto          0.11.1            h7f98852_1002    conda-forge
 xorg-xextproto            7.3.0             h7f98852_1002    conda-forge
 xorg-xproto               7.0.31            h7f98852_1007    conda-forge
 xz                        5.2.5                h516909a_1    conda-forge
 yaml                      0.2.5                h516909a_0    conda-forge
 yarl                      1.6.3            py38h497a2fe_2    conda-forge
 zeromq                    4.3.4                h9c3ff4c_0    conda-forge
 zfp                       0.5.5                h9c3ff4c_5    conda-forge
 zict                      2.0.0                      py_0    conda-forge
 zipp                      3.5.0              pyhd8ed1ab_0    conda-forge
 zlib                      1.2.11            h516909a_1010    conda-forge
 zstd                      1.5.0                ha95c52a_0    conda-forge

@efajardo-nv efajardo-nv added Needs Triage Need team to review and classify bug Something isn't working labels Aug 5, 2021
@shwina shwina added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Aug 9, 2021
@galipremsagar galipremsagar added cuIO cuIO issue libcudf Affects libcudf (C++/CUDA) code. and removed Python Affects Python cuDF API. labels Aug 10, 2021
@charlesbluca
Member

charlesbluca commented Aug 11, 2021

It looks like specifying names somehow causes an illegal memory access:

In [7]: import cudf
   ...: import pandas as pd
   ...: 
   ...: filename = 'foo.csv'
   ...: lines = [
   ...:   "num,text",
   ...:   "123,abc",
   ...:   "456,def",
   ...:   "789,ghi"
   ...: ]
   ...: 
   ...: with open(filename, 'w') as fp:
   ...:     fp.write('\n'.join(lines)+'\n')
   ...: 

In [8]: cudf.read_csv(filename, usecols=[1])
Out[8]: 
  text
0  abc
1  def
2  ghi

In [9]: cudf.read_csv(filename, usecols=[1], names=[0])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-0a0dfd4ac3ce> in <module>
----> 1 cudf.read_csv(filename, usecols=[1], names=[0])

~/compose/etc/conda/cuda_11.2/envs/rapids/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/cudf/python/cudf/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
     68         na_values = [na_values]
     69 
---> 70     return libcudf.csv.read_csv(
     71         filepath_or_buffer,
     72         lineterminator=lineterminator,

~/cudf/python/cudf/cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()
    392     cdef table_with_metadata c_result
    393     with nogil:
--> 394         c_result = move(cpp_read_csv(read_csv_options_c))
    395 
    396     meta_names = [name.decode() for name in c_result.metadata.column_names]

RuntimeError: reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

While looking at this, I also noticed that it's relatively easy to trigger an illegal memory access by passing an out-of-range column index in usecols:

In [4]: cudf.read_csv(filename, usecols=[2])
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-4-97a4a25f179e> in <module>
----> 1 cudf.read_csv(filename, usecols=[2])

~/compose/etc/conda/cuda_11.2/envs/rapids/lib/python3.8/contextlib.py in inner(*args, **kwds)
     73         def inner(*args, **kwds):
     74             with self._recreate_cm():
---> 75                 return func(*args, **kwds)
     76         return inner
     77 

~/cudf/python/cudf/cudf/io/csv.py in read_csv(filepath_or_buffer, lineterminator, quotechar, quoting, doublequote, header, mangle_dupe_cols, usecols, sep, delimiter, delim_whitespace, skipinitialspace, names, dtype, skipfooter, skiprows, dayfirst, compression, thousands, decimal, true_values, false_values, nrows, byte_range, skip_blank_lines, parse_dates, comment, na_values, keep_default_na, na_filter, prefix, index_col, **kwargs)
     68         na_values = [na_values]
     69 
---> 70     return libcudf.csv.read_csv(
     71         filepath_or_buffer,
     72         lineterminator=lineterminator,

~/cudf/python/cudf/cudf/_lib/csv.pyx in cudf._lib.csv.read_csv()
    392     cdef table_with_metadata c_result
    393     with nogil:
--> 394         c_result = move(cpp_read_csv(read_csv_options_c))
    395 
    396     meta_names = [name.decode() for name in c_result.metadata.column_names]

RuntimeError: reduce failed to synchronize: cudaErrorIllegalAddress: an illegal memory access was encountered

I can open a separate issue for that.
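
Until the reader rejects such input itself, a caller-side guard can catch the out-of-range case before it reaches the GPU code. The following is only a minimal sketch using the standard library: `check_usecols` is a hypothetical helper (not part of cuDF or pandas), and it assumes the first row of the file has the full column count and that `usecols` holds integer indices.

```python
import csv
import io


def check_usecols(csv_text, usecols, sep=","):
    """Raise ValueError if any integer index in usecols is outside the first row.

    Hypothetical helper for illustration; not a cuDF or pandas API.
    """
    # Read only the first row to learn how many columns the data has.
    first_row = next(csv.reader(io.StringIO(csv_text), delimiter=sep), [])
    num_cols = len(first_row)
    # Collect every index that does not address an existing column.
    bad = [i for i in usecols if not 0 <= i < num_cols]
    if bad:
        raise ValueError(
            f"usecols indices {bad} are out of range for {num_cols} columns"
        )
    return num_cols


sample = "num,text\n123,abc\n456,def\n789,ghi\n"
check_usecols(sample, [1])      # passes: index 1 exists
try:
    check_usecols(sample, [2])  # index 2 does not exist in a 2-column file
except ValueError as err:
    print(err)
```

Running a check like this before calling read_csv turns the illegal memory access into an ordinary Python exception on the caller's side.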

@github-actions

This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.

@github-actions

This issue has been labeled inactive-30d due to no recent activity in the past 30 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be labeled inactive-90d if there is no activity in the next 60 days.

@mattf

mattf commented Oct 21, 2022

this is a segfault in 22.10 (cc @beckernick)

$ python3.9 -m IPython
Python 3.9.14 (main, Sep  7 2022, 23:43:48) 
Type 'copyright', 'credits' or 'license' for more information
IPython 8.5.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: import io, cudf, pandas as pd

In [2]: cudf.__version__
Out[2]: '22.10.00a+392.g1558403753'

In [3]: f = lambda: io.StringIO("""
   ...: num1,datetime,text
   ...: 123,2018-11-13T12:00:00,abc
   ...: 456,2018-11-14T12:35:01,def
   ...: 789,2018-11-15T18:02:59,ghi
   ...: """)

In [4]: pd.read_csv(f(), skiprows=1, header=None, usecols=[2], names=['renamed_text_col'])
Out[4]: 
  renamed_text_col
0             text
1              abc
2              def
3              ghi

In [5]: cudf.read_csv(f(), skiprows=1, header=None, usecols=[2], names=['renamed_text_col'])
Segmentation fault (core dumped)

$ nvidia-smi 
Fri Oct 21 13:43:38 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.85.02    Driver Version: 510.85.02    CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   42C    P8    14W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

@shwina
Contributor

shwina commented Oct 21, 2022

@galipremsagar any chance you could look into this?

@galipremsagar
Contributor

@galipremsagar any chance you could look into this?

Yup

@galipremsagar galipremsagar self-assigned this Oct 21, 2022
@vuule vuule self-assigned this Oct 26, 2022
rapids-bot bot pushed a commit that referenced this issue Nov 17, 2022
…csv` (#12018)

closes #8973
The CSV reader has a few gaps in its logic for column selection and user-specified column names:
1. Users cannot specify names for only the selected columns;
2. The reader fails in unpredictable ways when only a subset of column names is passed (without column selection).

This PR fixes the issues above. Users can now specify column names (the number of names can be lower than the actual number of columns) or names of columns selected via their indices (the number of names must match the number of indices). If selection via indices is used, the number of column names has to match either the actual number of columns or the number of selected columns.
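
The counting rule in the paragraph above can be sketched in plain Python. This is an illustration only, not the libcudf implementation: `check_names` and its return labels are hypothetical, and rejecting more names than columns when no selection is used is an assumption that the text above does not state.

```python
def check_names(num_file_cols, names, usecols=None):
    """Classify how `names` lines up with the columns, or raise ValueError.

    Hypothetical helper mirroring the rule in the PR description.
    """
    if usecols is None:
        # Without selection, names may cover fewer columns than the file has.
        # Assumption: supplying more names than columns is treated as an error.
        if len(names) > num_file_cols:
            raise ValueError("more names than columns in the file")
        return "prefix"
    if len(names) == len(usecols):
        # One name per selected column.
        return "selected"
    if len(names) == num_file_cols:
        # One name per column in the file; the selection picks among them.
        return "all"
    raise ValueError(
        "number of names must match the number of selected columns "
        "or the number of columns in the file"
    )


# The call from the original report: a 3-column file, one selected column, one name.
print(check_names(3, ["renamed_text_col"], usecols=[2]))
```

Under this sketch the call from the original report falls into the "selected" case, which is the combination that failed before the fix.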

Also fixed a test error that went unnoticed due to the issues above.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Karthikeyan (https://github.com/karthikeyann)
  - Vyas Ramasubramani (https://github.com/vyasr)
  - Nghia Truong (https://github.com/ttnghia)
  - https://github.com/nvdbaranec

URL: #12018