feature request - fixed length ascii string data type #132
I'm responding so you know this isn't being ignored. I'm going to need to discuss the history behind this with Russ and Dennis to get some context on why we haven't implemented fixed-length strings yet. It could simply have been a resource issue, or there could have been something else; I need to find out what I don't know. I will follow up once I have more information.
Commenting here so you know it's still alive. We're still resource-constrained, especially with AGU around the corner, but this has also come up as an issue in the last week.
I guess one thing to do is to tightly define what would be proposed.
Comments?
Additional note: for numpy fixed-length string arrays, the "length" means bytes for ASCII strings and characters for unicode.
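The bytes-versus-characters distinction is visible in the dtype item sizes. A minimal sketch, assuming a current numpy where `U` strings are stored as 4-byte UCS4 code units:

```python
import numpy as np

# 'S10': fixed-length byte string; the 10 counts bytes
ascii_dt = np.dtype("S10")
# 'U10': fixed-length unicode string; the 10 counts characters,
# each stored as a 4-byte UCS4/UTF-32 code unit
unicode_dt = np.dtype("U10")

print(ascii_dt.itemsize)    # 10
print(unicode_dt.itemsize)  # 40
```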
I think you can set the size of a string data type in HDF5 with H5Tset_size, e.g. http://stackoverflow.com/questions/29528674/how-to-write-fixed-length-strings-in-hdf5. This may only work for ASCII, though. I think this is what h5py does for its fixed-length string datatype (http://docs.h5py.org/en/latest/strings.html); no fixed-length unicode data type is supported.
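To make the fixed-size idea concrete without depending on HDF5, here is a pure-Python stand-in for the byte layout of a null-padded fixed-length ASCII field. It is only an illustration: `pack_fixed_ascii` and `unpack_fixed_ascii` are hypothetical helpers, not part of HDF5 or h5py (in HDF5 itself the size is set with H5Tset_size and the padding convention with H5Tset_strpad).

```python
import struct


def pack_fixed_ascii(text: str, size: int) -> bytes:
    """Encode text as ASCII into exactly `size` bytes.

    struct's 's' format null-pads short values and silently truncates
    long ones; .encode("ascii") raises UnicodeEncodeError on non-ASCII input.
    """
    return struct.pack(f"{size}s", text.encode("ascii"))


def unpack_fixed_ascii(raw: bytes) -> str:
    """Decode a null-padded fixed-size field back to a str."""
    return raw.rstrip(b"\x00").decode("ascii")


field = pack_fixed_ascii("abc", 10)
print(field)                      # b'abc\x00\x00\x00\x00\x00\x00\x00'
print(unpack_fixed_ascii(field))  # abc
```

The ASCII-only restriction falls out naturally: a fixed byte count maps cleanly onto characters only when every character is exactly one byte.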
Restricting to ASCII is IMO out of the question because … Also, this seems odd:
The unicode encoding for Python is configurable: it's ASCII by default in Python 2, and I believe it's UTF-8 in Python 3.
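A quick check of the Python 3 default, and of why a fixed byte length does not translate into a fixed character count under UTF-8. This is a small sketch using only the standard library; the sample word is arbitrary.

```python
import sys

# Python 3 uses UTF-8 as its default str <-> bytes codec
print(sys.getdefaultencoding())  # utf-8

word = "café"
encoded = word.encode("utf-8")
print(len(word))     # 4 (characters)
print(len(encoded))  # 5 (bytes): 'é' needs two bytes in UTF-8

# Cutting the bytes to a fixed 4-byte field splits 'é' in half
try:
    encoded[:4].decode("utf-8")
except UnicodeDecodeError:
    print("4-byte field cannot hold 'café' as UTF-8")
```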
So my speculation is that a UTF-8 fixed string in Python
Current situation:
I guess a fixed-length unicode data type only really makes sense for UTF-32. I think numpy represents unicode data internally as UTF-32 (UCS4). I don't think HDF5 supports UTF-32, though (sigh). I still think an ASCII fixed-length string array datatype would be very useful.
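The reason UTF-32 is the natural fit for a fixed-length unicode type is that it spends exactly four bytes per code point, while UTF-8 varies from one to four. A short comparison, with `encoded_sizes` as a hypothetical helper name:

```python
def encoded_sizes(text: str) -> tuple[int, int, int]:
    """Return (code points, UTF-8 bytes, UTF-32 bytes) for text."""
    # 'utf-32-le' is used instead of 'utf-32' so no 4-byte BOM is prepended
    return len(text), len(text.encode("utf-8")), len(text.encode("utf-32-le"))


for sample in ["abc", "café", "日本"]:
    chars, utf8, utf32 = encoded_sizes(sample)
    print(f"{sample!r}: {chars} chars, {utf8} UTF-8 bytes, {utf32} UTF-32 bytes")
```

Only the UTF-32 column is always four times the character count, so "N characters" maps to a fixed N*4 bytes.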
As I said, give us a detailed proposal, including use case(s).
I agree with Denis that there should be no fixed-length ASCII type in netCDF. Also, it is not clear to me why the user cannot use a fixed-size array of NC_CHAR for this.
Of course you can, but it's a convenience thing. I think this ticket can be closed.
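The NC_CHAR workaround amounts to adding a trailing string-length dimension. Below is a minimal numpy sketch of that conversion and its inverse; `strings_to_chars` and `chars_to_strings` are hypothetical names (netCDF4-python ships comparable helpers, `stringtochar` and `chartostring`), and the sketch assumes byte strings of dtype `S<n>`.

```python
import numpy as np


def strings_to_chars(strings: np.ndarray) -> np.ndarray:
    """View an 'S<n>' array as single bytes with a trailing length-n axis,
    i.e. the layout of an NC_CHAR variable with an extra string-length dimension."""
    n = strings.dtype.itemsize
    contiguous = np.ascontiguousarray(strings)
    return contiguous.view("S1").reshape(strings.shape + (n,))


def chars_to_strings(chars: np.ndarray) -> np.ndarray:
    """Inverse of strings_to_chars: collapse the trailing axis back into 'S<n>'."""
    n = chars.shape[-1]
    contiguous = np.ascontiguousarray(chars)
    return contiguous.view(f"S{n}").reshape(chars.shape[:-1])


names = np.array([b"abc", b"hello"], dtype="S5")
chars = strings_to_chars(names)
print(chars.shape)              # (2, 5)
print(chars_to_strings(chars))  # [b'abc' b'hello']
```

This works, which is the point made above; the feature request is about not having to write or call this conversion on every read and write.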
I know I've asked for this before, but it seems like every couple of months a Python user requests this feature, and I have to tell them that the netCDF C library doesn't support it.
In Python/numpy, there is a fixed-length ASCII string data type (type `S#`, e.g. `S10` for 10-character ASCII strings). Fortran has this too, with `character(len=10)`. In order to store these arrays in a netCDF file, they have to be converted either to arrays of characters or to variable-length strings. VLEN strings don't map nicely onto numpy or Fortran arrays. I'm pretty sure HDF5 has a fixed-length ASCII string data type, which is used by h5py (http://docs.h5py.org/en/latest/strings.html). This would map directly onto numpy and Fortran string arrays, and I think it would be a very popular feature if added to netcdf-c.
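For reference, this is what a numpy `S10` array looks like in memory: each element occupies a contiguous 10-byte slot, short values are null-padded, and long values are truncated. The sketch below only inspects numpy; whether a netCDF fixed-length type would use this exact padding is an open question, not something established here.

```python
import numpy as np

# Python str values are ASCII-encoded into the 'S10' dtype
names = np.array(["abc", "hello world"], dtype="S10")
print(names)  # [b'abc' b'hello worl']: longer values are silently truncated

raw = names.tobytes()
print(len(raw))  # 20: two contiguous 10-byte slots
print(raw[:10])  # b'abc\x00\x00\x00\x00\x00\x00\x00': short values are null-padded
```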