Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow tonumber to work on strings that contain commas as thousands separators #1102

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

cinquin
Copy link

@cinquin cinquin commented Feb 20, 2016

Before:

echo '{"test" : "1,000"}' | jq "with_entries(.value |= tonumber)"
jq: error (at <stdin>:1): Expected value before ',' at line 1, column 2 (while parsing '1,000')

After:

echo '{"test" : "1,000"}' | jq "with_entries(.value |= tonumber)"
{
  "test": 1000
}

…parators.

Note that this would need to be adapted for non-English locales.
@cinquin
Copy link
Author

cinquin commented Feb 20, 2016

The Appveyor test failure is due to a problem with the continuous integration system itself (as noted in other pull requests), and not to the changes in my pull request. The tests do pass on my machine.

@ghost
Copy link

ghost commented Feb 20, 2016

I don't really have an opinion for or against this pull request, but note that, as with your other pull request, this can be done from jq itself:

echo '{"test" : "1,000"}' | jq 'with_entries(.value |= (sub(","; "") | tonumber))'

int index_to = 0;
while (input[index_from] != 0) {
output[index_to++] = input[index_from++];
while (input[index_from] == ',') {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leading commas won't be removed. Why remove multiple contiguous commas?

@nicowilliams
Copy link
Contributor

Hi! I think it is is interesting. I'm not sure what's the best thing to do here. As @slapresta says, this can be done in jq code. In general I'd rather jq-code things than C-code them, but here it could be a performance issue. Also, if we're going to allow commas as thousands separators, how much should we care about properly formatted numbers? E.g., is 10,00 acceptable? How about 1,0, or 1,,0,0? What if someone expects 1,05 to be 1.05 (i.e., they expect internationalized number parsing)?

@cinquin
Copy link
Author

cinquin commented Feb 21, 2016

@nicowilliams You raise valid questions. My immediate goal was to fix the problem for my immediate purposes. I'll see if I can find some time to come up with a more general solution.

@slapresta Thanks for the jq version. As @nicowilliams noted there is a performance issue. Also, your solution would have to be adapted to deal with mixed input types (which would further increase complexity). For example,

echo '{"test" : "1,000", "test2" : 1000}' | jq 'with_entries(.value |= (sub(","; "") | tonumber))'
jq: error (at <stdin>:1): number (1000) cannot be matched, as it is not a string

@pkoppstein
Copy link
Contributor

In my opinion, the existing tonumber should be left as-is, partly for stability, but mainly because the existing definition matches the JSON concept of "number" exactly. Anything else it seems would run into internationalization issues.

As far as Anglo-American commas are concerned, it seems to me that it is somewhat arbitrary to remove just one of the commas. Removing commas that are in the wrong position has its disadvantages, but a reasonable generalization of tonumber would be as follows:

def to_n: if type == "string" then gsub(",";"") else . end | tonumber;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants