Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix parsing number from ISIS3 array #34

Closed
wants to merge 6 commits into from

Conversation

seignovert
Copy link

I notice a small bug during the parsing of numbers from ISIS3 array.

Here is a sample of a Casini/VIMS cube header produced by ISIS3:

Object = IsisCube
  Group = BandBin
    Center         = (0.89261,0.909067,0.925522,0.941972,0.95841,0.974852,0.9-
                      91281,1.00771,1.02413,1.04055,1.05696,1.07335,1.08976,1.-
                      10616,1.12257,1.13896,1.15535,1.17174,1.18813,1.20451,1.-
                      2209,1.23728,1.25367,1.27005,1.28642,1.3028,1.31919,1.33-
                      557,1.35196,1.36834,1.38472,1.4011,1.4175,1.43389,1.4502-
                      9,1.46669,1.48309,1.49949,1.51591,1.53233,1.54875,1.5651-
                      8,1.58161,1.59805,1.61452,1.63094,1.6474,1.66386,1.68033-
                      ,1.69679,1.71327,1.72974,1.74622,1.7627,1.77918,1.79566,-
                      1.81214,1.82862,1.8451,1.86158,1.87806,1.89453,1.91101,1-
                      .92748,1.94395,1.96041,1.97688,1.99334,2.00981,2.02628,2-
                      .04276,2.05924,2.07574,2.09223,2.10874,2.12527,2.1418,2.-
                      15835,2.17492,2.1915,2.20811,2.22472,2.24135,2.25798,2.2-
                      7462,2.29125,2.30789,2.32452,2.34114,2.35776,2.37437,2.3-
                      9096,2.40755,2.42414,2.44072,2.45729,2.47387,2.49044,2.5-
                      0701,2.52358,2.54015,2.55673,2.5733,2.58988,2.60647,2.62-
                      307,2.63966,2.65626,2.67287,2.6895,2.70612,2.72274,2.739-
                      4,2.75603,2.77267,2.78933,2.80599,2.82266,2.83932,2.8559-
                      9,2.87266,2.88934,2.90601,2.9227,2.93938,2.95606,2.97274-
                      ,2.98941,3.00617,3.02283,3.03952,3.05624,3.07293,3.08963-
                      ,3.10632,3.12301,3.13972,3.15642,3.17312,3.18981,3.20651-
                      ,3.2232,3.2399,3.25658,3.27327,3.28996,3.30664,3.32331,3-
                      .33999,3.35666,3.37332,3.38998,3.40665,3.42331,3.43997,3-
                      .45663,3.47329,3.48994,3.5066,3.52325,3.5399,3.55656,3.5-
                      7321,3.58986)
  End_Group
End_Object
End

Most of the array is parsed corrected but some values remain Unicode strings (with a - symbol at the end), e.g.:

...
u'1.68033-',
...
u'2.97274-',
...
u'3.08963-',
...
u'3.20651-',
...

It always happen when the next line starts with a ,.

I guess there is a nicer way to solve that problem, but here is dirty patch to fix it quickly.

@wtolson
Copy link
Member

wtolson commented Nov 21, 2017

Nice catch @seignovert, thanks for the PR. It looks like this implementation would parse the continuation on the next line as its own number instead of including it with the previous line. It would be good to get a test to show the intended behaviour. For example:

foo = (1.234,1.12-
       34,1.234)

should be parsed as

{'foo': [1.234, 1.234, 1.234]}

as opposed to

{'foo': [1.234, 1.2, 34, 1.234]}

@wtolson
Copy link
Member

wtolson commented Nov 22, 2017

Thanks @seignovert, the test looks great. I did a little digging on this and it brought up some questions on how line continuations should work. It seems any time there is a dash and a new line not in a quoted string, the line should be continued. In which case

# Should parse as:
pvl.loads("foo = bar-\n") == {"foo": "bar"}

# Should NOT parse as:
pvl.loads("foo = bar-\n") != {"foo": "bar-"}

@godber @pbvarga1 Any perspective on how the other libraries handle this?

We currently have a test that enforces keeping the trailing dash that would need to change if so. It seems fixing this would only be moving the line where the dash is striped up above checking for an unquoted string.

@godber
Copy link
Member

godber commented Dec 9, 2017

Thanks @wtolson and @seignovert for working on this. I've not encountered this myself, but perhaps it's the kind of thing @thareUSGS knows about.

@thareUSGS
Copy link

thareUSGS commented Dec 9, 2017

Good question. Based on the comments in the linked .cpp code, it appears a dash at the end of a line, even if quoted, is considered "special", meaning it will always be a continuation flag.

see if this helps: https://github.com/USGS-Astrogeology/ISIS3/blob/e7590fb1d49292edeb7661d681e05f76409a316a/isis/src/base/objs/PvlKeyword/PvlKeyword.cpp#L660

extracted example:

       For example, the following two QStrings are the same:
               "The planet Jupiter is very big" and
                "The planet Jupi-
                 ter is very big"

@seignovert
Copy link
Author

Hi all,
Sorry for the delay to update the PR and thanks for your replies.

Based on your comments I simply deleted my original edit and I removed the has_quoted_string test in the parse_unquoated_string function.
I also added two new tests to handle the continuation symbol in strings and everything seems to work now:

Hope this will solve the problem globally.

@michaelaye
Copy link
Member

Is this ready to merge, @wtolson ?

@rbeyer
Copy link
Member

rbeyer commented Feb 27, 2020

Obsoleted by #38, and @seignovert added to authors file. This PR can be withdrawn.

@AndrewAnnex
Copy link
Member

@seignovert closing this pr, please check out the code in current master and see if it works for you!

@seignovert
Copy link
Author

Thanks @rbeyer and @AndrewAnnex.
I check with the version 1.0.0a0 and it solve the issue.

seignovert added a commit to seignovert/pyvims that referenced this pull request Apr 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants