Unclear docs regarding parser_callback_t callbacks #1972

Closed
laomaiweng opened this issue Mar 4, 2020 · 8 comments · Fixed by #2153
Labels: confirmed; kind: bug; release item: 🐛 bug fix; solution: proposed fix (a fix for the issue has been proposed and waits for confirmation)

@laomaiweng

laomaiweng commented Mar 4, 2020

I'm trying to use the SAX parser with event callbacks to drop certain items when loading a JSON file.
It's not clear to me what event callbacks are/should be called when parsing values inside an array.

Consider this (ignoring all depths > 1):

[                        // array_start, depth=0
  1,                     // value, depth=1
  {                      // object_start, depth=1
    "foo": "bar"
  }                      // object_end, depth=1
]                        // array_end, depth=0

The docs don't show such an example, but the comments above indicate what I observe. Is it normal that no callback is fired with parse event value and depth 1 at the end of the object in the array? This feels a bit awkward because I need to check both value and object_end events at depth 1 inside my array if I want to process all array items. Perhaps the docs could be clearer on this point?
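The workaround described above, treating both a value event and the closing event of a nested container at depth 1 as "one array item is finished", can be sketched as follows. This is a minimal, library-free illustration: `parse_event` is a local stand-in that mirrors the enumerator names of `json::parse_event_t`, and `callback_call`, `completes_array_item`, `observed_calls`, and `count_completed_items` are hypothetical names. Treating `array_end` at depth 1 the same way as `object_end` is an assumption by symmetry; only `object_end` was observed here.

```cpp
#include <vector>

// Stand-in for json::parse_event_t (same enumerator names), so that this
// sketch compiles without the library. Assumption, not the library type.
enum class parse_event { object_start, object_end, array_start, array_end, key, value };

// One callback invocation, reduced to the two arguments of interest.
struct callback_call
{
    int depth;
    parse_event event;
};

// True when a callback invocation marks a finished element of the
// top-level array: a scalar value, or the end of a nested container.
bool completes_array_item(int depth, parse_event event)
{
    if (depth != 1)
    {
        return false;
    }
    return event == parse_event::value
           || event == parse_event::object_end
           || event == parse_event::array_end; // assumed, by symmetry
}

// The calls annotated in the listing above (depths > 1 omitted, as there).
std::vector<callback_call> observed_calls()
{
    return {
        {0, parse_event::array_start},
        {1, parse_event::value},
        {1, parse_event::object_start},
        {1, parse_event::object_end},
        {0, parse_event::array_end},
    };
}

// Counts how many top-level array items a sequence of calls completes.
int count_completed_items(const std::vector<callback_call>& calls)
{
    int count = 0;
    for (const auto& call : calls)
    {
        if (completes_array_item(call.depth, call.event))
        {
            ++count;
        }
    }
    return count;
}
```

In a real callback the same check would be made against `json::parse_event_t` directly; the point is only that two different events have to be handled to see every array item.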

Also, the doc indicates that "Discarded values in structured types are skipped". However if during my object_end, depth 1 callback I return false, the array item is merely tagged as discarded (is_discarded() == true) even after parsing. This also seems to contradict the docs for is_discarded(): "This function will always be false for JSON values after parsing".
This seems to be due to the following check in json_sax_dom_callback_parser::end_object():

    if (not ref_stack.empty() and ref_stack.back() and ref_stack.back()->is_object())

To match the docs and also drop items from arrays, shouldn't this be:

    if (not ref_stack.empty() and ref_stack.back() and ref_stack.back()->is_structured())

I tried this and it seems to be working fine, but I didn't check for other impacts this change may have.

FWIW, I'm using json.hpp v3.7.3 on Linux.

Cheers and thanks for the awesome library. :)

@laomaiweng
Author

(I haven't checked whether/how this could/should also apply to value events.)

@dota17
Contributor

dota17 commented Mar 5, 2020

Hi @laomaiweng, if you want to learn more about sax_parse, please see this document.
As for using the event callback here, I changed the example code as follows:

 #include <iostream>
 #include <iomanip>
 #include <nlohmann/json.hpp>
  
 using json = nlohmann::json;
  
 int main()
 {
     // a JSON text
     auto text = R"(
        [
            1,
            {
                "foo": "bar"
            }  
        ]
     )";
  
     // define parser callback
     json::parser_callback_t cb = [](int depth, json::parse_event_t event, json & parsed)
     {
         if (event == json::parse_event_t::object_start)   // ignoring { "foo" : "bar"}
         {
             return false;
         }
         else
         {
             return true;
         }
     };
  
     // parse (with callback) and serialize JSON
     json j_filtered = json::parse(text, cb);
     std::cout << std::setw(4) << j_filtered << '\n';
 }

The output shows that { "foo" : "bar" } was filtered out successfully:

[
    1
]

According to

    using parser_callback_t =
        std::function<bool(int depth, parse_event_t event, BasicJsonType& parsed)>;

you can also use depth and parsed to control your output, like

     // define parser callback
     json::parser_callback_t cb = [](int depth, json::parse_event_t event, json & parsed)
     {
         // discard everything nested deeper than depth 1
         if (depth > 1)
         {
             return false;
         }
         else
         {
             return true;
         }
     };

the output is:

[
    1,
    {}
]

Hope these are helpful to you.

@laomaiweng
Author

Thank you very much for your reply. :)

Your example works and mine doesn't because you discard the {"foo":"bar"} object at the object_start event. I would like to have a chance to examine the object before I decide to discard it, so I must wait for the object_end event.

Using your above code with the following callback:

    // define parser callback
    json::parser_callback_t cb = [](int depth, json::parse_event_t event, json &parsed) {
        // skip object containing key "foo"
        if (event == json::parse_event_t::object_end && parsed.contains("foo"))
            return false;
        else
            return true;
    };

the output is:

[
    1,
    <discarded>
]

Is the <discarded> item expected? Can I actually discard the object (at object_end) without the <discarded> item still appearing in the array afterwards?

@dota17
Contributor

dota17 commented Mar 6, 2020

As I understand it, the <discarded> item cannot be removed. I ran some tests that print a log:

// if (event == json::parse_event_t::object_start)
if (event == json::parse_event_t::object_end)
{
     std::cout << "parsed: " << parsed << std::endl;
     std::cout << parsed.is_discarded() << std::endl;
     return false;
}

At object_start, parsed is <discarded> and is_discarded() == true.
At object_end, parsed is {"foo":"bar"} and is_discarded() == false.
So the parsed value itself is not tagged as discarded, but because the callback still returns false, the result of parse() contains a value of type value_t::discarded (<discarded>) in its place. @nlohmann Am I right?

@laomaiweng
Author

If at all possible, I'd very much like to know if it's possible to discard an object at object_end, without it merely being tagged as discarded in the resulting JSON. ^^"

@nlohmann
Owner

I finally had time to check this issue.

Indeed, parsing

[
  1,
  {
    "foo": "bar"
  }
]

gives the following callbacks:

event=array_start   depth=0  parsed=<discarded>
event=value         depth=1  parsed=1
event=object_start  depth=1  parsed=<discarded>
event=key           depth=2  parsed="foo"
event=value         depth=2  parsed="bar"
event=object_end    depth=1  parsed={"foo":"bar"}
event=array_end     depth=0  parsed=[1,{"foo":"bar"}]

And indeed there is a callback of type value missing after the object was parsed.

I can also confirm that parsing the text above with the callback

json::parser_callback_t cb = [](int depth, json::parse_event_t event, json & parsed)
{
    if (event == json::parse_event_t::object_end)
    {
        return false;
    }
    return true;
};

yields this result:

[1, <discarded>]

The logic of the callback was never really challenged, and moving it into the SAX parser was quite difficult. I will try to understand your proposed fix and whether it has further implications.

@dota17 dota17 mentioned this issue May 30, 2020
@nlohmann nlohmann linked a pull request Jun 3, 2020 that will close this issue
@nlohmann nlohmann added the solution: proposed fix a fix for the issue has been proposed and waits for confirmation label Jun 3, 2020
@nlohmann nlohmann self-assigned this Jun 3, 2020
@nlohmann nlohmann added this to the Release 3.8.0 milestone Jun 3, 2020