Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

shadow-delta-sample can be crashed with an existing shadow #52

Closed
larsonmpdx opened this issue Nov 14, 2017 · 17 comments
Closed

shadow-delta-sample can be crashed with an existing shadow #52

larsonmpdx opened this issue Nov 14, 2017 · 17 comments

Comments

@larsonmpdx
Copy link
Contributor

add this line

                std::this_thread::sleep_for(std::chrono::seconds(10));

after this line in shadow-delta-sample: https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/samples/ShadowDelta/ShadowDelta.cpp#L197

this causes the sample to download the existing shadow without trying to modify it first. set this shadow online (the "welcome" tag gets put in there by the gui shadow editor):

{
  "desired": {
    "welcome": "aws-iot",
    "nested": {
      "1": "1",
      "2": "2"
    }
  },
  "reported": {
    "welcome": "aws-iot"
  }
}

the sample will run and crash this way when it downloads the shadow a few seconds after adding the subscription (before the sleep() call):

shadow-delta-sample: /home/user/aws-iot-device-sdk-cpp/build/third_party/rapidjson/src/include/rapidjson/document.h:1111: rapidjson::GenericValue<Encoding, Allocator>& rapidjson::GenericValue<Encoding, Allocator>::operator[](const rapidjson::GenericValue<Encoding, SourceAllocator>&) [with SourceAllocator = rapidjson::MemoryPoolAllocator<>; Encoding = rapidjson::UTF8<>; Allocator = rapidjson::MemoryPoolAllocator<>]: Assertion `false' failed.
Aborted
@larsonmpdx
Copy link
Contributor Author

I should add, this is a minimal reproducer I created after encountering this crash in my own program. I don't think it's a problem specific to shadow-delta-sample

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
Sorry for the delay, we are investigating the issue and will update this thread after we find a fix.
Thanks!
Varun

@larsonmpdx
Copy link
Contributor Author

Thanks Varun!

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
I tried reproducing the issue, by updating the shadow to the one you posted and adding the delay but I am unable to. In my testing the sample just updates the cloud version of the shadow and continues to perform shadow updates as usual. Is there a certain sequence of events that needs to be followed?
Thanks!
Varun

@larsonmpdx
Copy link
Contributor Author

larsonmpdx commented Nov 16, 2017

I did some more testing, this only happens when I have a lambda function enabled which writes a connected status to this device's shadow reported state based on the aws iot lifecycle event. it only happens sometimes (so it's some kind of race). the sequence is:

click delete shadow in the gui
create shadow document (gui button)
edit (gui button)
paste the "nested" bit inside the desired object in the shadow. on save this gets a delta calculated

then run shadow-delta-sample with the lambda function active. it will write a connected status back to the shadow reported state. sometimes this will cause a shadow-delta-sample crash

I can try to create a simple lambda function to help you reproduce this, would that help?

@larsonmpdx
Copy link
Contributor Author

I changed shadow-delta-sample to debug loglevel and found the error happens after this line:
https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/src/shadow/Shadow.cpp#L235

I added a print statement after that line to see the shadow document:

AWS_LOG_DEBUG(SHADOW_LOG_TAG, "shadow : %s", util::JsonParser::ToString(cur_server_state_document_).c_str());

it printed this before crashing (it got an empty shadow somehow?)

:322 [Shadow] [140275671164672] [HandleUpdateResponse:L235] : Delta received for shadow : 4947bd70-c3fd-11e7-810d-a1ddbf03fe89
[DEBUG] Wed Nov 15 18:27:34 2017
:323 [Shadow] [140275671164672] [HandleUpdateResponse:L236] : shadow : {}
shadow-delta-sample: /home/user/aws-iot-device-sdk-cpp/build/third_party/rapidjson/src/include/rapidjson/document.h:1111: rapidjson::GenericValue<Encoding, Allocator>& rapidjson::GenericValue<Encoding, Allocator>::operator[](const rapidjson::GenericValue<Encoding, SourceAllocator>&) [with SourceAllocator = rapidjson::MemoryPoolAllocator<>; Encoding = rapidjson::UTF8<>; Allocator = rapidjson::MemoryPoolAllocator<>]: Assertion `false' failed.

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
Yes, a simple lambda should help speed up the process of debugging.
I added a couple of logs to see if the cloud shadow gets downloaded at the beginning but it only receives an empty shadow, but it doesn't cause the rapidjson allocator to crash. I'll need to dig into it more.
Thanks!
Varun

@larsonmpdx
Copy link
Contributor Author

the problem feels like it's coming from an API call that sets the shadow (desired or reported) but doesn't trigger the generation of the delta state. I don't know how this all works but that's what I see happen in the gui when the crash happens. but it's an intermittent race thing so I might be fooling myself

@larsonmpdx
Copy link
Contributor Author

here is a node 6.10 lambda. I didn't have it actually process any data, it is just a convenient way to run the code. You can run it with the "test" button in the lambda gui. Fill in your own endpoint and thing name.

var AWS = require('aws-sdk');

exports.handler = (event, context, callback) => {

    var iotData = new AWS.IotData({
        endpoint: "???.iot.us-west-2.amazonaws.com"
    });

    var updateShadowParams = {
        thingName: "???",
        payload: JSON.stringify({state: { reported: {1: "1"}}})
    };

    iotData.updateThingShadow(updateShadowParams, (err, data) => {
        if (err) {
            return callback('failed to update the device shadow: ' + err + '\nshadow request: ' + JSON.stringify(updateShadowParams, null, 2), null);
        } else {
            return callback(null, 'updated the device shadow');
        }
    });
}

steps to reproduce:
start with no shadow (delete it in the aws iot gui)
connect shadow-delta-sample. I changed the sleep time to 10 minutes to make this easier
run the lambda function, this should set a shadow with only a "reported" key
go to the aws iot gui shadow editor and add this "desired" block:

"desired":
{
    "nested": {
      "1": "1",
      "2": "2"
    }
},

this should crash shadow-delta-sample some portion of the time (about half the time for me). Here is my debug output of the most recent crash:

:709 [Shadow] [140329657886464] [HandleUpdateResponse:L235] : Delta received for shadow : 4947bd70-c3fd-11e7-810d-a1ddbf03fe89
[DEBUG] Thu Nov 16 08:30:00 2017
:709 [Shadow] [140329657886464] [HandleUpdateResponse:L236] : shadow : {"state":{"reported":{"1":"1"}},"metadata":{"reported":{"1":{"timestamp":1510849794}}},"version":576,"timestamp":1510849794}
shadow-delta-sample: /home/user/aws-iot-device-sdk-cpp/build/third_party/rapidjson/src/include/rapidjson/document.h:1111: rapidjson::GenericValue<Encoding, Allocator>& rapidjson::GenericValue<Encoding, Allocator>::operator[](const rapidjson::GenericValue<Encoding, SourceAllocator>&) [with SourceAllocator = rapidjson::MemoryPoolAllocator<>; Encoding = rapidjson::UTF8<>; Allocator = rapidjson::MemoryPoolAllocator<>]: Assertion `false' failed.

note there is no "desired" key in my payload, the SDK is looking for this key at this line:

https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/src/shadow/Shadow.cpp#L237

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
I am able to reproduce the crash. I will update this thread after I make some headway with the debugging.
Thanks for all the help!
Varun

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
I have a simple fix you can try with your tests to see if it crashes. After https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/src/shadow/Shadow.cpp#L236, please add the following code block:

if (!cur_server_state_document_.HasMember(SHADOW_DOCUMENT_STATE_KEY)) {
                            util::JsonParser::InitializeFromJsonString(cur_server_state_document_,
                                                                       SHADOW_DOCUMENT_EMPTY_STRING);
                        }

Let me know if that works for you.
Thanks!
Varun

@larsonmpdx
Copy link
Contributor Author

the problem is trying to process JSON blobs assuming there are some keys there. line 237 assumes these keys:

[SHADOW_DOCUMENT_STATE_KEY][SHADOW_DOCUMENT_DESIRED_KEY]

if the state key is missing your fix will replace it. I posted debug output with this JSON:

{"state":{"reported":{"1":"1"}},"metadata":{"reported":{"1":{"timestamp":1510849794}}},"version":576,"timestamp":1510849794}

The state key exists but the desired key does not, so your fix won't be triggered. I can still reproduce this crash. In general, anywhere in the SDK that you depend on JSON from the internet being formatted a certain way you need to test for the formatting first. Looking around the code there are other spots that need these tests as well.

@vareddy-zz
Copy link
Contributor

vareddy-zz commented Nov 16, 2017

Hi @larsonmpdx ,
Copying the empty shadow document into the current server state will ensure that the state, desired and reported keys exist. Does your server state have only the reported key but not the desired key at any stage? Or does your SHADOW_DOCUMENT_EMPTY_STRING template which you use for your program not have the desired state?
You are correct, I am currently going through the Shadow code to add error handling to wherever a key is being accessed.
Thanks!
Varun

@larsonmpdx
Copy link
Contributor Author

I'm using shadow-delta-sample unaltered except to add the sleep() call and some debug printing. SHADOW_DOCUMENT_EMPTY_STRING is unaltered

you can see my debug printing shows the JSON delivered from the server, and in the given crash log it has a state key but no desired key within it. this happens sometimes when following the reproduction steps with the lambda function setting a reported state and adding a desired state by hand with the gui editor. the test cur_server_state_document_.HasMember(SHADOW_DOCUMENT_STATE_KEY) would return true in this case and lead to a crash when SHADOW_DOCUMENT_DESIRED_KEY isn't found

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
I made another change which you can test, could you let me know if this works. After line https://github.com/aws/aws-iot-device-sdk-cpp/blob/master/src/shadow/Shadow.cpp#L236, add the following:

                        if (!cur_server_state_document_.HasMember(SHADOW_DOCUMENT_STATE_KEY)) {
                            util::JsonParser::InitializeFromJsonString(cur_server_state_document_,
                                                                       SHADOW_DOCUMENT_EMPTY_STRING);
                        }
                        util::JsonDocument desired_check_doc;
                        util::JsonParser::InitializeFromJsonString(desired_check_doc,
                                                                   util::JsonParser::ToString(cur_server_state_document_[SHADOW_DOCUMENT_STATE_KEY]));
                        if (!desired_check_doc.HasMember(SHADOW_DOCUMENT_DESIRED_KEY)) {
                            util::JsonDocument empty_doc;
                            util::JsonParser::InitializeFromJsonString(empty_doc, SHADOW_DOCUMENT_EMPTY_STRING);
                            util::JsonParser::MergeValues(cur_server_state_document_[SHADOW_DOCUMENT_STATE_KEY],
                                                          empty_doc[SHADOW_DOCUMENT_STATE_KEY],
                                                          cur_server_state_document_.GetAllocator());
                        }

The above change takes into consideration that the device might not have a desired state and merges the empty document with the cur_server_state_document_ before merging with the delta. This ensures it has desired and reported parts of the state.
Thanks!
Varun

@larsonmpdx
Copy link
Contributor Author

yes, that fixes it, thank you

you can simplify the desired key check to this line and get rid of desired_check_doc:

if (!cur_server_state_document_[SHADOW_DOCUMENT_STATE_KEY].HasMember(SHADOW_DOCUMENT_DESIRED_KEY)) {

@vareddy-zz
Copy link
Contributor

Hi @larsonmpdx ,
Great! I'll make the above changes and run it through our internal test pipeline and then push the fix to master. I'll keep this issue open till then.
Thanks!
Varun

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants