Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix/dbus call issue #2838

Merged
merged 2 commits into from
Jul 9, 2024
Merged

Fix/dbus call issue #2838

merged 2 commits into from
Jul 9, 2024

Conversation

YJDoc2
Copy link
Collaborator

@YJDoc2 YJDoc2 commented Jun 30, 2024

This fixes the issues with dbus method call not receiving reply. The RC was that the current implementation stopped after receiving one message, and did not continue. Thus in some cases the signal type message would be returned, and we would discard it as its not the method reply type. As we did not look for further messages, it seemed that there was no reply from dbus. Now if the sent message was of type method call, we keep looping until we get either a method reply or method error type of message. Ideally the correct way would be to keep collecting messages in a stateful buffer and return when we get a message corresponding to the serial number we sent, but it can be tricky to implement that correctly, and at least for now we do not need that.

This should still be considered WIP, I needed review so opened this, but I need to add some related shell scripts I used for future needs.

Apart from that this also has fix for when intermediate process would error and leave a broken socket, without any error. Now it properly attempts to send the error message over the socket so that main process can correctly identify and report it.

Signed-off-by: Yashodhan Joshi <yjdoc2@gmail.com>
Signed-off-by: Yashodhan Joshi <yjdoc2@gmail.com>
Comment on lines +328 to +346
// in Youki, we only ever do method call apart from initial auth
// in case it is, we don't really have a specific message to look
// out of, so we take the buffer and break
if mtype != MessageType::MethodCall {
break;
}

// check if any of the received message is method return or error type
let return_message_count = ret
.iter()
.filter(|m| {
m.preamble.mtype == MessageType::MethodReturn
|| m.preamble.mtype == MessageType::Error
})
.count();

if return_message_count > 0 {
break;
}
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like suggestion for other ways to implement this. Ideally I feel that send_message function should not have to deal with the specific message type, but as we collect the response here, we need to account for it. Other way I can think is if we separate the send message and get response functions, so that the higher level calling functions can choose to loop over the get response. But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?

I do not think it is. The complexity of should be contained in send_message. The caller should not have to think about it.

@YJDoc2
Copy link
Collaborator Author

YJDoc2 commented Jul 4, 2024

As per the #2826 (comment) , this PR likely solves the issue, so requesting a review of the code now (not urgent).

@YJDoc2 YJDoc2 requested a review from a team July 4, 2024 06:45

let reply_rcvd = match reply_res {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Comment on lines +328 to +346
// in Youki, we only ever do method call apart from initial auth
// in case it is, we don't really have a specific message to look
// out of, so we take the buffer and break
if mtype != MessageType::MethodCall {
break;
}

// check if any of the received message is method return or error type
let return_message_count = ret
.iter()
.filter(|m| {
m.preamble.mtype == MessageType::MethodReturn
|| m.preamble.mtype == MessageType::Error
})
.count();

if return_message_count > 0 {
break;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But that means we would have to do two function calls at each place we are doing a send message call. Would that be a better way?

I do not think it is. The complexity of should be contained in send_message. The caller should not have to think about it.

// for method calls, we need to have an error or method return type message, so
// we keep looping until we get either of these. see https://github.com/containers/youki/issues/2826
// for more detailed analysis.
loop {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand why this loop is necessary. Shouldn't receive_complete_response have read everything from the socket the first time?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The way receive_complete_response is implemented, it loops internally to read from socket into a buffer, and if read_bytes < buffer_size, it returns the buffer. This is because dbus uses same socket for communications, and the read will block if there is no further data to read. Hence we return once the received data is less than buffer.
Unfortunately, the dbus messages can come in multiple bursts, so we can have two messages that get delivered together, or they can have few milliseconds between them, and thus one read call might only give single message < size of buffer. We need to loop here to keep consuming messages until we find one that we are looking for, which in this case is the method_return or error type.

buf = &buf[ctr..];
ret.push(msg);

// it is possible that while receiving messages, we get some extra/previous message
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why we get these extra messages?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As of now the only extra messages that are observed are of type signal, which basically broadcast some event to all connected clients. The thing is we used to get these before as well, but usually these would get consumed in buffers of some other method calls or such, and get ignored. In few cases that would not happen, which lead to the dbus error in question. To solve this, now we keep consuming all messages until we get method return or error message (in case of method call message).

@Furisto Furisto merged commit d6c5996 into containers:main Jul 9, 2024
28 checks passed
@Furisto
Copy link
Member

Furisto commented Jul 9, 2024

Thanks @YJDoc2 !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants