[Bug] Sometimes pueued does not respond to the pueue client #541

Open
Shihira opened this issue Jun 12, 2024 · 12 comments
Labels
f: Help wanted o: Windows A Windows OS exclusive issue t: Bug

Comments

@Shihira

Shihira commented Jun 12, 2024

Describe the bug

Sometimes pueued does not respond to pueue status: the pueue client hangs forever waiting for the daemon. Commands like pueue add and pueue clean continue to work, especially that pueue clean can sometimes get things recovered. My guess is that the message from the daemon was too long or got truncated.

Steps to reproduce

Not 100% reproducible; it usually happens when several tasks are added at the same time.

Debug logs (if relevant)

16:16:01 [INFO] Parsing config files
16:16:01 [INFO] Checking path: "C:\\Users\\clairfeng\\AppData\\Roaming\\pueue\\pueue.yml"
16:16:01 [INFO] Found config file at: "C:\\Users\\clairfeng\\AppData\\Roaming\\pueue\\pueue.yml"
16:16:01 [DEBUG] (1) rustls::client::hs: No cached session for DnsName("pueue.local")
16:16:01 [DEBUG] (1) rustls::client::hs: Not resuming any session
16:16:01 [DEBUG] (1) rustls::client::hs: Using ciphersuite TLS13_AES_256_GCM_SHA384
16:16:01 [DEBUG] (1) rustls::client::tls13: Not resuming
16:16:01 [DEBUG] (1) rustls::client::tls13: TLS1.3 encrypted extensions: [ServerNameAck]
16:16:01 [DEBUG] (1) rustls::client::hs: ALPN protocol is None
16:16:01 [DEBUG] (1) pueue_lib::network::protocol: Sending message: Status
16:16:01 [DEBUG] (1) rustls::server::hs: decided upon suite TLS13_AES_256_GCM_SHA384
16:16:01 [DEBUG] (2) pueue_lib::network::protocol: Received message: Status
16:16:01 [DEBUG] (2) pueue_lib::network::protocol: Sending message: StatusResponse(
...
)

Although the server had sent the message, the client never seemed to receive it.

Operating system

Windows 10

Pueue version

v3.4.0

Additional context

No response

@Nukesor Nukesor added t: Bug o: Windows A Windows OS exclusive issue labels Jun 12, 2024
@Nukesor
Owner

Nukesor commented Jun 12, 2024

Phew, that could be anything.

This will need somebody with a windows machine to look into this issue :)

It would be great if anybody that also runs into this issue could take a look!

@Shihira
Author

Shihira commented Jun 17, 2024

I added some logging around sending and receiving bytes. In my case the server sent 170334 bytes but the client received only 161280 bytes. The tail was missing for some reason, which is strange because the connection is localhost TCP, so there should not be any network stability problems. I changed PACKET_SIZE from 1280 to 64K and the problem seems to have disappeared for now, but I am not sure this is a proper fix.
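As a quick sanity check on these numbers, here is a small Rust sketch that splits both byte counts into 1280-byte chunks. The helper name `split_into_chunks` is made up for illustration and is not part of pueue; only the byte counts and the chunk size come from the report above.

```rust
/// Chunk size from the report above (the original PACKET_SIZE).
const PACKET_SIZE: usize = 1280;

/// Hypothetical helper: returns (number of full chunks, size of the
/// trailing partial chunk in bytes) for a payload of `total` bytes.
fn split_into_chunks(total: usize, packet_size: usize) -> (usize, usize) {
    (total / packet_size, total % packet_size)
}

fn main() {
    // Byte counts as reported in the comment above.
    let sent = 170_334;
    let received = 161_280;

    let (sent_full, sent_tail) = split_into_chunks(sent, PACKET_SIZE);
    let (recv_full, recv_tail) = split_into_chunks(received, PACKET_SIZE);

    println!("sent: {sent_full} full chunks + {sent_tail} bytes");
    println!("received: {recv_full} full chunks + {recv_tail} bytes");
    println!("missing: {} bytes", sent - received);
}
```

This works out to 133 full chunks plus 94 bytes sent, and exactly 126 full chunks received, so 9054 bytes are missing. The received count is an exact multiple of 1280, which may hint that the receiver stopped on a chunk boundary rather than in the middle of a chunk. That is only an observation from the arithmetic, not a diagnosis.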

@Nukesor
Owner

Nukesor commented Jun 17, 2024

This is odd. I didn't expect this to be a networking issue, let alone an MTU issue. Though maybe that's a red herring and the issue just doesn't appear because fewer frames are sent.

Using large frames will lead to issues in most networks and with payloads that're bigger than 64k. We had problems with those in the past, which is why a very conservative MTU of 1280 has been chosen.

It looks like some packets are lost, but since you're using TCP that really shouldn't be a problem...

I tried to reproduce your problem and connected my local client to my server via TCP (via a wireguard transport layer) and it worked just fine. This makes it tricky for me to debug though, as I really cannot do any analysis as long as I cannot reproduce the issue :<

@Nukesor
Owner

Nukesor commented Jun 17, 2024

One more question. What exactly do you mean by "especially that pueue clean can sometimes get things recovered."?

@Nukesor
Owner

Nukesor commented Jun 23, 2024

Ping @Shihira

@Nukesor
Owner

Nukesor commented Jul 14, 2024

Closed due to unresponsiveness

@Nukesor Nukesor closed this as completed Jul 14, 2024
@Shihira
Author

Shihira commented Jul 15, 2024

One more question. What exactly do you mean by "especially that pueue clean can sometimes get things recovered."?

Sorry for the slow reply; pueue has been working since I adjusted the packet size.

I meant that pueue clean kept responding, and after the cleanup pueue status itself stopped freezing as well. That gave me a clue that the message length might be related to this issue. Cleaning tasks might simply shorten the message and therefore let the transfer succeed. That's why I turned to the packet size.

Could it be an option to expose the packet size so that users can adjust it to fit different system environments?

@Nukesor Nukesor reopened this Jul 15, 2024
@Nukesor
Owner

Nukesor commented Jul 15, 2024

A bigger packet size shouldn't resolve any issues. If anything, it should introduce new issues, as a lot of infrastructure doesn't handle it. This is most definitely a bug and needs to be fixed.

Thanks for the explanation, that indeed makes sense. I'll take another look at the net code for a bit in the coming days.

@abylon-io

My behavior is exactly the same as that described by @Shihira.
As I can no longer track the progress of tasks, I use the log files associated with each task to find out how far they've progressed.
It's not very practical, but I like pueue so much that I can't do without it.
I'm running Windows 11 & pueue 3.4.1.

@Nukesor
Owner

Nukesor commented Jul 16, 2024

So, if anybody is interested in debugging this for a bit, could you take a look at pueue_lib/src/network/protocol.rs.

In there are the receive_bytes and send_bytes functions, which are responsible for sending the individual packets.

I would start by adding some print messages with general info about the message to be sent (including chunk count etc.) and then print each individual packet.
That way, we should see where the packet gets lost. Maybe add a delay of a second or so after each packet. That should make it rather obvious.
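For anyone who wants a starting point, here is a minimal, synchronous sketch of that kind of instrumentation. It is not pueue's actual code: the function names `send_bytes_logged` and `receive_bytes_logged`, the 8-byte big-endian length header, and the use of blocking `std::io` are all assumptions made for illustration. The real functions in protocol.rs may frame and read the data differently.

```rust
use std::io::{self, Cursor, Read, Write};

/// Chunk size mentioned in this thread.
const PACKET_SIZE: usize = 1280;

/// Hypothetical instrumented sender. Assumed framing: an 8-byte
/// big-endian length header, then the payload in PACKET_SIZE chunks.
fn send_bytes_logged<W: Write>(stream: &mut W, payload: &[u8]) -> io::Result<()> {
    let chunk_count = payload.chunks(PACKET_SIZE).count();
    eprintln!("send: {} bytes in {} chunks", payload.len(), chunk_count);
    stream.write_all(&(payload.len() as u64).to_be_bytes())?;
    for (index, chunk) in payload.chunks(PACKET_SIZE).enumerate() {
        stream.write_all(chunk)?;
        eprintln!("send: chunk {}/{} ({} bytes)", index + 1, chunk_count, chunk.len());
    }
    stream.flush()
}

/// Hypothetical instrumented receiver. It logs how many bytes each
/// `read` call returns, because a single `read` may legally return
/// fewer bytes than requested, and it fails loudly if the stream ends early.
fn receive_bytes_logged<R: Read>(stream: &mut R) -> io::Result<Vec<u8>> {
    let mut header = [0u8; 8];
    stream.read_exact(&mut header)?;
    let expected = u64::from_be_bytes(header) as usize;
    eprintln!("recv: expecting {} bytes", expected);

    let mut payload = Vec::with_capacity(expected);
    let mut buffer = [0u8; PACKET_SIZE];
    while payload.len() < expected {
        let wanted = (expected - payload.len()).min(PACKET_SIZE);
        let read = stream.read(&mut buffer[..wanted])?;
        if read == 0 {
            return Err(io::Error::new(
                io::ErrorKind::UnexpectedEof,
                format!("stream ended after {} of {} bytes", payload.len(), expected),
            ));
        }
        payload.extend_from_slice(&buffer[..read]);
        eprintln!("recv: got {} bytes ({}/{} total)", read, payload.len(), expected);
    }
    Ok(payload)
}

fn main() -> io::Result<()> {
    // In-memory round trip using the payload size reported earlier in this thread.
    let message = vec![7u8; 170_334];
    let mut wire = Vec::new();
    send_bytes_logged(&mut wire, &message)?;
    let received = receive_bytes_logged(&mut Cursor::new(wire))?;
    assert_eq!(received, message);
    Ok(())
}
```

The in-memory round trip only shows that the logging and the loop are consistent with each other. To investigate the actual hang, the same kind of log lines would have to go into the real send and receive paths on an affected Windows machine, so that the sent and received chunk counts can be compared.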

@undecV

undecV commented Sep 7, 2024

I have been using Pueue for some time now, adding tasks and tracking their status through some custom scripts. In my experience, once several dozen uncleared tasks have accumulated (regardless of group and status; I also add quite long labels to each task), pueue status becomes unresponsive. However, the tasks and the queue continue to function normally, the same as in this issue. This happens almost every time. After running pueue clean to reduce the number of remaining tasks, pueue status resumes normal operation.

I am using Windows 11 (22635.4145) and Pueue 3.4.1.

I hope this response is not too late and can still be helpful. I like Pueue so much.

@Nukesor
Owner

Nukesor commented Sep 7, 2024

It's most certainly not too late.

It's just that this is a windows specific bug and somebody that uses Windows needs to go ahead and debug this behavior :D

The same network logic works perfectly fine for both Mac and Linux, so there's some Windows specific network logic that's causing problems. Also it seems to work just fine for many other windows users out there, so I assume it's a user environment specific issue.

As I only use Linux myself, I won't try to fix this 🙂.


4 participants