Error, message length too large: found 7666438 bytes, the limit is: 4194304 bytes #773

Closed
andygrove opened this issue May 14, 2023 · 7 comments · Fixed by #782, #928 or #931
Labels
bug Something isn't working

Comments

@andygrove
Member

Describe the bug

I tried running some benchmarks, but some queries fail with this error:

2023-05-14T16:00:52.679602Z  WARN tokio-runtime-worker ThreadId(47) ballista_executor::execution_loop: Executor poll work loop failed. If this continues to happen the Scheduler might be marked as dead. Error: status: OutOfRange, message: "Error, message length too large: found 7666438 bytes, the limit is: 4194304 bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Sun, 14 May 2023 16:00:52 GMT"} }    

To Reproduce

Start cluster:

./target/release/ballista-scheduler
./target/release/ballista-executor -c 24

Run TPC-H benchmarks

Expected behavior
The queries should complete without this error.

Additional context

@andygrove andygrove added the bug Something isn't working label May 14, 2023
@yahoNanJing
Contributor

yahoNanJing commented May 26, 2023

Hi @andygrove, we have also run into this issue. I will propose a PR that adds a config option to make the maximum decoding message size configurable, as a temporary fix.

@andygrove
Member Author

I am still running into this error with the latest code.

2023-12-11T14:31:18.347839Z  WARN          task_runner ThreadId(82) ballista_executor::cpu_bound_executor: Spawned task output ignored: receiver dropped    
2023-12-11T14:31:18.484649Z  WARN tokio-runtime-worker ThreadId(45) ballista_executor::execution_loop: Executor poll work loop failed. If this continues to happen the Scheduler might be marked as dead. Error: status: OutOfRange, message: "Error, message length too large: found 7700152 bytes, the limit is: 4194304 bytes", details: [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Mon, 11 Dec 2023 14:31:18 GMT"} }    

I am using the default --grpc-server-max-decoding-message-size of 16 MB, but the limit still appears to be 4 MB.
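
For reference, the numbers in the log work out as follows: 4194304 bytes is exactly 4 MiB, and the rejected message of 7700152 bytes is roughly 7.3 MiB, so it would fit under a 16 MiB limit but not under 4 MiB. Below is a minimal Rust sketch of that kind of size check. The helper `check_message_len` is hypothetical and only mirrors the wording of the logged error; it is not Ballista or tonic code, and it assumes that "16 MB" means 16 MiB.

```rust
/// One mebibyte in bytes.
const MIB: usize = 1024 * 1024;

/// Hypothetical helper (not a Ballista or tonic function): rejects a message
/// whose length exceeds `limit`, using the same wording as the logged error.
fn check_message_len(len: usize, limit: usize) -> Result<(), String> {
    if len > limit {
        Err(format!(
            "Error, message length too large: found {} bytes, the limit is: {} bytes",
            len, limit
        ))
    } else {
        Ok(())
    }
}
```

With these definitions, `check_message_len(7_700_152, 4 * MIB)` fails while `check_message_len(7_700_152, 16 * MIB)` succeeds, which suggests that some endpoint on the path is still applying a 4 MiB limit.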

@andygrove
Member Author

We currently set the decoding max size but not the encoding max size, so perhaps that is the issue. I will test this.

@Dandandan
Contributor

Dandandan commented Dec 11, 2023

At Coralogix we have hit other errors related to maximum message sizes. We reduced them by:

  • increasing the max size limits (for execution plans and Flight messages, i.e. batches)
  • reducing the batch size (batches can be very large at Coralogix because they contain long strings)
  • enabling Flight compression
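
To illustrate why reducing the batch size helps when rows contain long strings, here is a rough Rust sketch that estimates how many rows fit under a message limit. The function `max_rows_per_batch` and its parameters are hypothetical; it ignores encoding overhead and is not how Ballista or DataFusion picks a batch size.

```rust
/// Hypothetical estimate: the largest row count whose payload stays under
/// `limit_bytes`, capped at `configured_rows` and never less than one row.
/// Encoding and metadata overhead are deliberately ignored.
fn max_rows_per_batch(avg_row_bytes: usize, limit_bytes: usize, configured_rows: usize) -> usize {
    if avg_row_bytes == 0 {
        // No size information: keep the configured batch size.
        return configured_rows;
    }
    (limit_bytes / avg_row_bytes).min(configured_rows).max(1)
}
```

For example, with an average row of 2048 bytes and a 4 MiB limit, a configured batch size of 8192 rows would need to drop to 2048 rows to stay under the limit.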

@Dandandan
Contributor

Dandandan commented Dec 11, 2023

Some other things we did:

  • We have an optimization rule, PruneUnusedPartitions, that removes the partitions a task does not use before the plan is sent to the executor, as our plans can contain thousands of partitions.
  • AFAIK, one thing we could also do, but don't yet, is enable (gzip) compression on the gRPC API to reduce the size of the execution plan being sent.
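
The idea behind pruning unused partitions can be sketched in Rust as follows. This is only an illustration under assumed types: `PartitionInfo` and `prune_unused_partitions` are hypothetical names, and the actual PruneUnusedPartitions rule is not shown in this thread and may work differently.

```rust
use std::collections::HashSet;

/// Hypothetical description of one partition in a serialized plan.
#[derive(Debug, Clone, PartialEq)]
struct PartitionInfo {
    id: usize,
    files: Vec<String>,
}

/// Keep only the partitions that the task will actually read, so that the
/// plan sent to the executor does not carry metadata for all the others.
fn prune_unused_partitions(
    all: Vec<PartitionInfo>,
    needed: &HashSet<usize>,
) -> Vec<PartitionInfo> {
    all.into_iter()
        .filter(|partition| needed.contains(&partition.id))
        .collect()
}
```

If a plan lists 1000 partitions and a task only reads two of them, the pruned plan carries two entries instead of 1000, which shrinks the message before any size limit is applied.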

@andygrove
Member Author

I confirmed that setting the max encoding size resolves the issue for me.

@andygrove
Member Author

We set the max encode/decode message sizes when creating the gRPC servers, but not when creating the clients, so I ran into this again.
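
A toy model of why raising the limits on the servers alone is not enough: every message is encoded by its sender and decoded by its receiver, so a large response still has to pass the client's decode limit. The Rust sketch below uses hypothetical types (`Limits`, `send`, `round_trip`) and assumes, purely for illustration, a 4 MiB limit on any endpoint that was not configured; it does not reproduce tonic's implementation or its actual defaults.

```rust
/// One mebibyte in bytes.
const MIB: usize = 1024 * 1024;

/// Hypothetical per-endpoint limits (not tonic's types).
#[derive(Debug, Clone, Copy)]
struct Limits {
    max_encode: usize,
    max_decode: usize,
}

impl Limits {
    /// Assumption for this model: an unconfigured endpoint allows 4 MiB.
    fn unconfigured() -> Self {
        Limits { max_encode: 4 * MIB, max_decode: 4 * MIB }
    }

    /// An endpoint with both limits raised to `bytes`.
    fn raised(bytes: usize) -> Self {
        Limits { max_encode: bytes, max_decode: bytes }
    }
}

/// One direction of a call: the sender encodes, the receiver decodes.
fn send(len: usize, sender: Limits, receiver: Limits) -> Result<(), String> {
    if len > sender.max_encode {
        return Err(format!("encode: {} bytes exceeds {} bytes", len, sender.max_encode));
    }
    if len > receiver.max_decode {
        return Err(format!("decode: {} bytes exceeds {} bytes", len, receiver.max_decode));
    }
    Ok(())
}

/// A request from client to server followed by the server's response.
fn round_trip(
    request_len: usize,
    response_len: usize,
    client: Limits,
    server: Limits,
) -> Result<(), String> {
    send(request_len, client, server)?;
    send(response_len, server, client)
}
```

In this model, a small request with a 7700152-byte response fails against `Limits::unconfigured()` on the client even when the server uses `Limits::raised(16 * MIB)`, and succeeds once the client's limits are raised as well.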
