DecodeErrors using pyarrow flight connector #941

Open
Maxsparrow opened this issue Dec 19, 2023 · 0 comments
Labels: bug (Something isn't working)

Describe the bug
Calling get_flight_info through the pyarrow Flight client against a Ballista deployment built from the latest code fails with protobuf decode errors. The error differs depending on the query.

Query1:

create external table sample stored as CSV with header row location '/mnt/sample.csv';

Error1 after calling get_flight_info:

ArrowInvalid: Flight returned invalid argument error, with message: DecodeError { description: "buffer underflow", stack: [("Any", "type_url")] }

Query2:

select 'Hello from Arrow Ballista!';

Error2:

ArrowInvalid: Flight returned invalid argument error, with message: DecodeError { description: "unexpected end group tag", stack: [] }
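Both messages look like protobuf decoding failures, and the `("Any", "type_url")` stack in Error1 suggests the server tries to parse the descriptor command as a protobuf `Any` message. That is an inference from the error text, not something confirmed against the Ballista source. Under that assumption, a minimal sketch of how the first byte of each raw SQL string would be read as a protobuf field tag (`first_tag` is a hypothetical helper written for this illustration):

```python
# Protobuf wire types, as defined by the protobuf encoding rules.
WIRE_TYPES = {
    0: "varint",
    1: "64-bit",
    2: "length-delimited",
    3: "start group",
    4: "end group",
    5: "32-bit",
}


def first_tag(payload: bytes) -> tuple[int, str]:
    """Interpret the first byte of payload as a protobuf field tag.

    A tag is (field_number << 3) | wire_type. Reading a single byte is
    enough here because ASCII bytes are below 128, so the tag varint
    never continues into a second byte.
    """
    key = payload[0]
    return key >> 3, WIRE_TYPES.get(key & 0x07, "invalid")


queries = (
    "create external table sample stored as CSV with header row location '/mnt/sample.csv';",
    "select 'Hello from Arrow Ballista!';",
)

for sql in queries:
    field_number, wire_type = first_tag(sql.encode("utf-8"))
    print(f"{sql[:6]!r} -> field {field_number}, wire type {wire_type}")
```

Neither leading byte corresponds to field 1 (`type_url`) or field 2 (`value`) of `Any`, so plain SQL text is unlikely to decode as a valid `Any` message, which would be consistent with the errors above.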

I also tried the arrow-ballista-python repo, installed from its latest code, and was unable to connect:

In [3]: ctx = ballista.BallistaContext(hostname, client_port)
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
Cell In[3], line 1
----> 1 ctx = ballista.BallistaContext(hostname, client_port)

Exception: Ballista error: DataFusionError(Execution("Status { code: Internal, message: \"Error parsing request\", metadata: MetadataMap { headers: {\"content-type\": \"application/grpc\", \"date\": \"Tue, 19 Dec 2023 21:27:04 GMT\", \"content-length\": \"0\"} }, source: None }"))

To Reproduce
Steps to reproduce the behavior:

  • Deploy the Ballista scheduler and executors using the latest code, built from the repo at commit 934b32f
  • Install the latest pyarrow (14.0.2) in a Python 3.10 environment

Run against your service:

from pyarrow import flight

# hostname and port point at your Ballista deployment
client = flight.FlightClient(f'grpc://{hostname}:{port}')
client.authenticate_basic_token("admin", "password")  # this call succeeds
query = "select 'Hello from Arrow Ballista!';"
descriptor = flight.FlightDescriptor.for_command(query)
info = client.get_flight_info(descriptor)  # raises ArrowInvalid (Error2 above)
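One thing that might be worth trying on the client side, assuming the endpoint expects a Flight SQL command rather than raw SQL text: wrap the query in a `CommandStatementQuery` message packed inside a protobuf `Any`. The sketch below hand-encodes that payload with no extra dependencies. The type URL and field numbers follow my understanding of the Flight SQL protobuf definitions and should be checked against them; the helper names are hypothetical, and I have not verified that this resolves the errors in this issue.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one length-delimited field (wire type 2)."""
    tag = encode_varint((field_number << 3) | 2)
    return tag + encode_varint(len(payload)) + payload


# Assumption: type URL that Flight SQL uses for statement queries.
TYPE_URL = "type.googleapis.com/arrow.flight.protocol.sql.CommandStatementQuery"


def statement_query_command(sql: str) -> bytes:
    """Build Any{type_url, value=CommandStatementQuery{query=sql}} as bytes."""
    # Assumption: CommandStatementQuery.query is field 1 (string).
    inner = length_delimited(1, sql.encode("utf-8"))
    # google.protobuf.Any: type_url is field 1, value is field 2.
    return length_delimited(1, TYPE_URL.encode("utf-8")) + length_delimited(2, inner)


command = statement_query_command("select 'Hello from Arrow Ballista!';")
print(len(command), command[:24])
```

The resulting bytes could be passed as `flight.FlightDescriptor.for_command(command)` in place of the raw query string.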

Expected behavior
get_flight_info completes without error and returns a FlightInfo object.

Additional context
I deployed Ballista in Kubernetes, so this could still be a networking or setup issue. However, the Ballista scheduler and executor logs show a clean startup with no errors. The Ballista UI for my deployment also works, and the 'client.authenticate_basic_token' call succeeds in Python, which suggests the server is running and reachable.

I'm new to Rust and the whole DataFusion ecosystem, so I don't know whether there is an easier way to test that my deployment is working. Any advice would be appreciated.

Maxsparrow added the bug label on Dec 19, 2023