Failed to return status for realtime ingestion tasks #16641

Closed
panhongan opened this issue Jun 22, 2024 · 0 comments
Contributor

panhongan commented Jun 22, 2024

Realtime ingestion tasks fail with the error message "failed to return status".

Affected Version

All versions.

Description

[Description]
Ingestion tasks occasionally fail with the error message "failed to return status". The related code is in SeekableStreamSupervisor.java:

    if (results.get(i).isError() || results.get(i).valueOrThrow() == null) {
      killTask(taskId, "Task[%s] failed to return status, killing task", taskId);
    }
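The supervisor-side check can be illustrated with the following minimal Java sketch. `StatusResult` and `tasksToKill` are hypothetical stand-ins, not Druid's actual result type or supervisor method; the sketch only shows that both a failed status request and a null status value lead to the kill path.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StatusCheckSketch {

  // Hypothetical stand-in for the per-task result the supervisor receives:
  // either an error, or a (possibly null) status value.
  static final class StatusResult {
    final String value;   // null when the task returned no status
    final boolean error;  // true when the status request itself failed

    StatusResult(String value, boolean error) {
      this.value = value;
      this.error = error;
    }

    boolean isError() {
      return error;
    }

    String valueOrThrow() {
      if (error) {
        throw new IllegalStateException("status request failed");
      }
      return value;
    }
  }

  // Returns the IDs of tasks that would be killed for failing to return status.
  static List<String> tasksToKill(Map<String, StatusResult> resultsByTask) {
    List<String> killed = new ArrayList<>();
    for (Map.Entry<String, StatusResult> e : resultsByTask.entrySet()) {
      StatusResult r = e.getValue();
      // Same condition as the supervisor code: an error OR a null value counts as failure.
      // isError() is checked first, so valueOrThrow() only runs for non-error results.
      if (r.isError() || r.valueOrThrow() == null) {
        killed.add(e.getKey());
        System.out.printf("Task[%s] failed to return status, killing task%n", e.getKey());
      }
    }
    return killed;
  }

  public static void main(String[] args) {
    Map<String, StatusResult> results = new LinkedHashMap<>();
    results.put("task_a", new StatusResult("READING", false));
    results.put("task_b", new StatusResult(null, true));  // e.g. the task answered with HTTP 503
    results.put("task_c", new StatusResult(null, false));
    System.out.println(tasksToKill(results));
  }
}
```

In this model, a task that is healthy but merely slow to register its handler is indistinguishable from a broken one: both end up in the kill list.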

After adding some logging, I found that the failure originates in ChatHandlerResource.java:

    throw new ServiceUnavailableException(StringUtils.format("Can't find chatHandler for handler[%s]", handlerId));
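A minimal sketch of the lookup that produces this error, assuming a simple map-based registry. `ChatHandlerLookupSketch`, its `register`/`resolve` methods, and the local `ServiceUnavailableException` class are hypothetical; they only approximate how ChatHandlerResource behaves when no handler is registered under the requested ID.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class ChatHandlerLookupSketch {

  // Hypothetical stand-in for the exception the resource throws (surfaces as HTTP 503).
  static final class ServiceUnavailableException extends RuntimeException {
    ServiceUnavailableException(String message) {
      super(message);
    }
  }

  // Hypothetical registry: handler ID -> handler object.
  private final Map<String, Object> handlers = new ConcurrentHashMap<>();

  void register(String handlerId, Object handler) {
    handlers.put(handlerId, handler);
  }

  Optional<Object> get(String handlerId) {
    return Optional.ofNullable(handlers.get(handlerId));
  }

  // Mirrors the failure path: an unknown handler ID is reported as "service unavailable".
  Object resolve(String handlerId) {
    return get(handlerId).orElseThrow(() -> new ServiceUnavailableException(
        String.format("Can't find chatHandler for handler[%s]", handlerId)));
  }

  public static void main(String[] args) {
    ChatHandlerLookupSketch resource = new ChatHandlerLookupSketch();
    try {
      resource.resolve("index_kafka_task_1");
    } catch (ServiceUnavailableException e) {
      System.out.println(e.getMessage()); // handler not registered yet
    }
    resource.register("index_kafka_task_1", new Object());
    System.out.println(resource.resolve("index_kafka_task_1") != null);
  }
}
```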

[Analysis]
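Analysis of the Overlord log and the ingestion task log showed that the task's Jetty server had already started and its HTTP endpoint was ready to serve requests, but the ingestion task was still starting up and its ChatHandler had not been registered yet. A status request from the supervisor that arrives in this window therefore fails with the "Can't find chatHandler" error.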

When is the ChatHandler registered? The answer is in SeekableStreamIndexTaskRunner::runInternal():

    toolbox.getChatHandlerProvider().register(task.getId(), this, false);
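The startup window can be reproduced deterministically with two threads and latches. This is a hypothetical model, not Druid code: `HANDLERS` stands in for the chat handler registry, `handleStatusRequest` for the HTTP resource, and the latches force the supervisor's request to land after the server is up but before the task registers its handler.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

public class StartupRaceSketch {

  // Hypothetical registry shared by the HTTP layer and the task runner.
  static final Map<String, Object> HANDLERS = new ConcurrentHashMap<>();

  // Hypothetical HTTP-side handler: 200 if the handler is registered, 503 otherwise.
  static int handleStatusRequest(String taskId) {
    return HANDLERS.containsKey(taskId) ? 200 : 503;
  }

  // Runs one simulated startup in which the status request lands between
  // "HTTP server started" and "chat handler registered". Returns the HTTP code seen.
  static int simulate(String taskId) throws InterruptedException {
    HANDLERS.clear();
    CountDownLatch serverStarted = new CountDownLatch(1);
    CountDownLatch requestDone = new CountDownLatch(1);
    int[] observed = new int[1];

    Thread task = new Thread(() -> {
      serverStarted.countDown();           // 1. HTTP endpoint is reachable
      try {
        requestDone.await();               // 2. task is still initializing when the request arrives
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      HANDLERS.put(taskId, new Object());  // 3. registration happens only afterwards
    });

    Thread supervisor = new Thread(() -> {
      try {
        serverStarted.await();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      observed[0] = handleStatusRequest(taskId); // request lands inside the window
      requestDone.countDown();
    });

    task.start();
    supervisor.start();
    task.join();
    supervisor.join();
    return observed[0];
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println(simulate("task_1"));            // 503: handler not yet registered
    System.out.println(handleStatusRequest("task_1")); // 200: registered afterwards
  }
}
```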

[Solution]
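If ChatHandlerResource waits for the ingestion task to finish starting up (that is, until its ChatHandler is registered) before handling requests, status requests that arrive during startup will no longer fail.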
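One possible shape for this wait, sketched under the assumption that the registry can hand out a per-handler `CompletableFuture`. `AwaitingChatHandlerRegistry` and `awaitHandler` are hypothetical names, not an existing Druid API; the timeout case here throws `IllegalStateException` as a placeholder for the existing 503 error path.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class AwaitingChatHandlerRegistry {

  // One future per handler ID; completed when the task registers its handler.
  private final Map<String, CompletableFuture<Object>> handlers = new ConcurrentHashMap<>();

  private CompletableFuture<Object> slot(String handlerId) {
    return handlers.computeIfAbsent(handlerId, id -> new CompletableFuture<>());
  }

  // Called by the task runner once startup is complete.
  void register(String handlerId, Object handler) {
    slot(handlerId).complete(handler);
  }

  // Called by the HTTP resource: waits up to the timeout for registration
  // instead of failing immediately while the task is still starting.
  Object awaitHandler(String handlerId, long timeout, TimeUnit unit) {
    try {
      return slot(handlerId).get(timeout, unit);
    } catch (TimeoutException e) {
      throw new IllegalStateException(
          String.format("Can't find chatHandler for handler[%s]", handlerId), e);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while waiting for handler " + handlerId, e);
    } catch (ExecutionException e) {
      throw new IllegalStateException(e.getCause());
    }
  }

  public static void main(String[] args) throws Exception {
    AwaitingChatHandlerRegistry registry = new AwaitingChatHandlerRegistry();
    Object handler = new Object();

    // Simulate a task that registers its handler 200 ms after the HTTP server is up.
    Thread task = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      registry.register("task_1", handler);
    });
    task.start();

    // The request arrives early but waits instead of failing right away.
    Object found = registry.awaitHandler("task_1", 5, TimeUnit.SECONDS);
    System.out.println(found == handler); // true
    task.join();
  }
}
```

The bounded timeout keeps a request for a task that never registers from blocking a server thread indefinitely. In this sketch, futures created for IDs that never register are never removed; a real implementation would also need to clean them up, for example when the task shuts down.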
