Compact manifest for dcp2 catalog fails when filtered on file formats #2649
Comments
The manifest generation failure with a broad filter (all file types) appears to be caused by exceeding the 15-minute timeout configured for manifest generation. Unfortunately, the logs surrounding the originally reported error are hard to filter through, because multiple manifest requests were made within a short time. With further repeated tests, however, I was able to locate the logs from when the manifest generation was requested to when it failed, and in those cases the error occurred just over 15 minutes after the request. (Note: these tests were for a "dcp1" catalog manifest, because manifest generation for the "dcp2" catalog succeeded, i.e., failed to fail.)
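The 15-minute limit described above can be sketched as a simple deadline check. This is a hypothetical illustration only: `MANIFEST_TIMEOUT`, `exceeded_timeout`, and the timestamps below are illustrative and not taken from the Azul codebase or the ticket's logs.

```python
from datetime import datetime, timedelta

# Illustrative constant: the 15-minute manifest generation timeout
# discussed in this comment (not Azul's actual configuration name).
MANIFEST_TIMEOUT = timedelta(minutes=15)

def exceeded_timeout(requested_at: datetime, failed_at: datetime) -> bool:
    """True if the span between request and failure exceeds the timeout."""
    return failed_at - requested_at > MANIFEST_TIMEOUT

# A hypothetical run that fails "just over 15 minutes" after the request:
requested = datetime(2020, 12, 18, 23, 22, 0)
failed = datetime(2020, 12, 18, 23, 37, 30)
print(exceeded_timeout(requested, failed))  # True
```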
Further log combing found timeout messages for my attempts; however, no such log message was found for the 2020-12-18 error in the ticket description.
Triage to discuss next steps.
@hannes-ucsc: "@danielsotirhos and I are clueless as to how to improve the hotfix any further. The hotfix truncates large entries arbitrarily, causing data loss."
Spike to see if this still occurs.
Using the command provided in the description, prod is also able to resolve a file when following the redirects.
Search prod and dev logs for the …
The last time these messages were observed:

```
fields @timestamp, @message
```
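The `fields @timestamp, @message` fragment above is the first line of a CloudWatch Logs Insights query. A sketch of how such a query string could be composed for the log search described here (the filter pattern, the `limit`, and the `build_query` helper are all assumptions, not from the ticket):

```python
# Compose a CloudWatch Logs Insights query string. The pattern argument
# is a placeholder for whatever error message is being searched for.
def build_query(pattern: str) -> str:
    return "\n".join([
        "fields @timestamp, @message",
        f"filter @message like /{pattern}/",
        "sort @timestamp desc",
        "limit 100",
    ])

query = build_query("NotFoundError")
print(query)
```

Such a string would then be submitted via the Logs Insights console or an API client against the relevant log groups.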
@hannes-ucsc: "This doesn't appear to be an issue anymore."
I don't think this should be closed while we have outstanding FIXMEs for it: `azul/src/azul/service/manifest_service.py`, line 1321, at commit 2036eb0.
@hannes-ucsc: "The FIXME @noah-aviel-dove is referring to precedes a workaround for this issue, not a permanent fix. The workaround arbitrarily selects the first 100 values. We need a better solution that is not lossy."
We truncate fields in other places, too, so maybe the workaround IS a permanent solution. We need to consolidate the truncation thresholds (#3725), reevaluate whether the solution for #3248 makes the truncation in the manifest code redundant, and then remove the truncation and the FIXME, or just the FIXME if the truncation in the manifest is still needed.
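The lossy workaround discussed above (keeping only the first 100 values of an over-long field) can be sketched as follows. `MAX_VALUES` and `truncate_values` are illustrative names, not Azul's actual identifiers; the real code lives around the FIXME in `manifest_service.py`.

```python
# Illustrative threshold matching the "first 100 values" behavior
# described in the comments above.
MAX_VALUES = 100

def truncate_values(values: list) -> tuple[list, bool]:
    """Return at most MAX_VALUES values, plus a flag indicating data loss.

    The flag is what consolidation (#3725) would need to surface so
    that callers can tell a truncated field from a complete one.
    """
    truncated = len(values) > MAX_VALUES
    return values[:MAX_VALUES], truncated
```

Consolidating all truncation sites behind one helper like this would at least make the thresholds uniform, even if the approach stays lossy.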
The request …
… fails when following the redirect.
CloudWatch Logs Insights

```
region: us-east-1
log-group-names: /aws/lambda/azul-service-dev-manifest
start-time: 2020-12-18T23:23:10.000Z
end-time: 2020-12-18T23:37:18.000Z
query-string:
```
```
Traceback (most recent call last):
  File "/var/task/azul/service/manifest_service.py", line 279, in _generate_manifest
    base_name = generator.write_to(text_buffer)
  File "/var/task/azul/service/manifest_service.py", line 739, in write_to
    for hit in self._create_request().scan():
  File "/opt/python/elasticsearch_dsl/search.py", line 723, in scan
    for hit in scan(
  File "/opt/python/elasticsearch/helpers/actions.py", line 443, in scan
    resp = client.scroll(
  File "/opt/python/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/opt/python/elasticsearch/client/__init__.py", line 1373, in scroll
    return self.transport.perform_request(
  File "/opt/python/elasticsearch/transport.py", line 351, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/opt/python/elasticsearch/connection/http_requests.py", line 161, in perform_request
    self._raise_error(response.status_code, raw_data)
  File "/opt/python/elasticsearch/connection/base.py", line 229, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.NotFoundError: NotFoundError(404, 'search_phase_execution_exception', 'No search context found for id [100290778]')
```
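The `No search context found` error in the traceback above is characteristic of Elasticsearch scroll usage: each `scroll` request must arrive before the previous request's keep-alive window lapses, or the server discards the search context and subsequent scroll calls fail with a 404. A minimal sketch of that timing constraint, with illustrative names and values (the actual keep-alive used by the manifest code is not stated in this ticket):

```python
# Model of scroll-context expiry: a scroll context is kept alive for
# keep_alive_s seconds after each request. If the next scroll request
# arrives later than that, Elasticsearch has already discarded the
# context and responds with NotFoundError (404).
def context_expired(keep_alive_s: float, gap_between_requests_s: float) -> bool:
    """True if the next scroll call arrives after the context is gone."""
    return gap_between_requests_s > keep_alive_s

# A page that takes longer to process than the keep-alive loses the context:
print(context_expired(keep_alive_s=300, gap_between_requests_s=330))  # True
```

Under this reading, slow manifest writing between scroll pages (consistent with the 15-minute timeout findings above) could plausibly let a scroll context expire mid-scan.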