Received server error (500) from primary and could not load the entire response body from endpoint #1331

Open
zinebtabet opened this issue Jul 10, 2023 · 4 comments
Labels
bug Something isn't working needs-triage Triage required

Comments

@zinebtabet

I got this error while running the following code: https://github.com/awslabs/realtime-fraud-detection-with-gnn-on-dgl/tree/main/src/sagemaker

    [ERROR] ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary and could not load the entire response body. See https://eu-west-3.console.aws.amazon.com/cloudwatch/home?region=eu-west-3#logEventViewer:group=/aws/sagemaker/Endpoints/pytorch-inference-2023-07-10-09-51-02-299 in account 086892845792 for more information.
    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 442, in lambda_handler
        pred_prob = invoke_endpoint_with_idx(endpointname = ENDPOINT_NAME, target_id = transaction_id, subgraph_dict = subgraph_dict, n_feats = transaction_embed_value_dict)
      File "/var/task/lambda_function.py", line 314, in invoke_endpoint_with_idx
        response = runtime.invoke_endpoint(EndpointName=endpointname,
      File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 508, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 911, in _make_api_call
        raise error_class(parsed_response, operation_name)
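The traceback only shows the client side of the failure, so it may help to log what the service attaches to the exception. Below is a minimal sketch, assuming `runtime` is a boto3 `sagemaker-runtime` client. The helper names `summarize_model_error` and `invoke_with_diagnostics` and the JSON content type are illustrative assumptions and are not taken from the repository's `lambda_function.py`. The fields `OriginalStatusCode`, `OriginalMessage` and `LogStreamArn` are the ones documented for `ModelError`; the code reads them with `.get()` in case any is missing.

```python
import json


def summarize_model_error(error_response):
    """Collect the diagnostic fields from a ModelError's parsed response.

    Uses .get() throughout because any of these fields may be absent.
    """
    error = error_response.get("Error", {})
    return {
        "code": error.get("Code"),
        "message": error.get("Message"),
        "original_status_code": error_response.get("OriginalStatusCode"),
        "original_message": error_response.get("OriginalMessage"),
        "log_stream_arn": error_response.get("LogStreamArn"),
    }


def invoke_with_diagnostics(runtime, endpoint_name, body,
                            content_type="application/json"):
    """Invoke the endpoint; on ModelError, log the details and re-raise.

    `runtime` is expected to be boto3.client("sagemaker-runtime").
    The content type is an assumption; use whatever the model expects.
    """
    try:
        response = runtime.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType=content_type,
            Body=body,
        )
    except runtime.exceptions.ModelError as err:
        # The container returned an error. Print what the service reports
        # so it ends up in the Lambda function's own log.
        print(json.dumps(summarize_model_error(err.response)))
        raise
    return response["Body"].read()
```

The original status code and message usually say more about what the container did than the generic "500 from primary" text, and the log stream ARN points at the endpoint's CloudWatch stream.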

zinebtabet added the bug (Something isn't working) and needs-triage (Triage required) labels Jul 10, 2023
@zxkane
Contributor

zxkane commented Jul 11, 2023

@zinebtabet Could you add detailed, reproducible steps showing how you are using the code?

@zinebtabet
Author

zinebtabet commented Jul 11, 2023

I used the same code as you have in the SageMaker repository. The only thing I modified was the Dockerfile, since I am in EU West 3. I set it up like this:

    ARG IMAGE_REPO=763104351884.dkr.ecr.eu-west-3.amazonaws.com
    FROM $IMAGE_REPO/pytorch-training:1.11.0-cpu-py38-ubuntu20.04-sagemaker
    ENV PATH="/opt/ml/model:${PATH}"
    ENV SAGEMAKER_SUBMIT_DIRECTORY /opt/ml/model
    COPY * /opt/ml/code/
    ENV SAGEMAKER_PROGRAM fd_sl_train_entry_point.py
    RUN pip install dgl dglgo -f https://data.dgl.ai/wheels/repo.html

I used the same version I specified in the Dockerfile for the deployment as well. Then I invoked my endpoint with the Lambda function. When I executed the test event, I received this error. I have had this error for over a month now. Here is a screenshot of the error:

[screenshot of the error]

The error in the Lambda test event:

    Traceback (most recent call last):
      File "/var/task/lambda_function.py", line 442, in lambda_handler
        pred_prob = invoke_endpoint_with_idx(endpointname = ENDPOINT_NAME, target_id = transaction_id, subgraph_dict = subgraph_dict, n_feats = transaction_embed_value_dict)
      File "/var/task/lambda_function.py", line 314, in invoke_endpoint_with_idx
        response = runtime.invoke_endpoint(EndpointName=endpointname,
      File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 508, in _api_call
        return self._make_api_call(operation_name, kwargs)
      File "/opt/python/lib/python3.8/site-packages/botocore/client.py", line 911, in _make_api_call
        raise error_class(parsed_response, operation_name)
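The Lambda traceback stops at `invoke_endpoint`, so the actual cause is most likely in the endpoint container's own logs, which the error message places under `/aws/sagemaker/Endpoints/<endpoint name>`. Here is a rough sketch of pulling the most recent messages from that log group, assuming `logs` is a boto3 `logs` (CloudWatch Logs) client. The function names and the 15-minute window are hypothetical choices for illustration.

```python
import time


def endpoint_log_group(endpoint_name):
    """Log group name, following the pattern shown in the error message."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def recent_endpoint_logs(logs, endpoint_name, minutes=15, limit=50):
    """Return recent log messages written by the endpoint's container.

    `logs` is expected to be boto3.client("logs"). Only the first page
    of results is read, which is enough for a quick look.
    """
    # CloudWatch Logs expects timestamps in milliseconds since the epoch.
    start_ms = int((time.time() - minutes * 60) * 1000)
    response = logs.filter_log_events(
        logGroupName=endpoint_log_group(endpoint_name),
        startTime=start_ms,
        limit=limit,
    )
    return [event["message"] for event in response.get("events", [])]
```

For the endpoint in this issue, the call would look like `recent_endpoint_logs(boto3.client("logs", region_name="eu-west-3"), "pytorch-inference-2023-07-10-09-51-02-299")`. A Python traceback from the inference script in that output would narrow things down much more than the client-side error.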

@aminHelkinz

Hello @zxkane,

Thank you for the nice project. I have learned a lot from it.

We used a SageMaker notebook and Studio to reproduce the project. The models were created and repackaged successfully, and their endpoints worked well. Then suddenly, in the middle of a demo, the endpoint stopped responding.

Right now, we have only one model with a working endpoint, which was trained and repackaged with a SageMaker notebook.

Since then, none of our endpoints (for the models created with the notebook or Studio) work anymore.
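When endpoints that used to work stop responding, one quick check is whether SageMaker itself still reports them as `InService`. A small sketch under the assumption that `sagemaker_client` is a boto3 `sagemaker` client; `endpoint_health` and `health_report` are made-up helper names, and the endpoint names passed in would be your own.

```python
def endpoint_health(sagemaker_client, endpoint_name):
    """Return (status, failure_reason) for one endpoint.

    `sagemaker_client` is expected to be boto3.client("sagemaker").
    FailureReason is only present when the endpoint has failed.
    """
    description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    return description["EndpointStatus"], description.get("FailureReason")


def health_report(sagemaker_client, endpoint_names):
    """Build one human-readable line per endpoint."""
    lines = []
    for name in endpoint_names:
        status, reason = endpoint_health(sagemaker_client, name)
        line = f"{name}: {status}"
        if reason:
            line += f" ({reason})"
        lines.append(line)
    return lines
```

If every endpoint shows `InService`, the problem is more likely in the container or the request payload than in the endpoint deployment itself.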

I appreciate any help or suggestion!

@zxkane
Contributor

zxkane commented Aug 2, 2023

@zhjwy9343 is the data scientist who authored those notebooks. James, could you have a look at these problems?
