[BUG] Cannot load an exported DeepFM model with NGC 22.03 inference container #125
@mengdong What functions are you using to generate the config files? NVTabular
@mengdong still having this problem in 22.03?
Two issues: HugeCTR has resolved theirs. The NVTabular model has a type error; our plan is to load the model in the latest container and check whether the problem persists.
Yes, I was able to replicate the error in 22.03
We use export hugectr ensemble; it appears the hugectr model loads fine. Updated the bug to reflect the latest update.
This looks to me like an issue with loading information about the NVTabular workflow from the Triton config, and in particular with parsing the
I have shared the exported model (including the Triton config) on Slack. The repro is a bit complicated; let me know if you still need it.
Since HugeCTR always expects the same three fields, we don't have to consult the `Workflow`'s output schema to determine the dtypes. We can just hard-code them. Partially addresses NVIDIA-Merlin/Merlin#125
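A minimal sketch of what hard-coding those three fields might look like (this is a hypothetical illustration, not the actual Merlin fix; the DES/CATCOLUMN/ROWINDEX names and dtypes follow the HugeCTR Triton backend convention, assuming long-long support for the categorical inputs):

```python
# Hypothetical sketch: the HugeCTR Triton backend always takes the same
# three input tensors, so their dtypes can be hard-coded instead of being
# looked up in the NVTabular Workflow's output schema.
HUGECTR_INPUT_DTYPES = {
    "DES": "float32",      # dense features
    "CATCOLUMN": "int64",  # categorical feature values (assumes supportlonglong)
    "ROWINDEX": "int32",   # CSR-style row offsets into CATCOLUMN
}

def hugectr_input_dtype(name: str) -> str:
    """Return the fixed dtype string HugeCTR expects for one of its inputs."""
    return HUGECTR_INPUT_DTYPES[name]
```

Because the mapping is fixed, a malformed or incomplete workflow schema can no longer break dtype resolution for these inputs.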
We just merged a fix for the issue that was occurring when trying to load the NVT workflow in Triton. I think there's another issue here though, which is that some configuration properties moved out of the DeepFM model's
@zehuanw @minseokl Could you assign someone to this issue who's familiar with the parameter server config file and how to create one? (cc @EvenOldridge)
Since the HugeCTR backend has added more new features in the past few releases, I suggest that you manually create a ps.json file. For details, please refer to https://github.com/triton-inference-server/hugectr_backend#independent-inference-hierarchical-parameter-server-configuration
Thanks, HugeCTR is not an issue for this bug. I have manually created ps.json and it worked.
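For reference, a minimal ps.json along the lines of the HugeCTR backend docs might look like the following. This is a sketch: the model name and file paths are placeholders for an exported DeepFM model, and the exact field set depends on the backend version.

```json
{
  "supportlonglong": true,
  "models": [
    {
      "model": "deepfm",
      "sparse_files": ["/models/deepfm/1/deepfm0_sparse_2000.model"],
      "dense_file": "/models/deepfm/1/deepfm_dense_2000.model",
      "network_file": "/models/deepfm/1/deepfm.json",
      "deployed_device_list": [0],
      "max_batch_size": 64,
      "gpucache": true,
      "gpucacheper": 0.5
    }
  ]
}
```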
I'm going to track that part of the issue here and close this one, but I don't think a Triton ensemble creation process that requires our customers to manually create a config file and place it in the exported Triton model repo directory is very user friendly. 😕
@yingcanw @zehuanw @jconwayNV I'm with @karlhigley on this. We need to move away from manually creating json files as a part of our config.
@karlhigley @EvenOldridge @zehuanw The ps.json was added manually just to verify whether the issue was caused by missing key parameters in ps.json. I remember that ps.json was always generated manually in the nvt & hugectr ensemble mode before, regardless of whether the Triton config was generated automatically. If I understand this correctly, you can just add the logic to generate ps.json automatically here
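A sketch of what that automation might look like. This is a hypothetical helper, not the actual Merlin export code; the field names follow the HugeCTR backend's ps.json convention, only the core fields are shown, and the paths in the usage example are placeholders:

```python
import json
from pathlib import Path

def write_ps_json(repo_dir, model_name, sparse_files, dense_file,
                  network_file, support_long_long=True):
    """Hypothetical helper: emit the ps.json that the HugeCTR backend
    reads, so ensemble export doesn't require hand-writing the file.
    Only the core parameter-server fields are included here."""
    config = {
        "supportlonglong": support_long_long,
        "models": [
            {
                "model": model_name,
                "sparse_files": list(sparse_files),
                "dense_file": dense_file,
                "network_file": network_file,
            }
        ],
    }
    path = Path(repo_dir) / "ps.json"
    path.write_text(json.dumps(config, indent=2))
    return path
```

Hooking a helper like this into the ensemble export step would place ps.json next to the exported model automatically, instead of asking users to create it by hand.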
I would keep this bug open, as NVTabular still errors when loading the model, and the error indicates an NVTabular error, not HugeCTR.
The bug's marked done, but SA wasn't able to verify the fix that was merged to main. There are a couple of items here @karlhigley @albert17; I am reopening this bug. Please review these comments. Thanks.
Can you give the version you're having this issue with please @mengdong and clarify what issue you're having. You'd previously posted:
What worked? And what do you expect to work now that doesn't, that you're flagging in this issue?
Initially, the bug contained 2 errors: 1 in HugeCTR, 1 in NVTabular. With a manually created ps.json, the HugeCTR model worked. The NVTabular error still persists with the latest NVTabular pip installation. Per the current description of the bug, the error message shows:
The environment in which I reproduce the bug is the nightly Merlin inference container (in which I reinstalled NVTabular from the tip of the main branch).
@mengdong Did you re-run the existing export with the new version of NVTabular or re-export the ensemble with the new version? You'll have to re-export the ensemble with the latest NVTabular to see a difference, since updating NVT doesn't change the code in exported Python models. If you've done that and it's still not working, let me know.
Thanks Karl. This makes sense. Let me give it another try. |
Hello @karlhigley Here is what I have done. Error message:

If I omit a few of the latest commits and check out the snapshot with your fix:
I0331 21:02:14.652909 684 server.cc:549]
I0331 21:02:14.653137 684 server.cc:592]
Using the ensemble provided in
I tested the Criteo HugeCTR Inference Example and it worked for me |
yes please
On Thu, Apr 28, 2022, viswa-nvidia wrote:
@mengdong, can we close this issue?
@sohn21c for viz.
I ran into the following errors: