Problem with loading converted onnx model #1229

Closed

antonfr opened this issue Jun 14, 2019 · 31 comments

@antonfr commented Jun 14, 2019

Describe the bug
I have converted the ssdlite_mobilenet_v2_coco model from the TensorFlow detection model zoo (could be found here) to ONNX. Now I'm trying to load the model using ML.NET and I get an error.

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): macOS 10.13.6
  • ONNX Runtime installed from (source or binary): NuGet
  • ONNX Runtime version: 0.4.0
  • Python version: 3.6.7
  • Visual Studio version (if applicable): 8.0.5
  • GCC/Compiler version (if compiling from source): none
  • CUDA/cuDNN version: none
  • GPU model and memory: none

To Reproduce
public struct ImageSettings
{
    public const int ImageWidth = 300;
    public const int ImageHeight = 300;
    public const bool ChannelLast = true;
}

public struct SSDSettings
{
    public const string SSDInput = "image_tensor:0";
    public const string SSDDetectionsOutput = "num_detections:0";
    public const string SSDClassesOutput = "detection_classes:0";
    public const string SSDBoxesOutput = "detection_boxes:0";
    public const string SSDScoresOutput = "detection_scores:0";
}

public PredictionEngine<ImageData, ImagePredictions> LoadModel(string modelLocation,
                                                               string imagesLocation,
                                                               string tagsLocation)
{
    IDataView data = mLContext.Data.LoadFromTextFile<ImageData>(path: tagsLocation,
                                                                hasHeader: false);
    var pipeline = mLContext.Transforms.LoadImages(outputColumnName: "image_tensor:0",
                                                   imageFolder: imagesLocation,
                                                   inputColumnName: nameof(ImageData.ImagePath))
        .Append(mLContext.Transforms.ResizeImages(outputColumnName: "image_tensor:0",
                                                  imageWidth: ImageSettings.ImageWidth,
                                                  imageHeight: ImageSettings.ImageHeight,
                                                  inputColumnName: "image_tensor:0"))
        .Append(mLContext.Transforms.ExtractPixels(outputColumnName: "image_tensor:0",
                                                   interleavePixelColors: ImageSettings.ChannelLast))
        .Append(mLContext.Transforms.ApplyOnnxModel(modelFile: modelLocation,
                                                    outputColumnNames: new[] { SSDSettings.SSDDetectionsOutput,
                                                                               SSDSettings.SSDClassesOutput,
                                                                               SSDSettings.SSDBoxesOutput,
                                                                               SSDSettings.SSDScoresOutput },
                                                    inputColumnNames: new[] { SSDSettings.SSDInput }));

    var model = pipeline.Fit(data);

    var predictionEngine = mLContext.Model.CreatePredictionEngine<ImageData, ImagePredictions>(model);
    return predictionEngine;
}

Expected behavior
The model should load correctly.

Additional context
With OnnxRuntime 0.4.0 I got
2019-06-12 14:32:53.802528 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 SaveInputOutputNamesToNodeMapping] Graph input with name i__19 is not associated with a node.
2019-06-12 14:32:53.802581 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 SaveInputOutputNamesToNodeMapping] Graph input with name cond__21 is not associated with a node.
2019-06-12 14:32:54.004760 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 SaveInputOutputNamesToNodeMapping] Graph input with name i__42 is not associated with a node.
2019-06-12 14:32:54.004790 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 SaveInputOutputNamesToNodeMapping] Graph input with name cond__44 is not associated with a node.
2019-06-12 14:32:54.005072 [W:onnxruntime:InferenceSession, session_state_initializer.cc:502 SaveInputOutputNamesToNodeMapping] Graph input with name i is not associated with a node.
Onnx type not supported

With earlier versions I got
Error initializing model :Microsoft.ML.OnnxRuntime.OnnxRuntimeException: [ErrorCode:InvalidGraph] Load model from /my/path/to/file/ssd_mobilenet.onnx failed:Node:Preprocessor/map/strided_slice Node (Preprocessor/map/strided_slice) has input size 4 not in range [min=1, max=1].
  at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath, SessionOptions options) in C:\agent\_work\6\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.cs:line 83
  at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath) in C:\agent\_work\6\s\csharp\src\Microsoft.ML.OnnxRuntime\InferenceSession.cs:line 31
  at Microsoft.ML.Transforms.Onnx.OnnxModel..ctor(String modelFile, Nullable`1 gpuDeviceId, Boolean fallbackToCpu)
  at Microsoft.ML.Transforms.Onnx.OnnxTransformer..ctor(IHostEnvironment env, Options options, Byte[] modelBytes)

@faxu faxu added the bug label Jun 14, 2019
@pranavsharma (Contributor)

Based on this error - Graph input with name i__19 is not associated with a node. - it looks like the converted model doesn't have any input by the name 'i__19'. You should take a look at the inputs in the onnx graph and use them for inference. Can you upload the converted onnx model here?
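
For reference, a minimal sketch of doing that with the onnxruntime Python API (the file name is a placeholder):

import onnxruntime as rt

# Load the converted model and enumerate the graph inputs it actually declares.
sess = rt.InferenceSession("ssd_mobilenet.onnx")  # placeholder path
for inp in sess.get_inputs():
    # Name, shape (may contain symbolic dimensions) and element type of each input.
    print(inp.name, inp.shape, inp.type)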

@hariharans29 hariharans29 self-assigned this Jun 17, 2019
@hariharans29 (Member) commented Jun 18, 2019

Just curious @antonfr - did you try an inference run with 0.4.0?

From the output below, I see a bunch of warnings (not errors), so I wonder whether the model actually loaded correctly.

(quoting the "Additional context" from the issue description above: the SaveInputOutputNamesToNodeMapping warnings for graph inputs i__19, cond__21, i__42, cond__44 and i, ending with "Onnx type not supported")

As for the noisy "warnings", this should be addressed by #1235. Sometimes there can be some residual superfluous information in the converted model, and the runtime might just be complaining about that. It should not affect model loading or the inference run itself.

@antonfr (Author) commented Jun 18, 2019

@pranavsharma here is my model
ssd_mobilenet.onnx.zip
@hariharans29 I opened the same issue in the tensorflow-onnx repository; @guschmue writes that he successfully converted the ssd mobilenet model and provided a link to his repository. Nevertheless, I had the same problem with his model, only with different graph input names.

@hariharans29 (Member) commented Jun 18, 2019

Hi @antonfr,

Thanks for sharing the model. I think the model loads fine. I hit issues while performing the inference run. Here are some noteworthy points -

  1. The model seems to have symbolic dimensions -

[screenshot: the converted model's input with symbolic dimensions]

So I followed the lead from your earlier snapshot where width = height = 300 and fed it random numpy data of type uint8 of shape [1, 300, 300, 3].

  2. It crashes at a Gather node -

[screenshot: the error raised at the Gather node during the inference run]

So either the conversion had an issue or the input shape is still not right. If the input shape is not right, please correct the shape in the script below and give it a shot. If, after correcting it, the model still doesn't run, it might be a conversion issue, so you can follow up with the converter tool owners.

This is the Python script I used to load and invoke the model -

import onnxruntime as rt
import numpy as np

sess = rt.InferenceSession("ssd_mobilenet.onnx")
print("Done loading model")

# Arbitrary (uninitialized) uint8 test data in NHWC layout: batch 1, 300 x 300, 3 channels.
input = np.ndarray(shape=(1, 300, 300, 3), dtype='uint8')
input_name = sess.get_inputs()[0].name

# Passing None runs all model outputs; [0] keeps only the first one.
pred_onnx = sess.run(None, {input_name: input})[0]
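
As a small follow-up sketch (assuming the same sess, input_name and input as above), the other declared outputs of the SSD model can be inspected instead of keeping only the first one:

# List every declared output of the graph: name, shape and element type.
for out in sess.get_outputs():
    print(out.name, out.shape, out.type)

# Run all outputs and print the shape of each returned array.
results = sess.run(None, {input_name: input})
for out, value in zip(sess.get_outputs(), results):
    print(out.name, np.asarray(value).shape)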

@skottmckay (Contributor)

This will possibly be fixed by #1233 that just got checked in.

There are NonZero nodes earlier in the graph that provide an iteration count to a Loop node. If there are no matches, the iteration count is zero. The shape of some of the Loop outputs wasn't correct in that case, leading to Gather breaking later on. Whether this occurs depends on the input, though, so it won't necessarily crash every time.

Longer term it would be nicer if the model had a shortcut path when NonZero returns no matches, but that's a question for the converter team as to whether that's achievable.
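
As a rough numpy analogy of that zero-match case (illustrative only, not the converted graph itself):

import numpy as np

# Hypothetical detection scores where nothing passes the threshold.
scores = np.array([0.1, 0.0, 0.2], dtype=np.float32)
kept = np.nonzero(scores > 0.5)[0]  # analogous to the NonZero node: no matches
print(kept.shape)                   # (0,) -> a Loop driven by this count runs zero iterations
print(scores[kept])                 # gathering with an empty index set should yield an empty, correctly shaped result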

@hariharans29 (Member)

Thanks @skottmckay. I built a python wheel including #1233 and the model didn't crash and finished its run successfully.

@antonfr - could you try building from source and checking whether the results look okay (I only validated that the crash was resolved; I still need to validate the results)?

@antonfr (Author) commented Jun 19, 2019

@hariharans29 Unfortunately I ran into problems building onnxruntime from source, though I followed the instructions:

  1. Checkout the source tree: - done
  2. Install cmake-3.13 or better from https://cmake.org/download/. - done
  3. (optional) Install protobuf 3.6.1 from source code - ran into a problem, used brew install protobuf instead
  4. (optional) Install onnx from source code (cmake/external/onnx) - got the error: package directory 'onnxruntime/backend' does not exist
  5. Run ./build.sh --config RelWithDebInfo --build_wheel for Linux (or build.bat --config RelWithDebInfo --build_wheel for Windows). Upon successful build you should be able to find the wheel under the dist folder. - got subprocess.CalledProcessError: Command '['/usr/local/Cellar/cmake/3.14.4/bin/cmake', '--build', '/Users/anton/onnxruntime/build/Linux/RelWithDebInfo', '--config', 'RelWithDebInfo']' returned non-zero exit status 2.

@hariharans29 (Member)

@snnn and @pranavsharma - any idea what the issue is?

@pranavsharma (Contributor)

I don't usually run the optional steps. It should work without them.

@hariharans29 (Member)

@antonfr - what's the exact build error you get?

@pranavsharma (Contributor)

I just built this and it works just fine. The only change I had to make was to comment out the running of onnx_backend_test_series.py inside build.py. I ran it like this: ./build.sh --config RelWithDebInfo --build_wheel --parallel. I'm at this commit 23838d9.

@antonfr (Author) commented Jun 20, 2019

@hariharans29 here is the full log in a zip archive:
log_build.zip

@pranavsharma I tried your solution; unfortunately, the result is the same.

@antonfr (Author) commented Jun 20, 2019

Additionally, when I try python3 setup.py bdist_wheel, I get the error: package directory 'onnxruntime/backend' does not exist.
I asked my colleague to try building from source on his Mac; he got the same problems as me.

@hariharans29 (Member)

Are you using the clang compiler on Linux?
clang: error: linker command failed with exit code 1 (use -v to see invocation)

I don't think this is supported according to the OS/Compiler support matrix here - https://github.com/microsoft/onnxruntime/blob/master/BUILD.md

(@pranavsharma can correct me if I am wrong)

@antonfr (Author) commented Jun 24, 2019

@hariharans29 yes, I'm using the clang compiler on macOS. Any idea whether I can still build onnxruntime from source, and if so, how?

@hariharans29 (Member)

Yes, the steps to build onnxruntime from source are documented here - https://github.com/microsoft/onnxruntime/blob/master/BUILD.md. Can you please check if you are missing something from the steps?

@antonfr (Author) commented Jun 24, 2019

@HariharanS I have described all the steps in detail above: #1229 (comment)

@hariharans29 (Member)

Hi @antonfr,

This must be a local dev environment issue and probably not a build issue itself, as Mac builds run on a daily basis (and a per-PR basis) and things look fine.

However, we will try building on a Mac and get back to you.

Thanks

@antonfr (Author) commented Jun 27, 2019

Thanks @hariharans29, I'm waiting for the results. From my side, I can provide any information about my environment that you consider necessary.

@jignparm (Contributor) commented Jun 27, 2019

@antonfr -- can you do the following steps, and skip the optional steps listed above? The optional steps may be confusing the issue.

1. Remove any installation of protobuf.
2. Start off with a clean Python installation (packages may include libprotobuf, which could interfere with the build). So uninstall Anaconda if you have it installed.

  • git clone https://github.com/microsoft/onnxruntime
  • cd onnxruntime
  • git submodule init
  • git submodule update --recursive
  • (Edited) Open the ./build.sh script, and remove the --use_openmp flag.
  • ./build.sh --config RelWithDebInfo --build_wheel

Can you update the thread if the steps above fail?

@hariharans29 (Member)

Thanks @jignparm

Also, the errors in your logs are similar to the one raised midway through #648, and the resolution there was to try building without --use_openmp first. I notice that -Donnxruntime_USE_OPENMP=ON did appear in your build logs, indicating that you were building with OpenMP. Can you try building without OpenMP, please?

@snnn (Member) commented Jun 28, 2019

BTW, we'll remove the onnxruntime_USE_OPENMP option and keep it always off.

@antonfr (Author) commented Jul 1, 2019

@jignparm thanks a lot, removing --use_openmp and commenting out onnx_backend_test_series.py works fine!

@antonfr (Author) commented Jul 1, 2019

One more question: how do I add onnxruntime built from source to an existing project in Visual Studio? I use Project -> Add NuGet Packages -> Configure Sources -> Add. What folder should I select?

@jignparm (Contributor) commented Jul 1, 2019

To add a "built from source onnxruntime" to an existing project in Visual Studio, you need to first generate a NuGet package, which includes the runtimes you need (.dll, .so or .dylib files). If you build locally from master branch on a Windows operating system, you'll only get .dll files for Windows. If that is good enough for you, then you can run msbuild command below to generate a .nupkg file.

The NuGet package includes the C# assemblies, so you need to build the C# projects as well as the native C++ projecst. Simply add the --build_csharp flag (e.g. "./build.sh --config RelWithDebInfo --build_wheel --build_csharp" ) to the build command.

This creates the NuGet package from source that you can add to Visual Studio.

cd onnxruntime\csharp
"C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\MSBuild\15.0\Bin\msbuild" OnnxRuntime.CSharp.proj /p:Configuration=Debug /t:CreatePackage /p:PackageId=Microsoft.ML.OnnxRuntime

The package on NuGet.org contains runtimes for all three operating systems (Windows, Linux Ubuntu flavor, and macOS), in case you need a package that runs in multiple environments.

@antonfr (Author) commented Jul 2, 2019

@jignparm with the --build_csharp flag I got the following errors (see the attached log file for the full log)
log_build2.zip
Though without this flag everything works fine. Is there any other way to create an onnxruntime NuGet package on macOS?

@jignparm (Contributor) commented Jul 2, 2019

From the log files, it seems like some Mono header files are getting pulled into the build, even for the native C++ library build, when you use the --build_csharp flag on macOS. The C# DLLs are usually compiled on Windows in our build systems, since they are cross-platform.

/Library/Frameworks/Mono.framework/Headers/png.h:597:3: note: 'png_time' declared here
} png_time;

Since you are able to build the native dylib successfully without the --build_csharp flag, another option is to build the C# project independently. Drop the --build_csharp flag from the command line, and build the C# project using dotnet build OnnxRuntime.CSharp.sln. It relies on the OnnxruntimeBuildDirectory environment variable here to point to the root directory of the native C++ build. Set that variable to point to the root of the build directory before running the dotnet build command.

A second (simpler?) option is to use a pre-existing NuGet package, rename it with a .zip extension, unzip the contents, and replace the native library at runtimes/osx-64/native/libonnxruntime.dylib (and optionally the C# library at lib/netstandard1.1/Microsoft.ML.OnnxRuntime.dll in case there are changes in the C# code base). You should be able to re-zip these files into a NuGet package for debugging/development purposes.
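
A rough sketch of that second option in Python (the package name and paths are placeholders; adjust them to the actual layout inside the package):

import os, shutil, zipfile

pkg = "microsoft.ml.onnxruntime.0.4.0.nupkg"  # pre-existing package from NuGet.org (placeholder name)
workdir = "nupkg_contents"

# A .nupkg is just a zip archive, so unpack it...
with zipfile.ZipFile(pkg) as zf:
    zf.extractall(workdir)

# ...swap in the locally built native library (path inside the package as mentioned above)...
shutil.copy("/path/to/onnxruntime/build/Linux/RelWithDebInfo/libonnxruntime.dylib",
            os.path.join(workdir, "runtimes", "osx-64", "native", "libonnxruntime.dylib"))

# ...and re-zip it into a package usable from a local NuGet source (debugging/development only).
shutil.make_archive("Microsoft.ML.OnnxRuntime.local", "zip", workdir)
os.replace("Microsoft.ML.OnnxRuntime.local.zip", "Microsoft.ML.OnnxRuntime.local.nupkg")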

@antonfr (Author) commented Jul 2, 2019

Thanks for your reply @jignparm. As for the first option, the path to my onnxruntime library is /Users/anton. Is that the root of my build directory?
As for the second option, although I got the 2019-07-02 11:06:03,836 Build [INFO] - Build complete message, neither libonnxruntime.dylib nor Microsoft.ML.OnnxRuntime.dll is available.

@jignparm (Contributor) commented Jul 2, 2019

To generate a dylib file, add the --build_shared flag to the build script. It'll generate the dylib in the location below. The build configuration in this case is RelWithDebInfo, but other options could be Debug or Release.

The build directory root, in the example path below, would be /onnxruntime/build. If building C# independently, set OnnxRuntimeBuildDirectory to this value before starting the dotnet build.

Example path to dylib file:

/onnxruntime/build/Linux/RelWithDebInfo/libonnxruntime.dylib

@jignparm (Contributor) commented Jul 9, 2019

@antonfr -- are you able to build from source and load the model successfully now, as @hariharans29 was able to do?

@hariharans29 (Member)

Closing this for now. @antonfr - please reopen in case you have more issues or require further clarification. Thanks!
