Torchserve not starting for diffusers example #3345

Open
dummyuser-123 opened this issue Oct 10, 2024 · 9 comments

@dummyuser-123

🐛 Describe the bug

I was running the diffusers example with TorchServe by following the tutorial given in its README file, but I am not able to start TorchServe even after correctly following all of the instructions.

Error logs

(env) D:\Text-to-Image\only cartoon torchserve\diffusers>torchserve --start --ts-config config.properties --disable-token-auth --enable-model-api

(env) D:\Text-to-Image\only cartoon torchserve\diffusers>WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-10-10T11:31:05,290 [DEBUG] main org.pytorch.serve.util.ConfigManager - xpu-smi not available or failed: Cannot run program "xpu-smi": CreateProcess error=2, The system cannot find the file specified
2024-10-10T11:31:05,290 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-10-10T11:31:05,321 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-10-10T11:31:05,352 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages/ts/configs/metrics.yaml
2024-10-10T11:31:05,446 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.12.0
TS Home: D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages
Current directory: D:\Text-to-Image\only cartoon torchserve\diffusers
Temp directory: C:\Users\Win\AppData\Local\Temp
Metrics config path: D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 12
Max heap size: 4056 M
Python executable: D:\Text-to-Image\only cartoon torchserve\env\Scripts\python.exe
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: D:\Text-to-Image\only cartoon torchserve\diffusers
Initial Models: all
Log dir: D:\Text-to-Image\only cartoon torchserve\diffusers\logs
Metrics dir: D:\Text-to-Image\only cartoon torchserve\diffusers\logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 655350000
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: true
Workflow Store: D:\Text-to-Image\only cartoon torchserve\diffusers
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: true
2024-10-10T11:31:05,462 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Loading snapshot serializer plugin...
2024-10-10T11:31:05,462 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: Diffusion_model
2024-10-10T11:31:05,483 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createTempDir C:\Users\Win\AppData\Local\Temp\models\9075562fcbd8442ab843e59a5b06db52
2024-10-10T11:31:05,483 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: D:\Text-to-Image\only cartoon torchserve\diffusers\Diffusion_model
java.nio.file.FileSystemException: C:\Users\Win\AppData\Local\Temp\models\9075562fcbd8442ab843e59a5b06db52\model: A required privilege is not held by the client
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108) ~[?:?]
at sun.nio.fs.WindowsFileSystemProvider.createSymbolicLink(WindowsFileSystemProvider.java:604) ~[?:?]
at java.nio.file.Files.createSymbolicLink(Files.java:1070) ~[?:?]
at org.pytorch.serve.archive.utils.ZipUtils.createSymbolicDir(ZipUtils.java:159) ~[model-server.jar:?]
at org.pytorch.serve.archive.model.ModelArchive.downloadModel(ModelArchive.java:94) ~[model-server.jar:?]
at org.pytorch.serve.wlm.ModelManager.createModelArchive(ModelManager.java:185) ~[model-server.jar:?]
at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:143) ~[model-server.jar:?]
at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:74) ~[model-server.jar:?]
at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:205) [model-server.jar:?]
at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:399) [model-server.jar:?]
at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:124) [model-server.jar:?]
at org.pytorch.serve.ModelServer.main(ModelServer.java:105) [model-server.jar:?]
2024-10-10T11:31:05,483 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: logs
2024-10-10T11:31:05,483 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createTempDir C:\Users\Win\AppData\Local\Temp\models\42efbe4c41a846ab82c229b237ce0d97
2024-10-10T11:31:05,483 [WARN ] main org.pytorch.serve.ModelServer - Failed to load model: D:\Text-to-Image\only cartoon torchserve\diffusers\logs
java.nio.file.FileSystemException: C:\Users\Win\AppData\Local\Temp\models\42efbe4c41a846ab82c229b237ce0d97\logs: A required privilege is not held by the client
at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:108) ~[?:?]
at sun.nio.fs.WindowsFileSystemProvider.createSymbolicLink(WindowsFileSystemProvider.java:604) ~[?:?]
at java.nio.file.Files.createSymbolicLink(Files.java:1070) ~[?:?]
at org.pytorch.serve.archive.utils.ZipUtils.createSymbolicDir(ZipUtils.java:159) ~[model-server.jar:?]
at org.pytorch.serve.archive.model.ModelArchive.downloadModel(ModelArchive.java:98) ~[model-server.jar:?]
at org.pytorch.serve.wlm.ModelManager.createModelArchive(ModelManager.java:185) ~[model-server.jar:?]
at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:143) ~[model-server.jar:?]
at org.pytorch.serve.wlm.ModelManager.registerModel(ModelManager.java:74) ~[model-server.jar:?]
at org.pytorch.serve.ModelServer.initModelStore(ModelServer.java:205) [model-server.jar:?]
at org.pytorch.serve.ModelServer.startRESTserver(ModelServer.java:399) [model-server.jar:?]
at org.pytorch.serve.ModelServer.startAndWait(ModelServer.java:124) [model-server.jar:?]
at org.pytorch.serve.ModelServer.main(ModelServer.java:105) [model-server.jar:?]
2024-10-10T11:31:05,483 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: stable-diffusion.mar
2024-10-10T11:31:38,874 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model stable-diffusion
2024-10-10T11:31:38,874 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model stable-diffusion
2024-10-10T11:32:06,070 [INFO ] main org.pytorch.serve.wlm.ModelManager - Installed custom pip packages for model stable-diffusion
2024-10-10T11:32:06,070 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model stable-diffusion loaded.
2024-10-10T11:32:06,070 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: stable-diffusion, count: 1
2024-10-10T11:32:06,070 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: NioServerSocketChannel.
2024-10-10T11:32:06,070 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\only cartoon torchserve\env\Scripts\python.exe, D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T11:32:06,118 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2024-10-10T11:32:06,118 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: NioServerSocketChannel.
2024-10-10T11:32:06,118 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2024-10-10T11:32:06,118 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: NioServerSocketChannel.
2024-10-10T11:32:06,118 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Listening on addr:port: 127.0.0.1:9000
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Successfully loaded D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages/ts/configs/metrics.yaml.
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - [PID]6988
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Torch worker started.
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2024-10-10T11:32:07,613 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-stable-diffusion_1.0 State change null -> WORKER_STARTED
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2024-10-10T11:32:07,613 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Connection accepted: ('127.0.0.1', 9000).
2024-10-10T11:32:07,628 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1728540127628
2024-10-10T11:32:07,628 [INFO ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728540127628
2024-10-10T11:32:07,644 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - model_name: stable-diffusion, batchSize: 1
2024-10-10T11:32:08,883 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Enabled tensor cores
2024-10-10T11:32:08,883 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - OpenVINO is not enabled
2024-10-10T11:32:08,883 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-10-10T11:32:08,883 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-10-10T11:32:08,883 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Diffusers version 0.6.0
2024-10-10T11:32:08,883 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - initialized function called
2024-10-10T11:32:38,017 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Diffusion model Extracted successfully
2024-10-10T11:32:38,017 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Backend worker process died.
2024-10-10T11:32:38,017 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Traceback (most recent call last):
2024-10-10T11:32:38,017 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages\ts\model_service_worker.py", line 301, in
2024-10-10T11:32:38,017 [INFO ] nioEventLoopGroup-5-1 org.pytorch.serve.wlm.WorkerThread - 9000 Worker disconnected. WORKER_STARTED
2024-10-10T11:32:38,017 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - worker.run_server()
2024-10-10T11:32:38,027 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages\ts\model_service_worker.py", line 268, in run_server
2024-10-10T11:32:38,027 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - System state is : WORKER_STARTED
2024-10-10T11:32:38,027 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - self.handle_connection(cl_socket)
2024-10-10T11:32:38,027 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker monitoring thread interrupted or backend worker process died., startupTimeout:360sec
java.lang.InterruptedException: null
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1770) ~[?:?]
at java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435) ~[?:?]
at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:234) ~[model-server.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
at java.lang.Thread.run(Thread.java:1575) [?:?]
2024-10-10T11:32:38,027 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages\ts\model_service_worker.py", line 196, in handle_connection
2024-10-10T11:32:38,028 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - service, result, code = self.load_model(msg)
2024-10-10T11:32:38,028 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages\ts\model_service_worker.py", line 133, in load_model
2024-10-10T11:32:38,028 [WARN ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.BatchAggregator - Load model failed: stable-diffusion, error: Worker died.
2024-10-10T11:32:38,028 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - service = model_loader.load(
2024-10-10T11:32:38,029 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-stable-diffusion_1.0 State change WORKER_STARTED -> WORKER_STOPPED
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "D:\Text-to-Image\only cartoon torchserve\env\lib\site-packages\ts\model_loader.py", line 143, in load
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1728540158029
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - initialize_fn(service.context)
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "C:\Users\Win\AppData\Local\Temp\models\7104be7eba2d4971bcbc3dcc27f2b599\stable_diffusion_handler.py", line 48, in initialize
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - self.pipe = DiffusionPipeline.from_pretrained(model_dir + "/model")
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "C:\Users\Win\AppData\Local\Temp\models\7104be7eba2d4971bcbc3dcc27f2b599\diffusers\pipeline_utils.py", line 403, in from_pretrained
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - config_dict = cls.get_config_dict(cached_folder)
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - File "C:\Users\Win\AppData\Local\Temp\models\7104be7eba2d4971bcbc3dcc27f2b599\diffusers\configuration_utils.py", line 217, in get_config_dict
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - raise EnvironmentError(
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - OSError: Error no file named model_index.json found in directory C:\Users\Win\AppData\Local\Temp\models\7104be7eba2d4971bcbc3dcc27f2b599/model.
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-stable-diffusion_1.0-stdout
2024-10-10T11:32:38,029 [INFO ] W-9000-stable-diffusion_1.0-stderr org.pytorch.serve.wlm.WorkerLifeCycle - Stopped Scanner - W-9000-stable-diffusion_1.0-stderr
2024-10-10T11:32:39,034 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\only cartoon torchserve\env\Scripts\python.exe, D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T11:32:40,367 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Listening on addr:port: 127.0.0.1:9000
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Successfully loaded D:\Text-to-Image\only cartoon torchserve\env\Lib\site-packages/ts/configs/metrics.yaml.
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - [PID]14112
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Torch worker started.
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2024-10-10T11:32:40,372 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-stable-diffusion_1.0 State change WORKER_STOPPED -> WORKER_STARTED
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Connection accepted: ('127.0.0.1', 9000).
2024-10-10T11:32:40,372 [DEBUG] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1728540160372
2024-10-10T11:32:40,372 [INFO ] W-9000-stable-diffusion_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728540160372
2024-10-10T11:32:40,399 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - model_name: stable-diffusion, batchSize: 1
2024-10-10T11:32:41,293 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Enabled tensor cores
2024-10-10T11:32:41,293 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - OpenVINO is not enabled
2024-10-10T11:32:41,293 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-10-10T11:32:41,293 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-10-10T11:32:41,293 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - Diffusers version 0.6.0
2024-10-10T11:32:41,293 [INFO ] W-9000-stable-diffusion_1.0-stdout MODEL_LOG - initialized function called

Installation instructions

Yes, I have installed TorchServe from source.
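
For reference, the usual from-source install from the serve repo looks roughly like this (this is a sketch of the standard steps, not necessarily the exact commands I ran):

git clone https://github.com/pytorch/serve.git
cd serve
python ./ts_scripts/install_dependencies.py
python ./ts_scripts/install_from_src.py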

Model Packaging

I have followed this link to package the model
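
The archiving step from that page was approximately the following; the handler name matches the stable_diffusion_handler.py visible in the logs, while the --extra-files contents (the zipped model weights) are my reconstruction and may differ slightly from the example:

torch-model-archiver --model-name stable-diffusion --version 1.0 --handler stable_diffusion_handler.py --extra-files model.zip -r requirements.txt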

config.properties

inference_address=http://127.0.0.1:8080
management_address=http://127.0.0.1:8081
metrics_address=http://127.0.0.1:8082
enable_envvars_config=true
install_py_dep_per_model=true
load_models=all
max_response_size=655350000
disable_system_metrics=true
model_store=D:/Text-to-Image/only cartoon torchserve/diffusers
default_startup_timeout=360

Versions


Environment headers

Torchserve branch:

torchserve==0.12.0
torch-model-archiver==0.12.0

Python version: 3.10 (64-bit runtime)
Python executable: D:\Text-to-Image\only cartoon torchserve\env\Scripts\python.exe

Versions of relevant python libraries:
numpy==2.1.2
torch==2.4.1+cu118
torch-model-archiver==0.12.0
torchserve==0.12.0
torch==2.4.1+cu118
**Warning: torchtext not present ..
**Warning: torchvision not present ..
**Warning: torchaudio not present ..

Java Version:

OS: Microsoft Windows 11 Pro
GCC version: N/A
Clang version: N/A
CMake version: version 3.27.9

Is CUDA available: Yes
CUDA runtime version: 11.8.89
GPU models and configuration:
GPU 0: NVIDIA GeForce RTX 3050
Nvidia driver version: 537.58
cuDNN version: None

Repro instructions

I followed the instructions from the link and got the error while running step 4.

Possible Solution

No response

@dummyuser-123
Author

After running the command prompt as administrator, I am getting this error now:


(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>torchserve --start --ts-config config.properties --disable-token-auth  --enable-model-api

(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-10-10T17:47:49,728 [DEBUG] main org.pytorch.serve.util.ConfigManager - xpu-smi not available or failed: Cannot run program "xpu-smi": CreateProcess error=2, The system cannot find the file specified
2024-10-10T17:47:49,744 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-10-10T17:47:49,759 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-10-10T17:47:49,791 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml
2024-10-10T17:47:49,901 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.12.0
TS Home: D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages
Current directory: D:\Text-to-Image\torchserve_diffusers
Temp directory: C:\Users\Win\AppData\Local\Temp
Metrics config path: D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 12
Max heap size: 4056 M
Python executable: D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: D:\Text-to-Image\torchserve_diffusers
Initial Models: all
Log dir: D:\Text-to-Image\torchserve_diffusers\logs
Metrics dir: D:\Text-to-Image\torchserve_diffusers\logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 655350000
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: true
Workflow Store: D:\Text-to-Image\torchserve_diffusers
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: true
2024-10-10T17:47:49,901 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-10-10T17:47:49,916 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: Diffusion_model
2024-10-10T17:47:49,932 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createTempDir C:\Users\Win\AppData\Local\Temp\models\084876218b504c9fb5ebc93827d2ca42
2024-10-10T17:47:49,932 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createSymbolicDir C:\Users\Win\AppData\Local\Temp\models\084876218b504c9fb5ebc93827d2ca42\Diffusion_model
2024-10-10T17:47:49,932 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive version is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2024-10-10T17:47:49,932 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive createdOn is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2024-10-10T17:47:49,932 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model Diffusion_model
2024-10-10T17:47:49,932 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model Diffusion_model
2024-10-10T17:47:49,932 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model Diffusion_model loaded.
2024-10-10T17:47:49,932 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: Diffusion_model, count: 1
2024-10-10T17:47:49,932 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: logs
2024-10-10T17:47:49,932 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createTempDir C:\Users\Win\AppData\Local\Temp\models\217d664a8a064550b24c55d580ac5267
2024-10-10T17:47:49,932 [INFO ] main org.pytorch.serve.archive.model.ModelArchive - createSymbolicDir C:\Users\Win\AppData\Local\Temp\models\217d664a8a064550b24c55d580ac5267\logs
2024-10-10T17:47:49,932 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive version is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2024-10-10T17:47:49,932 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:49,948 [WARN ] main org.pytorch.serve.archive.model.ModelArchive - Model archive createdOn is not defined. Please upgrade to torch-model-archiver 0.2.0 or higher
2024-10-10T17:47:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model logs
2024-10-10T17:47:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model logs
2024-10-10T17:47:49,948 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model logs loaded.
2024-10-10T17:47:49,948 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: logs, count: 1
2024-10-10T17:47:49,948 [DEBUG] main org.pytorch.serve.ModelServer - Loading models from model store: stable-diffusion.mar
2024-10-10T17:47:49,948 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9001, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:49,948 [ERROR] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\217d664a8a064550b24c55d580ac5267\logs"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 8 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 8 more
2024-10-10T17:47:49,945 [ERROR] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:317) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\084876218b504c9fb5ebc93827d2ca42\Diffusion_model"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 8 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 8 more
2024-10-10T17:47:49,956 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-logs_1.0 State change null -> WORKER_STOPPED
2024-10-10T17:47:49,957 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-Diffusion_model_1.0 State change null -> WORKER_STOPPED
2024-10-10T17:47:49,961 [INFO ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1728562669961
2024-10-10T17:47:49,966 [INFO ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery start timestamp: 1728562669966
2024-10-10T17:47:49,967 [INFO ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 1 seconds.
2024-10-10T17:47:49,967 [INFO ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2024-10-10T17:47:50,968 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:50,968 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9001, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:50,971 [ERROR] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\084876218b504c9fb5ebc93827d2ca42\Diffusion_model"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
2024-10-10T17:47:50,971 [ERROR] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\217d664a8a064550b24c55d580ac5267\logs"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
2024-10-10T17:47:50,971 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-Diffusion_model_1.0 State change WORKER_STOPPED -> WORKER_STOPPED
2024-10-10T17:47:50,972 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-logs_1.0 State change WORKER_STOPPED -> WORKER_STOPPED
2024-10-10T17:47:50,973 [WARN ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-10-10T17:47:50,973 [WARN ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-10-10T17:47:50,974 [INFO ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 1 seconds.
2024-10-10T17:47:50,974 [INFO ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 1 seconds.
2024-10-10T17:47:51,978 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:51,978 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9001, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:51,982 [ERROR] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\217d664a8a064550b24c55d580ac5267\logs"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
2024-10-10T17:47:51,982 [ERROR] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\084876218b504c9fb5ebc93827d2ca42\Diffusion_model"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
2024-10-10T17:47:51,983 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-logs_1.0 State change WORKER_STOPPED -> WORKER_STOPPED
2024-10-10T17:47:51,984 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-Diffusion_model_1.0 State change WORKER_STOPPED -> WORKER_STOPPED
2024-10-10T17:47:51,984 [WARN ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-10-10T17:47:51,985 [WARN ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-10-10T17:47:51,985 [INFO ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 2 seconds.
2024-10-10T17:47:51,991 [INFO ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 2 seconds.
2024-10-10T17:47:53,997 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9001, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:53,997 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-10T17:47:53,997 [ERROR] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\217d664a8a064550b24c55d580ac5267\logs"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
2024-10-10T17:47:53,997 [ERROR] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Backend worker error
org.pytorch.serve.wlm.WorkerInitializationException: Failed start worker process
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:210) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorker(WorkerLifeCycle.java:106) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.connect(WorkerThread.java:375) ~[model-server.jar:?]
        at org.pytorch.serve.wlm.WorkerThread.run(WorkerThread.java:192) ~[model-server.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
        at java.lang.Thread.run(Thread.java:1575) [?:?]
Caused by: java.io.IOException: Cannot run program "D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe" (in directory "C:\Users\Win\AppData\Local\Temp\models\084876218b504c9fb5ebc93827d2ca42\Diffusion_model"): CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1170) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
Caused by: java.io.IOException: CreateProcess error=267, The directory name is invalid
        at java.lang.ProcessImpl.create(Native Method) ~[?:?]
        at java.lang.ProcessImpl.<init>(ProcessImpl.java:500) ~[?:?]
        at java.lang.ProcessImpl.start(ProcessImpl.java:159) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1126) ~[?:?]
        at java.lang.ProcessBuilder.start(ProcessBuilder.java:1089) ~[?:?]
        at java.lang.Runtime.exec(Runtime.java:681) ~[?:?]
        at org.pytorch.serve.wlm.WorkerLifeCycle.startWorkerPython(WorkerLifeCycle.java:192) ~[model-server.jar:?]
        ... 6 more
2024-10-10T17:47:54,000 [DEBUG] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - W-9001-logs_1.0 State change WORKER_STOPPED -> WORKER_STOPPED
2024-10-10T17:47:54,001 [DEBUG] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-Diffusion_model_1.0 State change WORKER_STOPPED -> WORKER_STOPPED
2024-10-10T17:47:54,002 [WARN ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-10-10T17:47:54,002 [WARN ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Auto recovery failed again
2024-10-10T17:47:54,003 [INFO ] W-9001-logs_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9001 in 3 seconds.
2024-10-10T17:47:54,004 [INFO ] W-9000-Diffusion_model_1.0 org.pytorch.serve.wlm.WorkerThread - Retry worker: 9000 in 3 seconds.


(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>

I don't know why this is happening with the diffusers example, because I have cross-checked it against the alexnet example given in the repo and that one works perfectly.
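
One difference I can see: in the alexnet run below, the model store is a dedicated model_store folder containing only the .mar file, whereas here model_store points at the project directory itself, so TorchServe also tries to register the extracted Diffusion_model folder and the logs folder as models. A layout closer to the alexnet one would look roughly like this (folder name and placement are just an example):

mkdir model_store
move stable-diffusion.mar model_store
torchserve --start --model-store model_store --models stable-diffusion=stable-diffusion.mar --ts-config config.properties --disable-token-auth --enable-model-api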

@mreso
Collaborator

mreso commented Oct 10, 2024

Hi @dummyuser-123, that's strange. Can you post the output of the successful run with alexnet?

@dummyuser-123
Author

Sure, these logs are from the TorchServe side:

(ts_env) D:\Text-to-Image\torchserve>torchserve --start --model-store model_store --models alexnet=alexnet.mar --disable-token-auth  --enable-model-api

(ts_env) D:\Text-to-Image\torchserve>WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-10-11T09:49:44,142 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-10-11T09:49:44,213 [DEBUG] main org.pytorch.serve.util.ConfigManager - xpu-smi not available or failed: Cannot run program "xpu-smi": CreateProcess error=2, The system cannot find the file specified
2024-10-11T09:49:44,214 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-10-11T09:49:44,230 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-10-11T09:49:44,268 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from D:\Text-to-Image\torchserve\ts_env\Lib\site-packages/ts/configs/metrics.yaml
2024-10-11T09:49:44,366 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.12.0
TS Home: D:\Text-to-Image\torchserve\ts_env\Lib\site-packages
Current directory: D:\Text-to-Image\torchserve
Temp directory: C:\Users\Win\AppData\Local\Temp
Metrics config path: D:\Text-to-Image\torchserve\ts_env\Lib\site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 12
Max heap size: 4056 M
Python executable: D:\Text-to-Image\torchserve\ts_env\Scripts\python.exe
Config file: N/A
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: D:\Text-to-Image\torchserve\model_store
Initial Models: alexnet=alexnet.mar
Log dir: D:\Text-to-Image\torchserve\logs
Metrics dir: D:\Text-to-Image\torchserve\logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 6553500
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: false
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: false
Workflow Store: D:\Text-to-Image\torchserve\model_store
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: true
2024-10-11T09:49:44,371 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: alexnet.mar
2024-10-11T09:49:47,050 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model alexnet
2024-10-11T09:49:47,050 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model alexnet
2024-10-11T09:49:47,050 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model alexnet loaded.
2024-10-11T09:49:47,051 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: alexnet, count: 1
2024-10-11T09:49:47,058 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: NioServerSocketChannel.
2024-10-11T09:49:47,059 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve\ts_env\Scripts\python.exe, D:\Text-to-Image\torchserve\ts_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\torchserve\ts_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-11T09:49:47,102 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2024-10-11T09:49:47,103 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: NioServerSocketChannel.
2024-10-11T09:49:47,104 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2024-10-11T09:49:47,104 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: NioServerSocketChannel.
2024-10-11T09:49:47,105 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2024-10-11T09:49:47,258 [WARN ] pool-3-thread-1 org.pytorch.serve.metrics.MetricCollector - worker pid is not available yet.
2024-10-11T09:49:47,754 [INFO ] pool-3-thread-1 TS_METRICS - CPUUtilization.Percent:0.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,756 [INFO ] pool-3-thread-1 TS_METRICS - DiskAvailable.Gigabytes:72.93034362792969|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,757 [INFO ] pool-3-thread-1 TS_METRICS - DiskUsage.Gigabytes:395.81965255737305|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,758 [INFO ] pool-3-thread-1 TS_METRICS - DiskUtilization.Percent:84.4|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,758 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUtilization.Percent:6.04248046875|#Level:Host,DeviceId:0|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,759 [INFO ] pool-3-thread-1 TS_METRICS - GPUMemoryUsed.Megabytes:495.0|#Level:Host,DeviceId:0|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,760 [INFO ] pool-3-thread-1 TS_METRICS - GPUUtilization.Percent:0.0|#Level:Host,DeviceId:0|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,760 [INFO ] pool-3-thread-1 TS_METRICS - MemoryAvailable.Megabytes:9921.55078125|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,761 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUsed.Megabytes:6301.578125|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:47,761 [INFO ] pool-3-thread-1 TS_METRICS - MemoryUtilization.Percent:38.8|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620387
2024-10-11T09:49:48,530 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Listening on addr:port: 127.0.0.1:9000
2024-10-11T09:49:48,536 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Successfully loaded D:\Text-to-Image\torchserve\ts_env\Lib\site-packages/ts/configs/metrics.yaml.
2024-10-11T09:49:48,536 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - [PID]15736
2024-10-11T09:49:48,536 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Torch worker started.
2024-10-11T09:49:48,537 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2024-10-11T09:49:48,537 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-alexnet_1.0 State change null -> WORKER_STARTED
2024-10-11T09:49:48,539 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2024-10-11T09:49:48,545 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Connection accepted: ('127.0.0.1', 9000).
2024-10-11T09:49:48,547 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1728620388547
2024-10-11T09:49:48,548 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728620388548
2024-10-11T09:49:48,568 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - model_name: alexnet, batchSize: 1
2024-10-11T09:49:49,473 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Enabled tensor cores
2024-10-11T09:49:49,474 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - OpenVINO is not enabled
2024-10-11T09:49:49,474 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-10-11T09:49:49,475 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-10-11T09:49:49,683 [WARN ] W-9000-alexnet_1.0-stderr MODEL_LOG - D:\Text-to-Image\torchserve\ts_env\lib\site-packages\ts\torch_handler\base_handler.py:355: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-10-11T09:49:49,684 [WARN ] W-9000-alexnet_1.0-stderr MODEL_LOG -   state_dict = torch.load(model_pt_path, map_location=map_location)
2024-10-11T09:49:50,016 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 1468
2024-10-11T09:49:50,017 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-alexnet_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-10-11T09:49:50,018 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:2961.0|#WorkerName:W-9000-alexnet_1.0,Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620390
2024-10-11T09:49:50,019 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:4.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620390
2024-10-11T09:50:04,545 [INFO ] nioEventLoopGroup-3-1 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728620404
2024-10-11T09:50:04,547 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1728620404547
2024-10-11T09:50:04,547 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728620404547
2024-10-11T09:50:04,548 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Backend received inference at: 1728620404
2024-10-11T09:50:07,895 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:3346.83|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728620407,94ba59d7-cf76-4cf6-9f5b-6588170230d2, pattern=[METRICS]
2024-10-11T09:50:07,896 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.BatchAggregator - Sending response for jobId 94ba59d7-cf76-4cf6-9f5b-6588170230d2
2024-10-11T09:50:07,896 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - HandlerTime.ms:3346.83|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:94ba59d7-cf76-4cf6-9f5b-6588170230d2,timestamp:1728620407
2024-10-11T09:50:07,897 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:3346.83|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728620407,94ba59d7-cf76-4cf6-9f5b-6588170230d2, pattern=[METRICS]
2024-10-11T09:50:07,897 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - PredictionTime.ms:3346.83|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:94ba59d7-cf76-4cf6-9f5b-6588170230d2,timestamp:1728620407
2024-10-11T09:50:07,897 [INFO ] W-9000-alexnet_1.0 ACCESS_LOG - /127.0.0.1:59859 "PUT /predictions/alexnet HTTP/1.1" 200 3353
2024-10-11T09:50:07,898 [INFO ] W-9000-alexnet_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620407
2024-10-11T09:50:07,898 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:3349614.8|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728620407
2024-10-11T09:50:07,899 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:107.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728620407
2024-10-11T09:50:07,899 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 107000, Backend time ns: 3351905900
2024-10-11T09:50:07,899 [INFO ] W-9000-alexnet_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620407
2024-10-11T09:50:07,900 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 3348
2024-10-11T09:50:07,900 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:5.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620407
2024-10-11T09:50:13,893 [INFO ] nioEventLoopGroup-3-2 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728620413
2024-10-11T09:50:13,894 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1728620413894
2024-10-11T09:50:13,894 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728620413894
2024-10-11T09:50:13,895 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Backend received inference at: 1728620413
2024-10-11T09:50:13,964 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:69.51|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728620413,7fb7e588-a7eb-409e-9dc8-3b9fd20abde0, pattern=[METRICS]
2024-10-11T09:50:13,964 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.BatchAggregator - Sending response for jobId 7fb7e588-a7eb-409e-9dc8-3b9fd20abde0
2024-10-11T09:50:13,964 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - HandlerTime.ms:69.51|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:7fb7e588-a7eb-409e-9dc8-3b9fd20abde0,timestamp:1728620413
2024-10-11T09:50:13,965 [INFO ] W-9000-alexnet_1.0 ACCESS_LOG - /127.0.0.1:59860 "PUT /predictions/alexnet HTTP/1.1" 200 72
2024-10-11T09:50:13,965 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:70.51|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728620413,7fb7e588-a7eb-409e-9dc8-3b9fd20abde0, pattern=[METRICS]
2024-10-11T09:50:13,965 [INFO ] W-9000-alexnet_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620413
2024-10-11T09:50:13,966 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - PredictionTime.ms:70.51|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:7fb7e588-a7eb-409e-9dc8-3b9fd20abde0,timestamp:1728620413
2024-10-11T09:50:13,966 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:71523.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728620413
2024-10-11T09:50:13,967 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:62.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728620413
2024-10-11T09:50:13,967 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 62000, Backend time ns: 73738100
2024-10-11T09:50:13,967 [INFO ] W-9000-alexnet_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620413
2024-10-11T09:50:13,967 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 70
2024-10-11T09:50:13,968 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:4.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728620413

And these are from the inference side:

(ts_env) D:\Text-to-Image\torchserve>curl http://127.0.0.1:8080/predictions/alexnet -T kitten.jpg
{
  "tabby": 0.3188355267047882,
  "tiger_cat": 0.2579926550388336,
  "Egyptian_cat": 0.24233946204185486,
  "lynx": 0.1685769110918045,
  "tiger": 0.00650126114487648
}
(ts_env) D:\Text-to-Image\torchserve>curl http://127.0.0.1:8080/predictions/alexnet -T huskey.jpg
{
  "Eskimo_dog": 0.6400406956672668,
  "Siberian_husky": 0.13422252237796783,
  "dogsled": 0.12762515246868134,
  "malamute": 0.06892743706703186,
  "Norwegian_elkhound": 0.018150705844163895
}

@dummyuser-123 (Author)

Sorry, I pressed the close-issue button by mistake.

@dummyuser-123 reopened this Oct 11, 2024
@mreso (Collaborator) commented Oct 11, 2024

It seems like you're running the two examples in different environments with different Python executables:

D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe

vs

D:\Text-to-Image\torchserve\ts_env\Scripts\python.exe

Can you try running the alexnet example in the same environment as the diffusers example?

@dummyuser-123 (Author)

Okay, this is from the same environment:

(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>torchserve --start --model-store model_store --models alexnet=alexnet.mar --disable-token-auth  --enable-model-api

(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>WARNING: sun.reflect.Reflection.getCallerClass is not supported. This will impact performance.
2024-10-11T10:06:26,952 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager -  Loading snapshot serializer plugin...
2024-10-11T10:06:27,013 [DEBUG] main org.pytorch.serve.util.ConfigManager - xpu-smi not available or failed: Cannot run program "xpu-smi": CreateProcess error=2, The system cannot find the file specified
2024-10-11T10:06:27,015 [WARN ] main org.pytorch.serve.util.ConfigManager - Your torchserve instance can access any URL to load models. When deploying to production, make sure to limit the set of allowed_urls in config.properties
2024-10-11T10:06:27,033 [INFO ] main org.pytorch.serve.servingsdk.impl.PluginsManager - Initializing plugins manager...
2024-10-11T10:06:27,076 [INFO ] main org.pytorch.serve.metrics.configuration.MetricConfiguration - Successfully loaded metrics configuration from D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml
2024-10-11T10:06:27,178 [INFO ] main org.pytorch.serve.ModelServer -
Torchserve version: 0.12.0
TS Home: D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages
Current directory: D:\Text-to-Image\torchserve_diffusers
Temp directory: C:\Users\Win\AppData\Local\Temp
Metrics config path: D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml
Number of GPUs: 1
Number of CPUs: 12
Max heap size: 4056 M
Python executable: D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe
Config file: config.properties
Inference address: http://127.0.0.1:8080
Management address: http://127.0.0.1:8081
Metrics address: http://127.0.0.1:8082
Model Store: D:\Text-to-Image\torchserve_diffusers\model_store
Initial Models: alexnet=alexnet.mar
Log dir: D:\Text-to-Image\torchserve_diffusers\logs
Metrics dir: D:\Text-to-Image\torchserve_diffusers\logs
Netty threads: 0
Netty client threads: 0
Default workers per model: 1
Blacklist Regex: N/A
Maximum Response Size: 655350000
Maximum Request Size: 6553500
Limit Maximum Image Pixels: true
Prefer direct buffer: false
Allowed Urls: [file://.*|http(s)?://.*]
Custom python dependency for model allowed: true
Enable metrics API: true
Metrics mode: LOG
Disable system metrics: true
Workflow Store: D:\Text-to-Image\torchserve_diffusers\model_store
CPP log config: N/A
Model config: N/A
System metrics command: default
Model API enabled: true
2024-10-11T10:06:27,184 [INFO ] main org.pytorch.serve.ModelServer - Loading initial models: alexnet.mar
2024-10-11T10:06:29,872 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Adding new version 1.0 for model alexnet
2024-10-11T10:06:29,872 [DEBUG] main org.pytorch.serve.wlm.ModelVersionedRefs - Setting default version to 1.0 for model alexnet
2024-10-11T10:06:29,872 [INFO ] main org.pytorch.serve.wlm.ModelManager - Model alexnet loaded.
2024-10-11T10:06:29,873 [DEBUG] main org.pytorch.serve.wlm.ModelManager - updateModel: alexnet, count: 1
2024-10-11T10:06:29,879 [INFO ] main org.pytorch.serve.ModelServer - Initialize Inference server with: NioServerSocketChannel.
2024-10-11T10:06:29,880 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerLifeCycle - Worker cmdline: [D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Scripts\python.exe, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages\ts\model_service_worker.py, --sock-type, tcp, --port, 9000, --metrics-config, D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml]
2024-10-11T10:06:29,925 [INFO ] main org.pytorch.serve.ModelServer - Inference API bind to: http://127.0.0.1:8080
2024-10-11T10:06:29,925 [INFO ] main org.pytorch.serve.ModelServer - Initialize Management server with: NioServerSocketChannel.
2024-10-11T10:06:29,926 [INFO ] main org.pytorch.serve.ModelServer - Management API bind to: http://127.0.0.1:8081
2024-10-11T10:06:29,926 [INFO ] main org.pytorch.serve.ModelServer - Initialize Metrics server with: NioServerSocketChannel.
2024-10-11T10:06:29,927 [INFO ] main org.pytorch.serve.ModelServer - Metrics API bind to: http://127.0.0.1:8082
Model server started.
2024-10-11T10:06:31,294 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Listening on addr:port: 127.0.0.1:9000
2024-10-11T10:06:31,299 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Successfully loaded D:\Text-to-Image\torchserve_diffusers\ts_diff_env\Lib\site-packages/ts/configs/metrics.yaml.
2024-10-11T10:06:31,300 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - [PID]10636
2024-10-11T10:06:31,300 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Torch worker started.
2024-10-11T10:06:31,300 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Python runtime: 3.10.6
2024-10-11T10:06:31,301 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-alexnet_1.0 State change null -> WORKER_STARTED
2024-10-11T10:06:31,303 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Connecting to: /127.0.0.1:9000
2024-10-11T10:06:31,309 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Connection accepted: ('127.0.0.1', 9000).
2024-10-11T10:06:31,312 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd LOAD repeats 1 to backend at: 1728621391312
2024-10-11T10:06:31,313 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728621391313
2024-10-11T10:06:31,337 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - model_name: alexnet, batchSize: 1
2024-10-11T10:06:32,819 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Enabled tensor cores
2024-10-11T10:06:32,819 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - OpenVINO is not enabled
2024-10-11T10:06:32,820 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - proceeding without onnxruntime
2024-10-11T10:06:32,820 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Torch TensorRT not enabled
2024-10-11T10:06:33,043 [WARN ] W-9000-alexnet_1.0-stderr MODEL_LOG - D:\Text-to-Image\torchserve_diffusers\ts_diff_env\lib\site-packages\ts\torch_handler\base_handler.py:355: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
2024-10-11T10:06:33,043 [WARN ] W-9000-alexnet_1.0-stderr MODEL_LOG -   state_dict = torch.load(model_pt_path, map_location=map_location)
2024-10-11T10:06:33,378 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 2063
2024-10-11T10:06:33,379 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - W-9000-alexnet_1.0 State change WORKER_STARTED -> WORKER_MODEL_LOADED
2024-10-11T10:06:33,379 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerLoadTime.Milliseconds:3503.0|#WorkerName:W-9000-alexnet_1.0,Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621393
2024-10-11T10:06:33,380 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:5.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621393
2024-10-11T10:06:43,807 [INFO ] nioEventLoopGroup-3-1 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728621403
2024-10-11T10:06:43,809 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1728621403809
2024-10-11T10:06:43,809 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728621403809
2024-10-11T10:06:43,810 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Backend received inference at: 1728621403
2024-10-11T10:06:50,072 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:6262.75|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728621410,3189e4ff-e587-4750-b26f-8fd6531911ac, pattern=[METRICS]
2024-10-11T10:06:50,073 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.BatchAggregator - Sending response for jobId 3189e4ff-e587-4750-b26f-8fd6531911ac
2024-10-11T10:06:50,074 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - HandlerTime.ms:6262.75|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:3189e4ff-e587-4750-b26f-8fd6531911ac,timestamp:1728621410
2024-10-11T10:06:50,074 [INFO ] W-9000-alexnet_1.0 ACCESS_LOG - /127.0.0.1:60071 "PUT /predictions/alexnet HTTP/1.1" 200 6268
2024-10-11T10:06:50,074 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:6262.75|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728621410,3189e4ff-e587-4750-b26f-8fd6531911ac, pattern=[METRICS]
2024-10-11T10:06:50,075 [INFO ] W-9000-alexnet_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621410
2024-10-11T10:06:50,075 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - PredictionTime.ms:6262.75|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:3189e4ff-e587-4750-b26f-8fd6531911ac,timestamp:1728621410
2024-10-11T10:06:50,075 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:6265728.4|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728621410
2024-10-11T10:06:50,076 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:102.5|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728621410
2024-10-11T10:06:50,076 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 102500, Backend time ns: 6268402200
2024-10-11T10:06:50,077 [INFO ] W-9000-alexnet_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621410
2024-10-11T10:06:50,077 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 6264
2024-10-11T10:06:50,077 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:4.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621410
2024-10-11T10:06:53,259 [INFO ] nioEventLoopGroup-3-2 TS_METRICS - ts_inference_requests_total.Count:1.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728621413
2024-10-11T10:06:53,259 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Flushing req.cmd PREDICT repeats 1 to backend at: 1728621413259
2024-10-11T10:06:53,259 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Looping backend response at: 1728621413259
2024-10-11T10:06:53,260 [INFO ] W-9000-alexnet_1.0-stdout MODEL_LOG - Backend received inference at: 1728621413
2024-10-11T10:06:53,327 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]HandlerTime.Milliseconds:66.51|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728621413,663d9476-ce81-42c2-b8e7-dc60b2d93624, pattern=[METRICS]
2024-10-11T10:06:53,328 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.BatchAggregator - Sending response for jobId 663d9476-ce81-42c2-b8e7-dc60b2d93624
2024-10-11T10:06:53,328 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - HandlerTime.ms:66.51|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:663d9476-ce81-42c2-b8e7-dc60b2d93624,timestamp:1728621413
2024-10-11T10:06:53,328 [INFO ] W-9000-alexnet_1.0 ACCESS_LOG - /127.0.0.1:60072 "PUT /predictions/alexnet HTTP/1.1" 200 70
2024-10-11T10:06:53,328 [INFO ] W-9000-alexnet_1.0-stdout org.pytorch.serve.wlm.WorkerLifeCycle - result=[METRICS]PredictionTime.Milliseconds:66.51|#ModelName:alexnet,Level:Model|#type:GAUGE|#hostname:DESKTOP-7K8171O,1728621413,663d9476-ce81-42c2-b8e7-dc60b2d93624, pattern=[METRICS]
2024-10-11T10:06:53,329 [INFO ] W-9000-alexnet_1.0 TS_METRICS - Requests2XX.Count:1.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621413
2024-10-11T10:06:53,329 [INFO ] W-9000-alexnet_1.0-stdout MODEL_METRICS - PredictionTime.ms:66.51|#ModelName:alexnet,Level:Model|#hostname:DESKTOP-7K8171O,requestID:663d9476-ce81-42c2-b8e7-dc60b2d93624,timestamp:1728621413
2024-10-11T10:06:53,329 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_inference_latency_microseconds.Microseconds:68171.0|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728621413
2024-10-11T10:06:53,330 [INFO ] W-9000-alexnet_1.0 TS_METRICS - ts_queue_latency_microseconds.Microseconds:56.6|#model_name:alexnet,model_version:default|#hostname:DESKTOP-7K8171O,timestamp:1728621413
2024-10-11T10:06:53,330 [DEBUG] W-9000-alexnet_1.0 org.pytorch.serve.job.RestJob - Waiting time ns: 56600, Backend time ns: 70408500
2024-10-11T10:06:53,331 [INFO ] W-9000-alexnet_1.0 TS_METRICS - QueueTime.Milliseconds:0.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621413
2024-10-11T10:06:53,331 [INFO ] W-9000-alexnet_1.0 org.pytorch.serve.wlm.WorkerThread - Backend response time: 67
2024-10-11T10:06:53,331 [INFO ] W-9000-alexnet_1.0 TS_METRICS - WorkerThreadTime.Milliseconds:5.0|#Level:Host|#hostname:DESKTOP-7K8171O,timestamp:1728621413
(ts_env) D:\Text-to-Image\torchserve>curl http://127.0.0.1:8080/predictions/alexnet -T kitten.jpg
{
  "tabby": 0.3188355267047882,
  "tiger_cat": 0.2579926550388336,
  "Egyptian_cat": 0.24233946204185486,
  "lynx": 0.1685769110918045,
  "tiger": 0.00650126114487648
}
(ts_env) D:\Text-to-Image\torchserve>curl http://127.0.0.1:8080/predictions/alexnet -T huskey.jpg
{
  "Eskimo_dog": 0.6400406956672668,
  "Siberian_husky": 0.13422252237796783,
  "dogsled": 0.12762515246868134,
  "malamute": 0.06892743706703186,
  "Norwegian_elkhound": 0.018150705844163895
}

@dummyuser-123 (Author)

Also, I had made the inference call from a different environment by mistake, so here is the updated log:

(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>curl http://127.0.0.1:8080/predictions/alexnet -T kitten.jpg
{
  "tabby": 0.3188355267047882,
  "tiger_cat": 0.2579926550388336,
  "Egyptian_cat": 0.24233946204185486,
  "lynx": 0.1685769110918045,
  "tiger": 0.00650126114487648
}
(ts_diff_env) D:\Text-to-Image\torchserve_diffusers>curl http://127.0.0.1:8080/predictions/alexnet -T huskey.jpg
{
  "Eskimo_dog": 0.6400406956672668,
  "Siberian_husky": 0.13422252237796783,
  "dogsled": 0.12762515246868134,
  "malamute": 0.06892743706703186,
  "Norwegian_elkhound": 0.018150705844163895
}

@dummyuser-123 (Author)

Also, while debugging the issue yesterday, I noticed this difference between the diffusers and alexnet examples.

When starting the alexnet model, TorchServe creates files like these in the temp folder:

[Screenshot 2024-10-10 154415: temp folder contents while serving alexnet]

But when starting the diffusers model, it does not create any such files in the temp folder:

[Screenshot 2024-10-11 101336: temp folder contents while starting the diffusers model]

I'm sharing this in case it helps you find the exact problem.
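
In case it is useful for comparison, here is a minimal sketch for listing the most recently modified entries in the temp folder after loading each model (the path is the default Windows temp directory shown in the server config above; adjust it if yours differs):

# Minimal sketch: list recent entries in the TorchServe temp directory to compare
# what gets unpacked when the alexnet vs. the diffusers model is loaded.
# The default path below is an assumption taken from the server config printout.
import os
from pathlib import Path

temp_dir = Path(os.environ.get("TEMP", r"C:\Users\Win\AppData\Local\Temp"))

entries = []
for entry in temp_dir.iterdir():
    try:
        entries.append((entry.stat().st_mtime, entry))
    except OSError:
        pass  # skip entries we cannot stat

# Show the 20 most recently modified entries
for mtime, entry in sorted(entries, reverse=True)[:20]:
    print("dir " if entry.is_dir() else "file", entry.name)

Running this once after loading alexnet and once after loading the diffusers model should show whether the diffusers MAR is being unpacked at all.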

@mreso (Collaborator) commented Oct 11, 2024

Can you check your MAR file and post its contents? It's basically a zip file that you can just decompress.
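
For reference, since a .mar is just a zip archive, here is a minimal sketch for listing its contents with Python's standard zipfile module (the path below is only an example; point it at the MAR you built for the diffusers model):

# Minimal sketch: list the contents of a .mar archive (a .mar is a zip file).
# The path is an example; replace it with the MAR built for the diffusers model.
import zipfile

mar_path = r"D:\Text-to-Image\torchserve_diffusers\model_store\alexnet.mar"

with zipfile.ZipFile(mar_path) as mar:
    for info in mar.infolist():
        print(f"{info.file_size:>10}  {info.filename}")

You can paste the resulting file list here.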
