
Model parallelization for inference on large images #720

Open
tonyreina opened this issue Jun 11, 2024 · 2 comments
Labels
question A HUB question that does not involve a bug

Comments

@tonyreina

Search before asking

Question

I've got images that are 8K and larger. I'd like to run the entire image through the YOLO model, but of course that leads to memory and latency issues at inference time.

Is anyone aware of YOLO inference setups that employ model parallelism across, say, 4 GPUs (NVLinked on the same node) to handle 4 sections of the image (or 4 sections of the model) simultaneously and then stitch the results back together at the end?

Is there any way to do something like this with Ultralytics?

Thanks.
Best.
-Tony

Additional

No response

@tonyreina tonyreina added the question A HUB question that does not involve a bug label Jun 11, 2024

👋 Hello @tonyreina, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@pderrenger
Member

Hi Tony,

Thank you for reaching out and for your detailed question!

To address your need for handling large 8K images with YOLO while managing memory and latency, splitting the image into tiles and running the tiles in parallel across multiple GPUs is a viable approach. (Strictly speaking this is data parallelism over tiles; true model parallelism, which splits the network itself across GPUs, is rarely worth the complexity for detection models.) Implementing it requires careful consideration and setup.

Steps to Achieve Parallel Tiled Inference:

  1. Image Tiling: Divide your large image into smaller, manageable tiles. This can be done using libraries like OpenCV or PIL in Python. Each tile can then be processed independently.

  2. Inference on Tiles: Distribute these tiles across your 4 GPUs. Note that torch.nn.DataParallel splits a batch along its first dimension and is not designed to wrap the high-level YOLO object, so the simplest approach is to keep one model replica per GPU and round-robin the tiles across them. Here's a basic sketch:

    import torch
    from ultralytics import YOLO
    
    # One model replica per GPU (data parallelism over tiles)
    devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    models = [YOLO('yolov5s.pt') for _ in devices]
    
    # Assuming you have a list of image tiles
    image_tiles = [tile1, tile2, tile3, tile4]
    
    # Send tile i to GPU i % num_gpus and collect the predictions
    results = [
        models[i % len(models)](tile, device=devices[i % len(devices)])
        for i, tile in enumerate(image_tiles)
    ]
    
    # Combine results (stitching)
    # This will depend on how you divided your image and your specific use case
  3. Stitching Results: After processing each tile, you'll need to combine the results. This involves adjusting the bounding box coordinates to match the original image dimensions. The stitching logic will depend on how you divided your image.
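The tiling and stitching steps above can be sketched in plain Python. The helper names (`make_tile_offsets`, `stitch_boxes`) and the tile size/overlap values are illustrative assumptions, not part of the Ultralytics API:

```python
def make_tile_offsets(width, height, tile_size=2048, overlap=256):
    """Return (x, y) top-left offsets whose tiles cover the full image.

    Overlapping tiles help recover objects that straddle tile boundaries.
    """
    step = tile_size - overlap
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def stitch_boxes(per_tile_boxes):
    """Map per-tile (x1, y1, x2, y2) boxes back to full-image coordinates.

    per_tile_boxes: list of (boxes, x_offset, y_offset) tuples, where
    boxes holds the detections produced on that tile.
    """
    stitched = []
    for boxes, x_off, y_off in per_tile_boxes:
        for x1, y1, x2, y2 in boxes:
            stitched.append((x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off))
    return stitched


# Example: an 8192 x 8192 image split into overlapping 2048 px tiles
offsets = make_tile_offsets(8192, 8192)
```

Because the tiles overlap, a final non-maximum suppression pass over the stitched boxes is usually needed to merge duplicate detections near tile boundaries.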

Important Considerations:

  • Ensure Latest Versions: Make sure you are using the latest versions of torch, ultralytics, and hub-sdk. This ensures you have the latest features and bug fixes. You can upgrade your packages using:

    pip install --upgrade torch ultralytics hub-sdk
  • Memory Management: Even with model parallelism, managing GPU memory is crucial. Monitor your GPU usage to avoid out-of-memory errors.

  • Latency: While parallelism can reduce latency, the overhead of distributing and collecting results might introduce some delay. Profiling your setup can help identify bottlenecks.
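To make the memory point concrete, standard PyTorch CUDA calls can report per-device usage between batches of tiles (a sketch; it prints nothing on a CPU-only machine):

```python
import torch


def report_gpu_memory():
    # Standard PyTorch CUDA introspection; values are in bytes.
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i)
        reserved = torch.cuda.memory_reserved(i)
        print(f"cuda:{i}: allocated={alloc / 1e9:.2f} GB, reserved={reserved / 1e9:.2f} GB")


report_gpu_memory()
```

Calling this before and after each batch of tiles makes it easy to spot a tile size that pushes a device toward an out-of-memory error.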

If you encounter any issues or need further assistance, please provide a minimum reproducible code example. This will help us better understand your setup and provide more targeted support. You can refer to our minimum reproducible example guide for more details.

Feel free to reach out with any more questions or updates on your progress!
