
Model parallelization for inference on large images #720

Open
tonyreina opened this issue Jun 11, 2024 · 2 comments
Labels
question A HUB question that does not involve a bug

Comments

@tonyreina

Search before asking

Question

I've got images that are 8K and larger. I'd like to run the entire image through the YOLO model, but of course that leads to memory and latency issues at inference time.

Is anyone aware of YOLO inference setups that employ model parallelism across, say, 4 GPUs (NVLinked on the same node) to handle 4 sections of the image (or 4 sections of the model) simultaneously and then stitch the results back together at the end?

Is there any way to do something like this with Ultralytics?

Thanks.
Best.
-Tony

Additional

No response

@tonyreina tonyreina added the question A HUB question that does not involve a bug label Jun 11, 2024

👋 Hello @tonyreina, thank you for raising an issue about Ultralytics HUB 🚀! Please visit our HUB Docs to learn more:

  • Quickstart. Start training and deploying YOLO models with HUB in seconds.
  • Datasets: Preparing and Uploading. Learn how to prepare and upload your datasets to HUB in YOLO format.
  • Projects: Creating and Managing. Group your models into projects for improved organization.
  • Models: Training and Exporting. Train YOLOv5 and YOLOv8 models on your custom datasets and export them to various formats for deployment.
  • Integrations. Explore different integration options for your trained models, such as TensorFlow, ONNX, OpenVINO, CoreML, and PaddlePaddle.
  • Ultralytics HUB App. Learn about the Ultralytics App for iOS and Android, which allows you to run models directly on your mobile device.
    • iOS. Learn about YOLO CoreML models accelerated on Apple's Neural Engine on iPhones and iPads.
    • Android. Explore TFLite acceleration on mobile devices.
  • Inference API. Understand how to use the Inference API for running your trained models in the cloud to generate predictions.

If this is a 🐛 Bug Report, please provide screenshots and steps to reproduce your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@pderrenger
Member

Hi Tony,

Thank you for reaching out and for your detailed question!

To address your need for handling large 8K images with YOLO while managing memory and latency, splitting the image into tiles and running the tiles in parallel across multiple GPUs is a viable approach. (Strictly speaking this is data parallelism over tiles; true model parallelism, which splits the network itself across GPUs, is rarely worth the complexity for detection models.) Implementing it requires careful consideration and setup.

Steps to Achieve Parallel Tiled Inference:

  1. Image Tiling: Divide your large image into smaller, manageable tiles. This can be done using libraries like OpenCV or PIL in Python. Each tile can then be processed independently.

  2. Inference on Tiles: Distribute these tiles across your 4 GPUs. Note that torch.nn.DataParallel splits a batch along its first dimension and is not designed to wrap the high-level YOLO object, so the simplest approach is to keep one model replica per GPU and round-robin the tiles across them. Here's a basic sketch:

    import torch
    from ultralytics import YOLO
    
    # One model replica per GPU (data parallelism over tiles)
    devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())]
    models = [YOLO('yolov5s.pt') for _ in devices]
    
    # Assuming you have a list of image tiles
    image_tiles = [tile1, tile2, tile3, tile4]
    
    # Send tile i to GPU i % num_gpus and collect the predictions
    results = [
        models[i % len(models)](tile, device=devices[i % len(devices)])
        for i, tile in enumerate(image_tiles)
    ]
    
    # Combine results (stitching)
    # This will depend on how you divided your image and your specific use case
  3. Stitching Results: After processing each tile, you'll need to combine the results. This involves adjusting the bounding box coordinates to match the original image dimensions. The stitching logic will depend on how you divided your image.
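The tiling and stitching steps above can be sketched in plain Python. The helper names (`make_tile_offsets`, `stitch_boxes`) and the tile size/overlap values are illustrative assumptions, not part of the Ultralytics API:

```python
def make_tile_offsets(width, height, tile_size=2048, overlap=256):
    """Return (x, y) top-left offsets whose tiles cover the full image.

    Overlapping tiles help recover objects that straddle tile boundaries.
    """
    step = tile_size - overlap
    return [(x, y) for y in range(0, height, step) for x in range(0, width, step)]


def stitch_boxes(per_tile_boxes):
    """Map per-tile (x1, y1, x2, y2) boxes back to full-image coordinates.

    per_tile_boxes: list of (boxes, x_offset, y_offset) tuples, where
    boxes holds the detections produced on that tile.
    """
    stitched = []
    for boxes, x_off, y_off in per_tile_boxes:
        for x1, y1, x2, y2 in boxes:
            stitched.append((x1 + x_off, y1 + y_off, x2 + x_off, y2 + y_off))
    return stitched


# Example: an 8192 x 8192 image split into overlapping 2048 px tiles
offsets = make_tile_offsets(8192, 8192)
```

Because the tiles overlap, a final non-maximum suppression pass over the stitched boxes is usually needed to merge duplicate detections near tile boundaries.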

Important Considerations:

  • Ensure Latest Versions: Make sure you are using the latest versions of torch, ultralytics, and hub-sdk. This ensures you have the latest features and bug fixes. You can upgrade your packages using:

    pip install --upgrade torch ultralytics hub-sdk
  • Memory Management: Even with model parallelism, managing GPU memory is crucial. Monitor your GPU usage to avoid out-of-memory errors.

  • Latency: While parallelism can reduce latency, the overhead of distributing and collecting results might introduce some delay. Profiling your setup can help identify bottlenecks.
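To make the memory point concrete, standard PyTorch CUDA calls can report per-device usage between batches of tiles (a sketch; it prints nothing on a CPU-only machine):

```python
import torch


def report_gpu_memory():
    # Standard PyTorch CUDA introspection; values are in bytes.
    for i in range(torch.cuda.device_count()):
        alloc = torch.cuda.memory_allocated(i)
        reserved = torch.cuda.memory_reserved(i)
        print(f"cuda:{i}: allocated={alloc / 1e9:.2f} GB, reserved={reserved / 1e9:.2f} GB")


report_gpu_memory()
```

Calling this before and after each batch of tiles makes it easy to spot a tile size that pushes a device toward an out-of-memory error.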

If you encounter any issues or need further assistance, please provide a minimum reproducible code example. This will help us better understand your setup and provide more targeted support. You can refer to our minimum reproducible example guide for more details.

Feel free to reach out with any more questions or updates on your progress!
