Can I get an output of image coordinates of the detected objects with unique id and corresponding frame number in a text file or excel file? #108

Closed
1 task done
zsefvbhu opened this issue Oct 28, 2022 · 10 comments
Assignees
Labels
enhancement New feature or request Stale

Comments

@zsefvbhu

zsefvbhu commented Oct 28, 2022

Search before asking

  • I have searched the HUB issues and found no similar feature requests.

Description

Can I get an output of image coordinates of the detected objects with a unique id and corresponding frame number in a text file or Excel file? ClearML gives the output of training results, but I want the detections from a video.

Use case

No response

Additional

No response

@zsefvbhu zsefvbhu added the enhancement New feature or request label Oct 28, 2022
@github-actions

👋 Hello @zsefvbhu, thank you for raising an issue about Ultralytics HUB 🚀! Please visit https://ultralytics.com/hub to learn more, and see our ⭐️ HUB Guidelines to quickly get started uploading datasets and training YOLOv5 models.

If this is a 🐛 Bug Report, please provide screenshots and steps to recreate your problem to help us get started working on a fix.

If this is a ❓ Question, please provide as much information as possible, including dataset, model, environment details etc. so that we might provide the most helpful response.

We try to respond to all issues as promptly as possible. Thank you for your patience!

@kalenmike
Contributor

@zsefvbhu Thanks for the question. Currently the only way would be to split your video into frames and run inference on each frame. This sounds like an interesting task and I will look into it more. Perhaps we can handle everything for you on the server and just send back the results from your uploaded video.

@kalenmike kalenmike self-assigned this Oct 31, 2022
@glenn-jocher
Member

glenn-jocher commented Oct 31, 2022

@zsefvbhu you can't do this in HUB yet, but you can download your HUB model and use it with the YOLOv5 repo, i.e. with detect.py for videos using --save-txt to output to text file, or using PyTorch Hub inference for the most flexibility (recommended).

YOLOv5 🚀 PyTorch Hub models allow for simple model loading and inference in a pure python environment without using detect.py.

Simple Inference Example

This example loads a pretrained YOLOv5s model from PyTorch Hub as model and passes an image for inference. 'yolov5s' is the YOLOv5 'small' model. For details on all available models please see the README. Custom models can also be loaded, including custom trained PyTorch models and their exported variants, i.e. ONNX, TensorRT, TensorFlow, OpenVINO YOLOv5 models.

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')  # yolov5n - yolov5x6 official model
#                                            'custom', 'path/to/best.pt')  # custom model

# Images
im = 'https://ultralytics.com/images/zidane.jpg'  # or file, Path, URL, PIL, OpenCV, numpy, list

# Inference
results = model(im)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
results.xyxy[0]  # im predictions (tensor)

results.pandas().xyxy[0]  # im predictions (pandas)
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie

results.pandas().xyxy[0].value_counts('name')  # class counts (pandas)
# person    2
# tie       1

See YOLOv5 PyTorch Hub Tutorial for details.

Good luck 🍀 and let us know if you have any other questions!

@zsefvbhu
Author

zsefvbhu commented Nov 3, 2022

How can I run this model on a video file?

@kalenmike
Contributor

kalenmike commented Nov 3, 2022

@zsefvbhu With YOLOv5's detect.py you just need to pass the right --source argument, along with your weights via --weights.

python detect.py --source 0  # webcam
                          img.jpg  # image
                          vid.mp4  # video
                          screen  # screenshot
                          path/  # directory
                          'path/*.jpg'  # glob
                          'https://youtu.be/Zgi9g1ksQHc'  # YouTube
                          'rtsp://example.com/media.mp4'  # RTSP, RTMP, HTTP stream
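
As a follow-up to the --save-txt suggestion earlier in the thread, here is a minimal sketch of turning those label files back into pixel coordinates. It assumes each line is `class x_center y_center width height` with values normalized to 0-1 (an optional confidence follows when --save-conf is also passed), and that for videos the label file name ends in `_<frame>`, e.g. vid_12.txt. Both points are my understanding of detect.py's output rather than something confirmed in this thread, and the helper names are hypothetical.

```python
from pathlib import Path


def frame_from_label_name(path):
    """Read the frame number from a label file name such as 'vid_12.txt'.

    Assumes the '<video stem>_<frame>.txt' naming described above.
    """
    return int(Path(path).stem.rsplit("_", 1)[1])


def label_line_to_xyxy(line, img_w, img_h):
    """Convert one normalized 'class xc yc w h [conf]' line to pixel corners."""
    parts = line.split()
    cls = int(parts[0])
    xc, yc, w, h = (float(v) for v in parts[1:5])
    xmin = (xc - w / 2) * img_w
    ymin = (yc - h / 2) * img_h
    xmax = (xc + w / 2) * img_w
    ymax = (yc + h / 2) * img_h
    return cls, xmin, ymin, xmax, ymax
```

The image width and height have to come from the video itself, since the label files only store normalized values.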

@zsefvbhu
Author

zsefvbhu commented Nov 3, 2022

So no loop over the frames is required? Just result = model(vid)?

@kalenmike
Contributor

@zsefvbhu Sorry, I misunderstood you. This will generate frames with the bounding boxes overlaid; it will not return the data for plotting.

@zsefvbhu
Author

zsefvbhu commented Nov 3, 2022

OK, what I actually need are the pixel coordinates (xmin, ymin, xmax, ymax) of the detection bounding boxes.

@github-actions

github-actions bot commented Dec 4, 2022

👋 Hello, this issue has been automatically marked as stale because it has not had recent activity. Please note it will be closed if no further activity occurs.

Feel free to inform us of any other issues you discover or feature requests that come to mind in the future. Pull Requests (PRs) are also always welcomed!

Thank you for your contributions to YOLOv5 🚀 and Vision AI ⭐!

@github-actions github-actions bot added the Stale label Dec 4, 2022
@github-actions github-actions bot closed this as completed Dec 9, 2022
@UltralyticsAssistant
Member

@zsefvbhu you can run inference on a video file with a YOLOv5 PyTorch Hub model and get the bounding box coordinates for every detection. Load your model with torch.hub.load(), iterate over the video frame by frame, run inference on each frame, and read results.xyxy[0], the detections tensor for that frame.

To read the video frame by frame you would typically use OpenCV or a similar library. Each row of results.xyxy[0] has the format [xmin, ymin, xmax, ymax, confidence, class], in pixel coordinates. You can collect these rows together with the frame number and save them to a text file or to a format Excel can open, such as CSV.

Note that the PyTorch Hub interface does not iterate over video frames for you; extracting the frames and looping over them is up to your own script.
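
The frame loop described above could be sketched roughly as follows. This is a minimal sketch, not an official Ultralytics utility: frame_rows and video_to_csv are hypothetical names, the CSV column layout is my own choice, and OpenCV (cv2) is assumed to be installed. The "det" column is only the index of a detection within its frame, not an identity that persists across frames; a persistent unique id would need a separate object tracker, which this sketch does not cover.

```python
import csv


def frame_rows(frame_idx, detections):
    """Turn one frame's detections into flat rows for a CSV file.

    `detections` is any iterable of [xmin, ymin, xmax, ymax, confidence, class]
    sequences, e.g. results.xyxy[0].tolist() from a YOLOv5 Hub model.
    """
    rows = []
    for det_idx, (xmin, ymin, xmax, ymax, conf, cls) in enumerate(detections):
        rows.append([frame_idx, det_idx, xmin, ymin, xmax, ymax, conf, int(cls)])
    return rows


def video_to_csv(video_path, model, out_csv):
    """Run `model` on every frame of `video_path` and write the boxes to `out_csv`."""
    import cv2  # imported here so frame_rows() stays usable without OpenCV

    cap = cv2.VideoCapture(video_path)
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(
            ["frame", "det", "xmin", "ymin", "xmax", "ymax", "confidence", "class"]
        )
        frame_idx = 0
        while True:
            ok, frame = cap.read()
            if not ok:  # end of video or read error
                break
            # OpenCV frames are BGR; reverse the channel axis to get RGB
            results = model(frame[..., ::-1])
            writer.writerows(frame_rows(frame_idx, results.xyxy[0].tolist()))
            frame_idx += 1
    cap.release()
```

With a model loaded as shown earlier in the thread, e.g. model = torch.hub.load('ultralytics/yolov5', 'custom', 'path/to/best.pt'), calling video_to_csv('vid.mp4', model, 'detections.csv') should produce one row per detection, and the resulting CSV opens directly in Excel. The file names here are placeholders.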
