Skip to content

Minimal code and examnples for inferencing Sapiens foundation human models in Pytorch

License

Notifications You must be signed in to change notification settings

ibaiGorordo/Sapiens-Pytorch-Inference

Repository files navigation

Sapiens-Pytorch-Inference

Minimal code and examples for inferencing Sapiens foundation human models in Pytorch

ONNX Sapiens_normal_segmentation

Why

  • Make it easy to run the models by creating a SapiensPredictor class that allows to run multiple tasks simultaneously
  • Add several examples to run the models on images, videos, and with a webcam in real-time.
  • Download models automatically from HuggigFace if not available locally.
  • Add a script for ONNX export. However, ONNX inference is not recommended due to the slow speed.
  • Added Object Detection to allow the model to be run for each detected person. However, this mode is disabled as it produces the worst results.

Caution

  • Use 1B models, since the accuracy of lower models is not good (especially for segmentation)
  • Exported ONNX models are too slow.
  • Input sizes other than 768x1024 don't produce good results.
  • Running Sapiens models on a cropped person produces worse results, even if you crop a wider rectangle around the person.

Installation PyPI

pip install sapiens-inferece

Or, clone this repository:

git clone https://github.com/ibaiGorordo/Sapiens-Pytorch-Inference.git
cd Sapiens-Pytorch-Inference
pip install -r requirements.txt

Usage

import cv2
from imread_from_url import imread_from_url
from sapiens_inference import SapiensPredictor, SapiensConfig, SapiensDepthType, SapiensNormalType

# Load the model
config = SapiensConfig()
config.depth_type = SapiensDepthType.DEPTH_03B  # Disabled by default
config.normal_type = SapiensNormalType.NORMAL_1B  # Disabled by default
predictor = SapiensPredictor(config)

# Load the image
img = imread_from_url("https://github.com/ibaiGorordo/Sapiens-Pytorch-Inference/blob/assets/test2.png?raw=true")

# Estimate the maps
result = predictor(img)

cv2.namedWindow("Combined", cv2.WINDOW_NORMAL)
cv2.imshow("Combined", result)
cv2.waitKey(0)

SapiensPredictor

The SapiensPredictor class allows to run multiple tasks simultaneously. It has the following methods:

  • SapiensPredictor(config: SapiensConfig) - Load the model with the specified configuration.
  • __call__(img: np.ndarray) -> np.ndarray - Estimate the maps for the input image.

SapiensConfig

The SapiensConfig class allows to configure the model. It has the following attributes:

  • dtype: torch.dtype - Data type to use. Default: torch.float32.
  • device: torch.device - Device to use. Default: cuda if available, otherwise cpu.
  • depth_type: SapiensDepthType - Depth model to use. Options: OFF, DEPTH_03B, DEPTH_06B, DEPTH_1B, DEPTH_2B. Default: OFF.
  • normal_type: SapiensNormalType - Normal model to use. Options: OFF, NORMAL_03B, NORMAL_06B, NORMAL_1B, NORMAL_2B. Default: OFF.
  • segmentation_type: SapiensSegmentationType - Segmentation model to use (Always enabled for the mask). Options: SEGMENTATION_03B, SEGMENTATION_06B, SEGMENTATION_1B. Default: SEGMENTATION_1B.
  • detector_config: DetectorConfig - Configuration for the object detector. Default: {model_path: str = "models/yolov8m.pt", person_id: int = 0, confidence: float = 0.25}. Disabled as it produces worst results.
  • minimum_person_height: float - Minimum height ratio of the person to detect. Default: 0.5f (50%). Not used if the object detector is disabled.

Examples

  • Image Sapiens Predictor (Normal, Depth, Segmentation):
python image_predictor.py

sapiens_human_model

python video_predictor.py
  • Webcam Sapiens Predictor (Normal, Depth, Segmentation):
python webcam_predictor.py
  • Image Normal Estimation:
python image_normal_estimation.py
  • Image Human Part Segmentation:
python image_segmentation.py
  • Video Normal Estimation:
python video_normal_estimation.py
  • Video Human Part Segmentation:
python video_segmentation.py
  • Webcam Normal Estimation:
python webcam_normal_estimation.py
  • Webcam Human Part Segmentation:
python webcam_segmentation.py

Export to ONNX

To export the model to ONNX, run the following script:

python export_onnx.py seg03b

The available models are seg03b, seg06b, seg1b, depth03b, depth06b, depth1b, depth2b, normal03b, normal06b, normal1b, normal2b.

Original Models

The original models are available at HuggingFace: https://huggingface.co/facebook/sapiens/tree/main/sapiens_lite_host

References