Skip to content

Commit

Permalink
Adding hiera (#30356)
Browse files Browse the repository at this point in the history
* initialized Structure

* Updated variable names

* Added Config class, basic HF setup, convert_to_hf

* Fixed Convert function, added hiera to HF files, Initilized test files

* better naming for x in forward pass

* Moved utils to hiera

* Change hiera -> hiera_model

* Fixed integration into tranformers

* Fix: Convert Checkpoint

* added documentation for hiera

* added documentation for hiera

* added Docstings to models, Transformers based changes

* make style and quality

* make style and quality

* Integration & Block tests running

* Fixed bugs

* initialized Structure

* Updated variable names

* Added Config class, basic HF setup, convert_to_hf

* Fixed Convert function, added hiera to HF files, Initilized test files

* better naming for x in forward pass

* Moved utils to hiera

* Change hiera -> hiera_model

* Fixed integration into tranformers

* Fix: Convert Checkpoint

* added documentation for hiera

* added documentation for hiera

* added Docstings to models, Transformers based changes

* make style and quality

* make style and quality

* Integration & Block tests running

* Fixed bugs

* Removed tim dependency

* added HieraBlock

* fixed: Model name

* added tests for HieraModel, HieraBlock

* fixed imports

* fixed quality & copies

* Fixes

* Update docs/source/en/model_doc/hiera.md

Fix name

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hiera.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update docs/source/en/model_doc/hiera.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/configuration_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/configuration_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* Fixed formatting

* Code quality & Import differences

* quality and repo-consistency fix

* fixed no torch error

* Docstring fix

* Docstring fix

* doc string fix

* fixed example usage

* Resolved issues in modeling_hiera

* Removed Hiera MAE

* Added test and resolved bug

* fixed doc string

* First commit

* Finished conversion script and model forward working

* Resolved all issues

* nits

* Improving tests

* Nits

* More nits

* Improving HieraForMaskedImageModeling

* More improvements and nits

* Fixed docstrings of outputs

* More fixes

* More imrpovments

* Updated conversion script

* Fixed docstrings

* Improved tests

* Fixed attentou outputs test

* All tests green

* Removed unnecessary file

* contribution attribution

* Resolved a few issues

* Resolved Comments

* Updated model repo id and fixed bugs

* Removed loss print

* Make tests green

* Updated docstrings

* Fix style

* Fixed num_heads in config

* Removed unnecessary video checkpoint related code in the conversion script

* Fix style

* Changed atol in conversion script

* HieraConfig

* Fix copies

* Fixed typo

* Resolved few issues

* make

* converted conv_nd -> nn.Module

* Removed video complexities

* Removed video complexities

* fix style

* Addressing comments

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Update src/transformers/models/hiera/modeling_hiera.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Fix style

* Fixed tests

* Fixed typo

* Fixed interpolate test

* Made torch fx compatible

* Made sure imageprocesor is correct

* Addressed comments

* Noise directly as torch

* Remove unnecesary attr

* Added return_dit

* Update src/transformers/models/hiera/__init__.py

Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>

* Updated checkpoints

* [run_slow] hiera

* Fixed device mismatch

* [run_slow] hiera

* Fixed GPU tests

* [run_slow] hiera

---------

Co-authored-by: Ubuntu <ubuntu@ip-172-31-29-50.us-east-2.compute.internal>
Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Co-authored-by: Eduardo Pacheco <eduardo.pach@hotmail.com>
Co-authored-by: Eduardo Pacheco <69953243+EduardoPach@users.noreply.github.com>
Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
  • Loading branch information
6 people authored Jul 11, 2024
1 parent 574e68d commit c1e139c
Show file tree
Hide file tree
Showing 18 changed files with 2,945 additions and 0 deletions.
4 changes: 4 additions & 0 deletions docs/source/en/_toctree.yml
Original file line number Diff line number Diff line change
Expand Up @@ -603,6 +603,8 @@
title: FocalNet
- local: model_doc/glpn
title: GLPN
- local: model_doc/hiera
title: Hiera
- local: model_doc/imagegpt
title: ImageGPT
- local: model_doc/levit
Expand Down Expand Up @@ -680,6 +682,8 @@
title: CLAP
- local: model_doc/encodec
title: EnCodec
- local: model_doc/hiera
title: Hiera
- local: model_doc/hubert
title: Hubert
- local: model_doc/mctct
Expand Down
1 change: 1 addition & 0 deletions docs/source/en/index.md
Original file line number Diff line number Diff line change
Expand Up @@ -159,6 +159,7 @@ Flax), PyTorch, and/or TensorFlow.
| [Grounding DINO](model_doc/grounding-dino) ||||
| [GroupViT](model_doc/groupvit) ||||
| [HerBERT](model_doc/herbert) ||||
| [Hiera](model_doc/hiera) ||||
| [Hubert](model_doc/hubert) ||||
| [I-BERT](model_doc/ibert) ||||
| [IDEFICS](model_doc/idefics) ||||
Expand Down
48 changes: 48 additions & 0 deletions docs/source/en/model_doc/hiera.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,48 @@
<!--Copyright 2024 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contain specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->

# Hiera

## Overview

Hiera was proposed in [Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles](https://arxiv.org/abs/2306.00989) by Chaitanya Ryali, Yuan-Ting Hu, Daniel Bolya, Chen Wei, Haoqi Fan, Po-Yao Huang, Vaibhav Aggarwal, Arkabandhu Chowdhury, Omid Poursaeed, Judy Hoffman, Jitendra Malik, Yanghao Li, Christoph Feichtenhofer

The paper introduces "Hiera," a hierarchical Vision Transformer that simplifies the architecture of modern hierarchical vision transformers by removing unnecessary components without compromising on accuracy or efficiency. Unlike traditional transformers that add complex vision-specific components to improve supervised classification performance, Hiera demonstrates that such additions, often termed "bells-and-whistles," are not essential for high accuracy. By leveraging a strong visual pretext task (MAE) for pretraining, Hiera retains simplicity and achieves superior accuracy and speed both in inference and training across various image and video recognition tasks. The approach suggests that spatial biases required for vision tasks can be effectively learned through proper pretraining, eliminating the need for added architectural complexity.

The abstract from the paper is the following:

*Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance. While these components lead to effective accuracies and attractive FLOP counts, the added complexity actually makes these transformers slower than their vanilla ViT counterparts. In this paper, we argue that this additional bulk is unnecessary. By pretraining with a strong visual pretext task (MAE), we can strip out all the bells-and-whistles from a state-of-the-art multi-stage vision transformer without losing accuracy. In the process, we create Hiera, an extremely simple hierarchical vision transformer that is more accurate than previous models while being significantly faster both at inference and during training. We evaluate Hiera on a variety of tasks for image and video recognition. Our code and models are available at https://github.com/facebookresearch/hiera.*

This model was a joint contibution by [EduardoPacheco](https://huggingface.co/EduardoPacheco) and [namangarg110](https://huggingface.co/namangarg110). The original code can be found [here] (https://github.com/facebookresearch/hiera).

## HieraConfig

[[autodoc]] HieraConfig

## HieraModel

[[autodoc]] HieraModel
- forward

## HieraForPreTraining

[[autodoc]] HieraForPreTraining
- forward

## HieraForImageClassification

[[autodoc]] HieraForImageClassification
- forward
18 changes: 18 additions & 0 deletions src/transformers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -462,6 +462,7 @@
"GroupViTVisionConfig",
],
"models.herbert": ["HerbertTokenizer"],
"models.hiera": ["HieraConfig"],
"models.hubert": ["HubertConfig"],
"models.ibert": ["IBertConfig"],
"models.idefics": ["IdeficsConfig"],
Expand Down Expand Up @@ -2285,6 +2286,15 @@
"GroupViTVisionModel",
]
)
_import_structure["models.hiera"].extend(
[
"HieraBackbone",
"HieraForImageClassification",
"HieraForPreTraining",
"HieraModel",
"HieraPreTrainedModel",
]
)
_import_structure["models.hubert"].extend(
[
"HubertForCTC",
Expand Down Expand Up @@ -5112,6 +5122,7 @@
GroupViTVisionConfig,
)
from .models.herbert import HerbertTokenizer
from .models.hiera import HieraConfig
from .models.hubert import HubertConfig
from .models.ibert import IBertConfig
from .models.idefics import (
Expand Down Expand Up @@ -6795,6 +6806,13 @@
GroupViTTextModel,
GroupViTVisionModel,
)
from .models.hiera import (
HieraBackbone,
HieraForImageClassification,
HieraForPreTraining,
HieraModel,
HieraPreTrainedModel,
)
from .models.hubert import (
HubertForCTC,
HubertForSequenceClassification,
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -105,6 +105,7 @@
grounding_dino,
groupvit,
herbert,
hiera,
hubert,
ibert,
idefics,
Expand Down
2 changes: 2 additions & 0 deletions src/transformers/models/auto/configuration_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -122,6 +122,7 @@
("graphormer", "GraphormerConfig"),
("grounding-dino", "GroundingDinoConfig"),
("groupvit", "GroupViTConfig"),
("hiera", "HieraConfig"),
("hubert", "HubertConfig"),
("ibert", "IBertConfig"),
("idefics", "IdeficsConfig"),
Expand Down Expand Up @@ -403,6 +404,7 @@
("grounding-dino", "Grounding DINO"),
("groupvit", "GroupViT"),
("herbert", "HerBERT"),
("hiera", "Hiera"),
("hubert", "Hubert"),
("ibert", "I-BERT"),
("idefics", "IDEFICS"),
Expand Down
1 change: 1 addition & 0 deletions src/transformers/models/auto/image_processing_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@
("glpn", ("GLPNImageProcessor",)),
("grounding-dino", ("GroundingDinoImageProcessor",)),
("groupvit", ("CLIPImageProcessor",)),
("hiera", ("BitImageProcessor",)),
("idefics", ("IdeficsImageProcessor",)),
("idefics2", ("Idefics2ImageProcessor",)),
("imagegpt", ("ImageGPTImageProcessor",)),
Expand Down
5 changes: 5 additions & 0 deletions src/transformers/models/auto/modeling_auto.py
Original file line number Diff line number Diff line change
Expand Up @@ -119,6 +119,7 @@
("graphormer", "GraphormerModel"),
("grounding-dino", "GroundingDinoModel"),
("groupvit", "GroupViTModel"),
("hiera", "HieraModel"),
("hubert", "HubertModel"),
("ibert", "IBertModel"),
("idefics", "IdeficsModel"),
Expand Down Expand Up @@ -295,6 +296,7 @@
("gpt2", "GPT2LMHeadModel"),
("gpt_bigcode", "GPTBigCodeForCausalLM"),
("gptsan-japanese", "GPTSanJapaneseForConditionalGeneration"),
("hiera", "HieraForPreTraining"),
("ibert", "IBertForMaskedLM"),
("idefics", "IdeficsForVisionText2Text"),
("idefics2", "Idefics2ForConditionalGeneration"),
Expand Down Expand Up @@ -535,6 +537,7 @@
("efficientnet", "EfficientNetModel"),
("focalnet", "FocalNetModel"),
("glpn", "GLPNModel"),
("hiera", "HieraModel"),
("imagegpt", "ImageGPTModel"),
("levit", "LevitModel"),
("mobilenet_v1", "MobileNetV1Model"),
Expand Down Expand Up @@ -610,6 +613,7 @@
),
("efficientnet", "EfficientNetForImageClassification"),
("focalnet", "FocalNetForImageClassification"),
("hiera", "HieraForImageClassification"),
("imagegpt", "ImageGPTForImageClassification"),
(
"levit",
Expand Down Expand Up @@ -1258,6 +1262,7 @@
("dinat", "DinatBackbone"),
("dinov2", "Dinov2Backbone"),
("focalnet", "FocalNetBackbone"),
("hiera", "HieraBackbone"),
("maskformer-swin", "MaskFormerSwinBackbone"),
("nat", "NatBackbone"),
("pvt_v2", "PvtV2Backbone"),
Expand Down
59 changes: 59 additions & 0 deletions src/transformers/models/hiera/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,59 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import TYPE_CHECKING

from ...utils import (
OptionalDependencyNotAvailable,
_LazyModule,
is_torch_available,
)


_import_structure = {"configuration_hiera": ["HieraConfig"]}

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
_import_structure["modeling_hiera"] = [
"HieraForImageClassification",
"HieraForPreTraining",
"HieraBackbone",
"HieraModel",
"HieraPreTrainedModel",
]

if TYPE_CHECKING:
from .configuration_hiera import HieraConfig

try:
if not is_torch_available():
raise OptionalDependencyNotAvailable()
except OptionalDependencyNotAvailable:
pass
else:
from .modeling_hiera import (
HieraBackbone,
HieraForImageClassification,
HieraForPreTraining,
HieraModel,
HieraPreTrainedModel,
)

else:
import sys

sys.modules[__name__] = _LazyModule(__name__, globals()["__file__"], _import_structure, module_spec=__spec__)
Loading

0 comments on commit c1e139c

Please sign in to comment.