Common api prepare_for_inference for controllers (openvinotoolkit#1612)
### Changes

Add `prepare_for_inference` to the common API for PyTorch and TF
controllers.
PyTorch implementation:
openvinotoolkit#1526

1. Move the `prepare_for_inference` logic into the `strip_model` method for PT
(previously `strip_model` did nothing: it took a model and returned the same model).
2. Add a `make_model_copy` argument to `strip_model`; the default value is
False.
3. Add logic to `copy_model` in `nncf/common/utils/backend.py` to copy a
TF model via `TFModelTransformer` (`deepcopy` and `tf.keras.model_copy` do not
work as expected).
4. Remove the `infer_backend_from_compression_controller` function: it caused
an import cycle, and calling it is more verbose than using `get_backend`:
    ```python
    infer_backend_from_compression_controller(compression_ctrl)
    # Changed to
    get_backend(compression_ctrl.model)
    ```
5. In the PyTorch examples, `--prepare_for_inference` now exports via
`torch.onnx.export`.
6. Add TF examples that export models without using the `ctrl.export_model`
method (see the sketch below).
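
A minimal sketch of the resulting export flow (hedged: `compression_ctrl` and `dummy_input` are illustrative names, not part of this diff):

```python
import torch

# Strip NNCF-specific training nodes from the model before export.
stripped_model = compression_ctrl.prepare_for_inference()

# PyTorch controllers: export the stripped model to ONNX directly.
torch.onnx.export(stripped_model, dummy_input, "model.onnx")

# TF controllers: the stripped tf.keras model can be saved as usual, e.g.:
# stripped_model.save("saved_model_dir")
```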

### Related tickets

92247
  • Loading branch information
AlexanderDokuchaev committed Mar 21, 2023
1 parent af4a4d2 commit 32fe3ab
Show file tree
Hide file tree
Showing 41 changed files with 599 additions and 351 deletions.
110 changes: 64 additions & 46 deletions docs/Usage.md
@@ -85,36 +85,54 @@ Important points you should consider when training your networks with compression
- Turn off the `Dropout` layers (and similar ones like `DropConnect`) when training a network with quantization or sparsity
- It is better to turn off additional regularization in the loss function (for example, L2 regularization via `weight_decay`) when training the network with RB sparsity, since it already imposes an L0 regularization term.

#### Step 4: Export the compressed model
After the compressed model has been fine-tuned to an acceptable accuracy and compression stage, you can export it. There are two ways to export a model:

1. Call the compression controller's `export_model` method to properly export the model with compression specifics into ONNX.

```python
compression_ctrl.export_model("./compressed_model.onnx")
```
The exported ONNX file may contain special, non-ONNX-standard operations and layers to leverage full compressed/low-precision potential of the OpenVINO toolkit.
In some cases it is possible to export a compressed model with ONNX standard operations only (so that it can be run using `onnxruntime`, for example) - this is the case for the 8-bit symmetric quantization and sparsity/filter pruning algorithms.
Refer to [compression algorithm documentation](./compression_algorithms) for details.
Also, this method is limited to the supported formats for export.

2. Call the compression controller's `prepare_for_inference` method to obtain the model without the NNCF-specific
nodes that are used for training the compressed model; after that you can trace the model via inference with native
framework operations. This gives more flexibility to deploy the model after optimization and also allows you to use
third-party inference solutions, such as OpenVINO.

```python
inference_model = compression_ctrl.prepare_for_inference()

# `dummy_input` / `example_input` below stand for a tensor (or tuple of
# tensors) matching the shape and type of the model's expected input.

# To ONNX format
import torch
torch.onnx.export(inference_model, dummy_input, './compressed_model.onnx')

# To OpenVINO format
from openvino.tools import mo
ov_model = mo.convert_model(inference_model, example_input=example_input)
```
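
The converted `ov_model` can then be saved to OpenVINO IR - a minimal sketch, assuming the `serialize` helper available in recent OpenVINO releases:

```python
from openvino.runtime import serialize

# Writes ./compressed_model.xml and the accompanying .bin weights file.
serialize(ov_model, './compressed_model.xml')
```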

## Saving and loading compressed models
The complete information about compression is defined by a compressed model and a compression state.
The model characterizes the weights and topology of the network. The compression state describes how to restore the settings of
compression layers in the model and how to restore the compression schedule and the compression loss.
The latter can be obtained by `compression_ctrl.get_compression_state()` on saving and passed to the
`create_compressed_model` helper function via the optional `compression_state` argument on loading.
The compressed model should be loaded once it's created.

Saving and loading of the compressed model and compression state is framework-specific and can be done in an arbitrary
way. NNCF provides one possible way of doing it with helper functions in samples.

To save the best compressed checkpoint, use `compression_ctrl.compression_stage()` to distinguish between three possible
levels of compression: `UNCOMPRESSED`, `PARTIALLY_COMPRESSED` and `FULLY_COMPRESSED`. This is useful in the case of `staged`
compression. The model may achieve its best accuracy at earlier stages of compression - tuning without compression or with an
intermediate compression rate - but the fully compressed model with lower accuracy should still be considered the best
compressed one. `UNCOMPRESSED` means that no compression is applied to the model, for instance, in the case of staged
quantization - when all quantizers are disabled, or in the case of sparsity - when the current sparsity rate is zero.
`PARTIALLY_COMPRESSED` stands for a compressed model that hasn't reached the final compression ratio yet, e.g. the magnitude
sparsity algorithm has learnt masking of 30% of weights out of the 51% target rate. The controller returns the
`FULLY_COMPRESSED` compression stage when it has finished scheduling and tuning the hyperparameters of the compression
algorithm, for example when the rb-sparsity method sets the final target sparsity rate for the loss.
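
For example, a best-checkpoint selection rule might look as follows - a sketch assuming PyTorch-style saving and that `acc`, `best_acc`, `best_stage`, `checkpoint` and `best_checkpoint_path` are tracked by your training loop (`CompressionStage` values are ordered, so they can be compared directly):

```python
import torch

stage = compression_ctrl.compression_stage()
# Prefer a more compressed stage; within the same stage, prefer higher accuracy.
is_best = (stage == best_stage and acc > best_acc) or stage > best_stage
if is_best:
    best_acc, best_stage = acc, stage
    torch.save(checkpoint, best_checkpoint_path)
```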

### Saving and loading compressed models in TensorFlow
@@ -147,8 +165,8 @@ checkpoint = tf.train.Checkpoint(model=compress_model,
checkpoint.restore(path_to_checkpoint)
```

Since the compression state is a dictionary of Python JSON-serializable objects, we convert it to a JSON
string within `tf.train.Checkpoint`. There are two helper classes: `TFCompressionState` - for saving the compression state, and
`TFCompressionStateLoader` - for loading.
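
A minimal sketch of how these helpers fit into a checkpoint (assuming `compress_model`, `compression_ctrl`, `checkpoint_dir` and `path_to_checkpoint` from the surrounding example):

```python
import tensorflow as tf
from nncf.tensorflow.utils.state import TFCompressionState
from nncf.tensorflow.utils.state import TFCompressionStateLoader

# Saving: wrap the controller so its state is stored as a JSON string.
checkpoint = tf.train.Checkpoint(model=compress_model,
                                 compression_state=TFCompressionState(compression_ctrl))
checkpoint.save(checkpoint_dir)

# Loading: restore the compression state first, then pass it to create_compressed_model.
state_checkpoint = tf.train.Checkpoint(compression_state=TFCompressionStateLoader())
state_checkpoint.restore(path_to_checkpoint)
compression_state = state_checkpoint.compression_state.state
```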

### Saving and loading compressed models in PyTorch
@@ -167,7 +185,7 @@ torch.save(checkpoint, path)

# load part
resuming_checkpoint = torch.load(path)
state_dict = resuming_checkpoint['state_dict']
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config, resuming_state_dict=state_dict)
compression_ctrl.scheduler.load_state(resuming_checkpoint['scheduler_state'])
```
@@ -185,39 +203,39 @@ torch.save(checkpoint, path)

# load part
resuming_checkpoint = torch.load(path)
compression_state = resuming_checkpoint['compression_state']
compression_ctrl, compressed_model = create_compressed_model(model, nncf_config, compression_state=compression_state)
state_dict = resuming_checkpoint['state_dict']

# load model in a preferable way
load_state(compressed_model, state_dict, is_resume=True)
# or when execution mode on loading is the same as on saving:
# save and load in a single GPU mode or save and load in the (Distributed)DataParallel one, not in a mixed way
compressed_model.load_state_dict(state_dict)
```

You can save the `compressed_model` object with `torch.save` as usual: via the `state_dict` and `load_state_dict` methods.
Alternatively, you can use the `nncf.load_state` function on loading. It will attempt to load a PyTorch state dict into
a model by first stripping the irrelevant prefixes, such as `module.` or `nncf_module.`, from both the checkpoint and
the model layer identifiers, and then do the matching between the layers.
Depending on the value of the `is_resume` argument, it will then fail if an exact match could not be made
(when `is_resume == True`), or load the matching layer parameters and print a warning listing the mismatches
(when `is_resume == False`). `is_resume=False` is most commonly used if you want to load the starting weights from an
uncompressed model into a compressed model and `is_resume=True` is used when you want to evaluate a compressed
checkpoint or resume compressed checkpoint training without changing the compression algorithm parameters.
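
For instance, initializing a compressed model from an uncompressed baseline checkpoint might look like this - a sketch, with `baseline.pth` as a hypothetical checkpoint containing a `state_dict` key:

```python
import torch
from nncf.torch import load_state

baseline_checkpoint = torch.load('baseline.pth')
# is_resume=False: mismatches (e.g. missing quantizer parameters) are reported
# as warnings instead of raising an error.
load_state(compressed_model, baseline_checkpoint['state_dict'], is_resume=False)
```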

The compression state can be directly pickled by `torch.save` as well, since it is a dictionary of Python objects.

In the previous releases of NNCF, the model could be loaded without the compression state information
by saving the model state dictionary `compressed_model.state_dict` and loading it via the `nncf.load_state` and
`compressed_model.load_state_dict` methods, or by using the optional `resuming_state_dict` argument of
`create_compressed_model`.
This way of loading is deprecated, and we highly recommend not using it, as it does not guarantee the exact loading
of the compression model state for algorithms with sophisticated initialization - e.g. HAWQ and AutoQ.
Also in this case, keep in mind that in order to load the resulting checkpoint file the `compressed_model` object should
have the same structure with regard to PyTorch modules and parameters as it had when the checkpoint was saved.
In practice this means that you should use the same compression algorithms (i.e. the same NNCF configuration file) when
loading a compressed model checkpoint.


## Exploring the compressed model
@@ -271,8 +289,8 @@ from nncf.common.accuracy_aware_training import create_accuracy_aware_training_loop
training_loop = create_accuracy_aware_training_loop(nncf_config, compression_ctrl)
```

In order to properly instantiate the accuracy-aware training loop, the user has to specify the 'accuracy_aware_training' section.
This section fully depends on what Accuracy-Aware Training loop is being used.
For more details about the configuration of Adaptive Compression Level Training, refer to the [Adaptive Compression Level Training documentation](./accuracy_aware_model_training/AdaptiveCompressionTraining.md); for Early Exit Training, refer to the [Early Exit Training documentation](./accuracy_aware_model_training/EarlyExitTraining.md).
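
As an illustration, an 'accuracy_aware_training' section for the adaptive mode might look like the following - a hedged sketch; the parameter names are drawn from NNCF's accuracy-aware schema, and the linked documents remain the authoritative reference:

```python
nncf_config.update({
    "accuracy_aware_training": {
        "mode": "adaptive_compression_level",
        "params": {
            # Tolerated accuracy drop, in percent, relative to the original model.
            "maximal_relative_accuracy_degradation": 1.0,
            "initial_training_phase_epochs": 5,
        },
    }
})
```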

The training loop is launched by calling its `run` method. Before the start of the training loop, the user is expected to define several functions related to the training of the model and pass them as arguments to the `run` method of the training loop instance:
@@ -318,9 +336,9 @@ def configure_optimizers_fn():
def dump_checkpoint_fn(model, compression_controller, accuracy_aware_runner, save_dir):
'''
An (optional) function that allows a user to define how to save the model's checkpoint.
The training loop will call this function instead of its own dump_checkpoint function and pass
`model`, `compression_controller`, `accuracy_aware_runner` and `save_dir` to it as arguments.
The user can save the states of the objects according to their own needs.
`save_dir` is a directory that the Accuracy-Aware pipeline created to store log information.
'''
```
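
A minimal body for such a function might look like this (a sketch in PyTorch flavor; the checkpoint file name is illustrative):

```python
import os.path as osp
import torch

def dump_checkpoint_fn(model, compression_controller, accuracy_aware_runner, save_dir):
    # Save the model weights together with the compression state so the
    # checkpoint can be restored via create_compressed_model later.
    checkpoint = {
        'state_dict': model.state_dict(),
        'compression_state': compression_controller.get_compression_state(),
    }
    torch.save(checkpoint, osp.join(save_dir, 'acc_aware_checkpoint.pth'))
```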
@@ -334,4 +352,4 @@ model = training_loop.run(model,
configure_optimizers_fn=configure_optimizers_fn,
dump_checkpoint_fn=dump_checkpoint_fn)
```
The above call executes the accuracy-aware training loop and returns the compressed model. For more details on how to use the accuracy-aware training loop functionality of NNCF, please refer to its [documentation](./accuracy_aware_model_training/AdaptiveCompressionTraining.md).
6 changes: 3 additions & 3 deletions examples/tensorflow/classification/README.md
@@ -11,7 +11,7 @@ This sample demonstrates a DL model compression in case of the Image Classification

## Installation

At this point it is assumed that you have already installed nncf. You can find information on downloading nncf [here](https://github.com/openvinotoolkit/nncf#user-content-installation).

To work with the sample you should install the corresponding Python package dependencies:

@@ -55,15 +55,15 @@ The ImageNet dataset in TFRecords format should be specified in the configuration

#### Test Pretrained Model

Before compressing a model, it is highly recommended to check the accuracy of the pretrained model. All models supported in the sample have pretrained weights for ImageNet.

To load pretrained weights into a model and then evaluate the accuracy of that model, make sure that the pretrained=True option is set in the configuration file and use the following command:
```bash
python main.py \
--mode=test \
--config=configs/quantization/mobilenet_v2_imagenet_int8.json \
--data=<path_to_imagenet_dataset> \
--disable-compression
```

#### Compress Pretrained Model
38 changes: 20 additions & 18 deletions examples/tensorflow/classification/main.py
@@ -11,42 +11,42 @@
limitations under the License.
"""

import os.path as osp
import sys
from pathlib import Path

import tensorflow as tf
import tensorflow_addons as tfa

from examples.common.sample_config import create_sample_config
from examples.tensorflow.classification.datasets.builder import DatasetBuilder
from examples.tensorflow.common.argparser import get_common_argument_parser
from examples.tensorflow.common.callbacks import get_callbacks
from examples.tensorflow.common.callbacks import get_progress_bar
from examples.tensorflow.common.distributed import get_distribution_strategy
from examples.tensorflow.common.experimental_patcher import patch_if_experimental_quantization
from examples.tensorflow.common.export import export_model
from examples.tensorflow.common.logger import logger
from examples.tensorflow.common.model_loader import get_model
from examples.tensorflow.common.optimizer import build_optimizer
from examples.tensorflow.common.scheduler import build_scheduler
from examples.tensorflow.common.utils import SummaryWriter
from examples.tensorflow.common.utils import close_strategy_threadpool
from examples.tensorflow.common.utils import configure_paths
from examples.tensorflow.common.utils import create_code_snapshot
from examples.tensorflow.common.utils import get_saving_parameters
from examples.tensorflow.common.utils import print_args
from examples.tensorflow.common.utils import serialize_cli_args
from examples.tensorflow.common.utils import serialize_config
from examples.tensorflow.common.utils import set_seed
from examples.tensorflow.common.utils import write_metrics
from nncf.config.utils import is_accuracy_aware_training
from nncf.tensorflow import create_compression_callbacks
from nncf.tensorflow.helpers.model_creation import create_compressed_model
from nncf.tensorflow.helpers.model_manager import TFModelManager
from nncf.tensorflow.initialization import register_default_init_args
from nncf.tensorflow.utils.state import TFCompressionState
from nncf.tensorflow.utils.state import TFCompressionStateLoader


def get_argument_parser():
@@ -288,7 +288,9 @@ def run(config):
logger.info('evaluation...')
statistics = compression_ctrl.statistics()
logger.info(statistics.to_str())
eval_model = compress_model

results = eval_model.evaluate(
validation_dataset,
steps=validation_steps,
callbacks=[get_progress_bar(
@@ -300,7 +302,7 @@

if 'export' in config.mode:
save_path, save_format = get_saving_parameters(config)
export_model(compression_ctrl.prepare_for_inference(), save_path, save_format)
logger.info('Saved to {}'.format(save_path))

close_strategy_threadpool(strategy)
@@ -338,7 +340,7 @@ def export(config):
ckpt_path=config.ckpt_path)

save_path, save_format = get_saving_parameters(config)
export_model(compression_ctrl.prepare_for_inference(), save_path, save_format)
logger.info('Saved to {}'.format(save_path))

