Output log breakdown
Here we will use the two primary functionalities of the clika-ace Python package:
- Start ACE - execute compression of a model initialized from an existing torch.nn.Module.
- Deploy - deploy the compressed model to a chosen framework.
We will analyze and explain the output log generated by these two operations step-by-step, using a toy example.
Full output log
Upon running torch.compile with backend="clika", you will see something similar to:
CLIKA: Pre-Compiling
CLIKA: Done Pre-Compiling
CLIKA: Compiling
===============================================================
== License is Valid. Time Left: 390 days 11 hours 30 minutes ==
===============================================================
Created log at: logs/clika_2024-07-11_14:51:12.125294.log
[2024-07-11 14:51:12] CLIKA ACE Version: 24.7.0
[2024-07-11 14:51:12] 'torch' Version: 2.3.1+cu121
[2024-07-11 14:51:12] Python Version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0]
[2024-07-11 14:51:12]
Training Settings:
+ stats_steps = 20
+ random_seed = 1534438726
+ is_training_from_scratch = False
Distributed Training Settings:
+ multi_gpu = True
+ use_sharding = True
Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
+ skip_tail_quantization = False
+ automatic_skip_quantization = True
Deployment Settings:
+ target_framework = TensorRT (NVIDIA)
[2024-07-11 14:51:12] Starting initial model validation: 'VisionTransformer'
[2024-07-11 14:51:12] Starting to parse the model: 'VisionTransformer'
[2024-07-11 14:51:12] Setting the Model into Training Model for Parsing
[2024-07-11 14:51:12] Moving the Model to GPU Device
[2024-07-11 14:51:12] 1/236. Parsing node: 'x'
[2024-07-11 14:51:12] Parsed to -> 'x' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 2/236. Parsing node: '_1'
[2024-07-11 14:51:12] Parsed to -> 'shape_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 3/236. Parsing node: 'getitem'
[2024-07-11 14:51:12] Parsed to -> 'gather_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 4/236. Parsing node: 'getitem_1'
[2024-07-11 14:51:12] Parsed to -> 'gather_2_1' (Subgraph: 'VisionTransformer')
...
[2024-07-11 14:51:12] 234/236. Parsing node: 'getitem_29'
[2024-07-11 14:51:12] Parsed to -> 'gather_6_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 235/236. Parsing node: 'heads_head'
[2024-07-11 14:51:12] Parsed to -> 'heads_head' (Subgraph: 'VisionTransformer/heads')
[2024-07-11 14:51:12] 236/236. Parsing node: 'output'
[2024-07-11 14:51:13] Removed 72 unnecessary nodes
[2024-07-11 14:51:13] Removed 37 Dropout layers
[2024-07-11 14:51:13] Removed 1 unnecessary nodes
CLIKA: Done Compiling
Calibrating Model: 100%|##########| 20/20 [00:06<00:00, 3.31it/s]
[2024-07-11 14:51:20] Processing Quantization statistics 1/116: 'add'
[2024-07-11 14:51:20] Processing Quantization statistics 2/116: 'add_1'
[2024-07-11 14:51:20] Processing Quantization statistics 3/116: 'add_10'
...
[2024-07-11 14:51:24] Processing Quantization statistics 113/116: 'multihead_attention_9_1'
[2024-07-11 14:51:24] Processing Quantization statistics 114/116: 'permute'
[2024-07-11 14:51:24] Processing Quantization statistics 115/116: 'reshape'
[2024-07-11 14:51:24] Processing Quantization statistics 116/116: 'x'
[2024-07-11 14:51:24] Automatic Quantization: Skipping Quantization for node: 'constant_3_1'
[2024-07-11 14:51:24] Quantization: Processed 1/207 - 'class_token'
[2024-07-11 14:51:25] Quantization: Processed 3/207 - 'encoder_pos_embedding'
[2024-07-11 14:51:25] Quantization: Processed 6/207 - 'x_0_q'
[2024-07-11 14:51:25] Quantization: Processed 10/207 - 'conv_proj'
[2024-07-11 14:51:25] Quantization: Processed 13/207 - 'conv_proj_0_q'
...
[2024-07-11 14:51:39] Quantization: Processed 204/207 - 'encoder_ln'
[2024-07-11 14:51:39] Quantization: Processed 205/207 - 'encoder_ln_0_q'
[2024-07-11 14:51:39] Quantization: Processed 206/207 - 'gather_6_1'
[2024-07-11 14:51:39] Quantization: Processed 207/207 - 'heads_head'
[2024-07-11 14:51:39] Quantization: done
[2024-07-11 14:51:40] Creating a visualization of the Graph. It may take few minutes.
[2024-07-11 14:51:40] Visualization: file exists: 'image_classification/torchvision_models/outputs/vit_b_16/trt/vit_b_16.svg.svg', Overwriting.
[2024-07-11 14:51:40] Saved graph visualization to: image_classification/torchvision_models/outputs/vit_b_16/trt/vit_b_16.svg.svg
[2024-07-11 14:51:47] Saving submodules of the model under:
image_classification/torchvision_models/outputs/vit_b_16/trt/2024_07_11_14_51_47_clika_submodules_m7djj9r3
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_begin[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_end[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2024-07-11 14:51:53] Saved model at: image_classification/torchvision_models/outputs/vit_b_16/trt/2024_07_11_14_51_47_clika_submodules_m7djj9r3/clika_0.onnx
[2024-07-11 14:51:56] Saved final model: image_classification/torchvision_models/outputs/vit_b_16/trt/vit_b_16_init.onnx
Breakdown of the sample log:
CLIKA: Pre-Compiling
CLIKA: Done Pre-Compiling
CLIKA: Compiling
- Starts the conversion of the given Model to CLIKA IR
===============================================================
== License is Valid. Time Left: 390 days 11 hours 30 minutes ==
===============================================================
Created log at: logs/clika_2024-07-11_14:51:12.125294.log
[2024-07-11 14:51:12] CLIKA ACE Version: 24.7.0
[2024-07-11 14:51:12] 'torch' Version: 2.3.1+cu121
[2024-07-11 14:51:12] Python Version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0]
[2024-07-11 14:51:12]
Training Settings:
+ stats_steps = 20
+ random_seed = 1534438726
+ is_training_from_scratch = False
Distributed Training Settings:
+ multi_gpu = True
+ use_sharding = True
Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
+ skip_tail_quantization = False
+ automatic_skip_quantization = True
Deployment Settings:
+ target_framework = TensorRT (NVIDIA)
- Runs a license check as validation
- Initializes the log file
- Prints out the versions and the Settings object that was passed to the SDK
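The "Training Settings" banner in the log is just a rendering of the Settings object's sections. As a purely illustrative sketch (the real SDK renders this internally; render_settings and its dict layout are made up here), a nested dict could be turned into that "Section: / + key = value" layout like so:

```python
# Illustrative only: the real banner is produced internally by the SDK.
# This sketch renders a nested settings dict into the sectioned
# "+ key = value" layout seen in the log.

def render_settings(sections: dict) -> str:
    lines = []
    for section, values in sections.items():
        lines.append(f"{section}:")
        for key, value in values.items():
            lines.append(f"    + {key} = {value}")
    return "\n".join(lines)

banner = render_settings({
    "Training Settings": {"stats_steps": 20, "random_seed": 1534438726},
    "Global Quantization Settings": {"weights_num_bits": 8},
})
print(banner)
```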
[2024-07-11 14:51:12] Starting initial model validation: 'VisionTransformer'
[2024-07-11 14:51:12] Starting to parse the model: 'VisionTransformer'
[2024-07-11 14:51:12] Setting the Model into Training Model for Parsing
[2024-07-11 14:51:12] Moving the Model to GPU Device
[2024-07-11 14:51:12] 1/236. Parsing node: 'x'
[2024-07-11 14:51:12] Parsed to -> 'x' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 2/236. Parsing node: '_1'
[2024-07-11 14:51:12] Parsed to -> 'shape_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 3/236. Parsing node: 'getitem'
[2024-07-11 14:51:12] Parsed to -> 'gather_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 4/236. Parsing node: 'getitem_1'
[2024-07-11 14:51:12] Parsed to -> 'gather_2_1' (Subgraph: 'VisionTransformer')
...
[2024-07-11 14:51:12] 234/236. Parsing node: 'getitem_29'
[2024-07-11 14:51:12] Parsed to -> 'gather_6_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 235/236. Parsing node: 'heads_head'
[2024-07-11 14:51:12] Parsed to -> 'heads_head' (Subgraph: 'VisionTransformer/heads')
[2024-07-11 14:51:12] 236/236. Parsing node: 'output'
- Resumes conversion to CLIKA IR; all of the operations inside the given model are mapped to CLIKA operations.
- During the conversion process, the CLIKA SDK attempts to preserve the subgraph in which the operations were originally contained.
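The per-node mapping the log reports ("Parsing node: 'getitem' ... Parsed to -> 'gather_1'") can be sketched in plain Python. This is illustrative only: clika-ace's actual parser and its exact naming scheme are internal, and the TORCH_TO_IR table below is a made-up stand-in.

```python
# Illustrative sketch only (not the real ACE parser): map torch-level node
# names to IR op names, numbering repeated ops and remembering which
# submodule (subgraph) each node came from.

# Hypothetical mapping from torch-level ops to IR op names.
TORCH_TO_IR = {
    "getitem": "gather",
    "shape": "shape",
    "linear": "linear",
}

def parse_node(op: str, subgraph: str, counters: dict) -> str:
    """Map one torch node to an IR node name, numbering duplicates."""
    ir_op = TORCH_TO_IR.get(op, op)
    counters[ir_op] = counters.get(ir_op, 0) + 1
    n = counters[ir_op]
    ir_name = ir_op if n == 1 else f"{ir_op}_{n}"
    return f"{ir_name} (Subgraph: '{subgraph}')"

counters = {}
first = parse_node("getitem", "VisionTransformer", counters)
second = parse_node("getitem", "VisionTransformer", counters)
```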
[2024-07-11 14:51:13] Removed 72 unnecessary nodes
[2024-07-11 14:51:13] Removed 37 Dropout layers
[2024-07-11 14:51:13] Removed 1 unnecessary nodes
CLIKA: Done Compiling
- ACE does multiple passes over the given model and eliminates unnecessary nodes. The unnecessary nodes can be:
  - Unused nodes (layers)
  - Useless nodes, e.g., Add(0 + x)
  - Dropout nodes
  - & many more
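The elimination passes above can be sketched as a toy graph-cleanup routine. This is illustrative only, not the real ACE implementation: it rewires Dropout and identity-like nodes (here faked as an "add_zero" op) to their single input, then keeps only nodes reachable from the graph outputs, so unused nodes fall away.

```python
# Illustrative sketch only (not the real ACE passes): drop Dropout nodes,
# identity-like nodes such as Add(0 + x), and nodes whose output is unused.

def eliminate(nodes, edges, outputs):
    """nodes: {name: op}; edges: {consumer: [input names]}; outputs: graph outputs."""
    # Pass 1: mark Dropout and identity-like nodes as removable.
    removable = {n for n, op in nodes.items() if op in ("dropout", "add_zero")}

    def resolve(n):
        # Skip over removable nodes to their (single) input.
        while n in removable:
            n = edges[n][0]
        return n

    # Pass 2: keep only nodes reachable from the outputs.
    keep, stack = set(), [resolve(o) for o in outputs]
    while stack:
        n = stack.pop()
        if n in keep:
            continue
        keep.add(n)
        stack.extend(resolve(i) for i in edges.get(n, []))
    return keep

nodes = {"x": "input", "drop": "dropout", "lin": "linear",
         "add0": "add_zero", "unused": "linear"}
edges = {"drop": ["x"], "lin": ["drop"], "add0": ["lin"], "unused": ["x"]}
kept = eliminate(nodes, edges, ["add0"])
```

Here the Dropout node, the Add(0 + x) node, and the unused linear layer are all removed, leaving only the input and the live linear layer.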
Calibrating Model: 100%|##########| 20/20 [00:06<00:00, 3.31it/s]
[2024-07-11 14:51:20] Processing Quantization statistics 1/116: 'add'
[2024-07-11 14:51:20] Processing Quantization statistics 2/116: 'add_1'
[2024-07-11 14:51:20] Processing Quantization statistics 3/116: 'add_10'
...
[2024-07-11 14:51:24] Processing Quantization statistics 113/116: 'multihead_attention_9_1'
[2024-07-11 14:51:24] Processing Quantization statistics 114/116: 'permute'
[2024-07-11 14:51:24] Processing Quantization statistics 115/116: 'reshape'
[2024-07-11 14:51:24] Processing Quantization statistics 116/116: 'x'
[2024-07-11 14:51:24] Automatic Quantization: Skipping Quantization for node: 'constant_3_1'
- Assuming calibration data was passed to torch.compile(..., backend="clika"), ACE will perform an initial quantization calibration.
- ACE processes the quantization statistics into model modifications and optimizations.
- ACE performs automatic quantization, skipping nodes which are highly sensitive to quantization, or nodes for which quantization is irrelevant.
- ACE will then transform the CLIKA IR internally into dedicated nodes for quantization relevant to the selected deployment framework:
[2024-07-11 14:51:24] Quantization: Processed 1/207 - 'class_token'
[2024-07-11 14:51:25] Quantization: Processed 3/207 - 'encoder_pos_embedding'
[2024-07-11 14:51:25] Quantization: Processed 6/207 - 'x_0_q'
[2024-07-11 14:51:25] Quantization: Processed 10/207 - 'conv_proj'
[2024-07-11 14:51:25] Quantization: Processed 13/207 - 'conv_proj_0_q'
...
[2024-07-11 14:51:39] Quantization: Processed 204/207 - 'encoder_ln'
[2024-07-11 14:51:39] Quantization: Processed 205/207 - 'encoder_ln_0_q'
[2024-07-11 14:51:39] Quantization: Processed 206/207 - 'gather_6_1'
[2024-07-11 14:51:39] Quantization: Processed 207/207 - 'heads_head'
[2024-07-11 14:51:39] Quantization: done
- After the process above, the torch.compile call has finished. If we decide to deploy the model now, the result is equivalent to post-training quantization (PTQ).
- If we continue with fine-tuning, it is equivalent to quantization-aware training (QAT).
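The calibration arithmetic behind this step can be sketched as standard asymmetric 8-bit quantization. This is a generic PTQ-style sketch, not ACE's internal algorithm: observe min/max over the calibration batches, derive a scale and zero-point, and use quantize-then-dequantize (the "fake quant" op that QAT keeps in the forward pass during fine-tuning).

```python
# Generic asymmetric 8-bit quantization sketch (illustrative; ACE's actual
# calibration is internal to the SDK).

def calibrate(batches):
    """Track the min/max of activations over the calibration data."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

def qparams(lo, hi, num_bits=8):
    """Derive scale and zero-point for an asymmetric integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def fake_quant(x, scale, zp, num_bits=8):
    """Quantize then dequantize -- the op QAT keeps in the forward pass."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = max(qmin, min(qmax, round(x / scale) + zp))
    return (q - zp) * scale

lo, hi = calibrate([[-1.0, 0.5], [0.2, 3.0]])
scale, zp = qparams(lo, hi)
y = fake_quant(0.5, scale, zp)  # 0.5 reconstructed to within one scale step
```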
Visualization
- If we call clika_module.clika_visualize(...), the CLIKA SDK will create a .svg file that displays the model.
[2024-07-11 14:51:40] Creating a visualization of the Graph. It may take few minutes.
[2024-07-11 14:51:40] Visualization: file exists: 'vit_b_16.svg.svg', Overwriting.
[2024-07-11 14:51:40] Saved graph visualization to: vit_b_16.svg.svg
Deployment
- The following log is produced when we deploy the model using torch.onnx.export.
- It deploys any ClikaModule submodule components into a separate directory, then merges them into one.
- The shape inference warnings can be ignored.
[2024-07-11 14:51:47] Saving submodules of the model under: 2024_07_11_14_51_47_clika_submodules_m7djj9r3
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_begin[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_end[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2024-07-11 14:51:53] Saved model at: 2024_07_11_14_51_47_clika_submodules_m7djj9r3/clika_0.onnx
[2024-07-11 14:51:56] Saved final model: vit_b_16_init.onnx