Output log breakdown
Here we will use the two primary functionalities of the clika-ace Python package:
- Start ACE - execute compression of a model initialized from an existing torch.nn.Module.
- Deploy - deploy the compressed model to a chosen framework.
We will analyze and explain the output log generated by these two operations step-by-step, using a toy example.
Full output log
Upon running torch.compile with backend="clika", you will see something similar to:
CLIKA: Pre-Compiling
CLIKA: Done Pre-Compiling
CLIKA: Compiling
===============================================================
== License is Valid. Time Left: 390 days 11 hours 30 minutes ==
===============================================================
Created log at: logs/clika_2024-07-11_14:51:12.125294.log
[2024-07-11 14:51:12] CLIKA ACE Version: 24.7.0
[2024-07-11 14:51:12] 'torch' Version: 2.3.1+cu121
[2024-07-11 14:51:12] Python Version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0]
[2024-07-11 14:51:12]
Training Settings:
+ stats_steps = 20
+ random_seed = 1534438726
+ is_training_from_scratch = False
Distributed Training Settings:
+ multi_gpu = True
+ use_sharding = True
Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
+ skip_tail_quantization = False
+ automatic_skip_quantization = True
Deployment Settings:
+ target_framework = TensorRT (NVIDIA)
[2024-07-11 14:51:12] Starting initial model validation: 'VisionTransformer'
[2024-07-11 14:51:12] Starting to parse the model: 'VisionTransformer'
[2024-07-11 14:51:12] Setting the Model into Training Model for Parsing
[2024-07-11 14:51:12] Moving the Model to GPU Device
[2024-07-11 14:51:12] 1/236. Parsing node: 'x'
[2024-07-11 14:51:12] Parsed to -> 'x' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 2/236. Parsing node: '_1'
[2024-07-11 14:51:12] Parsed to -> 'shape_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 3/236. Parsing node: 'getitem'
[2024-07-11 14:51:12] Parsed to -> 'gather_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 4/236. Parsing node: 'getitem_1'
[2024-07-11 14:51:12] Parsed to -> 'gather_2_1' (Subgraph: 'VisionTransformer')
...
[2024-07-11 14:51:12] 234/236. Parsing node: 'getitem_29'
[2024-07-11 14:51:12] Parsed to -> 'gather_6_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 235/236. Parsing node: 'heads_head'
[2024-07-11 14:51:12] Parsed to -> 'heads_head' (Subgraph: 'VisionTransformer/heads')
[2024-07-11 14:51:12] 236/236. Parsing node: 'output'
[2024-07-11 14:51:13] Removed 72 unnecessary nodes
[2024-07-11 14:51:13] Removed 37 Dropout layers
[2024-07-11 14:51:13] Removed 1 unnecessary nodes
CLIKA: Done Compiling
Calibrating Model: 100%|##########| 20/20 [00:06<00:00, 3.31it/s]
[2024-07-11 14:51:20] Processing Quantization statistics 1/116: 'add'
[2024-07-11 14:51:20] Processing Quantization statistics 2/116: 'add_1'
[2024-07-11 14:51:20] Processing Quantization statistics 3/116: 'add_10'
...
[2024-07-11 14:51:24] Processing Quantization statistics 113/116: 'multihead_attention_9_1'
[2024-07-11 14:51:24] Processing Quantization statistics 114/116: 'permute'
[2024-07-11 14:51:24] Processing Quantization statistics 115/116: 'reshape'
[2024-07-11 14:51:24] Processing Quantization statistics 116/116: 'x'
[2024-07-11 14:51:24] Automatic Quantization: Skipping Quantization for node: 'constant_3_1'
[2024-07-11 14:51:24] Quantization: Processed 1/207 - 'class_token'
[2024-07-11 14:51:25] Quantization: Processed 3/207 - 'encoder_pos_embedding'
[2024-07-11 14:51:25] Quantization: Processed 6/207 - 'x_0_q'
[2024-07-11 14:51:25] Quantization: Processed 10/207 - 'conv_proj'
[2024-07-11 14:51:25] Quantization: Processed 13/207 - 'conv_proj_0_q'
...
[2024-07-11 14:51:39] Quantization: Processed 204/207 - 'encoder_ln'
[2024-07-11 14:51:39] Quantization: Processed 205/207 - 'encoder_ln_0_q'
[2024-07-11 14:51:39] Quantization: Processed 206/207 - 'gather_6_1'
[2024-07-11 14:51:39] Quantization: Processed 207/207 - 'heads_head'
[2024-07-11 14:51:39] Quantization: done
[2024-07-11 14:51:40] Creating a visualization of the Graph. It may take few minutes.
[2024-07-11 14:51:40] Visualization: file exists: 'image_classification/torchvision_models/outputs/vit_b_16/trt/vit_b_16.svg.svg', Overwriting.
[2024-07-11 14:51:40] Saved graph visualization to: image_classification/torchvision_models/outputs/vit_b_16/trt/vit_b_16.svg.svg
[2024-07-11 14:51:47] Saving submodules of the model under:
image_classification/torchvision_models/outputs/vit_b_16/trt/2024_07_11_14_51_47_clika_submodules_m7djj9r3
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_begin[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_end[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2024-07-11 14:51:53] Saved model at: image_classification/torchvision_models/outputs/vit_b_16/trt/2024_07_11_14_51_47_clika_submodules_m7djj9r3/clika_0.onnx
[2024-07-11 14:51:56] Saved final model: image_classification/torchvision_models/outputs/vit_b_16/trt/vit_b_16_init.onnx
Breakdown of the sample log:
CLIKA: Pre-Compiling
CLIKA: Done Pre-Compiling
CLIKA: Compiling
- Starts the conversion of the given Model to CLIKA IR
===============================================================
== License is Valid. Time Left: 390 days 11 hours 30 minutes ==
===============================================================
Created log at: logs/clika_2024-07-11_14:51:12.125294.log
[2024-07-11 14:51:12] CLIKA ACE Version: 24.7.0
[2024-07-11 14:51:12] 'torch' Version: 2.3.1+cu121
[2024-07-11 14:51:12] Python Version: 3.9.19 (main, Mar 21 2024, 17:11:28) [GCC 11.2.0]
[2024-07-11 14:51:12]
Training Settings:
+ stats_steps = 20
+ random_seed = 1534438726
+ is_training_from_scratch = False
Distributed Training Settings:
+ multi_gpu = True
+ use_sharding = True
Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
+ skip_tail_quantization = False
+ automatic_skip_quantization = True
Deployment Settings:
+ target_framework = TensorRT (NVIDIA)
- Runs a license check as validation
- Initializes the log file
- Prints out the versions and the Settings object that was passed to the SDK
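The "Training Settings" banner in the log is just a rendering of the Settings object's sections. As a purely illustrative sketch (the real SDK renders this internally; render_settings and its dict layout are made up here), a nested dict could be turned into that "Section: / + key = value" layout like so:

```python
# Illustrative only: the real banner is produced internally by the SDK.
# This sketch renders a nested settings dict into the sectioned
# "+ key = value" layout seen in the log.

def render_settings(sections: dict) -> str:
    lines = []
    for section, values in sections.items():
        lines.append(f"{section}:")
        for key, value in values.items():
            lines.append(f"    + {key} = {value}")
    return "\n".join(lines)

banner = render_settings({
    "Training Settings": {"stats_steps": 20, "random_seed": 1534438726},
    "Global Quantization Settings": {"weights_num_bits": 8},
})
print(banner)
```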
[2024-07-11 14:51:12] Starting initial model validation: 'VisionTransformer'
[2024-07-11 14:51:12] Starting to parse the model: 'VisionTransformer'
[2024-07-11 14:51:12] Setting the Model into Training Model for Parsing
[2024-07-11 14:51:12] Moving the Model to GPU Device
[2024-07-11 14:51:12] 1/236. Parsing node: 'x'
[2024-07-11 14:51:12] Parsed to -> 'x' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 2/236. Parsing node: '_1'
[2024-07-11 14:51:12] Parsed to -> 'shape_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 3/236. Parsing node: 'getitem'
[2024-07-11 14:51:12] Parsed to -> 'gather_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 4/236. Parsing node: 'getitem_1'
[2024-07-11 14:51:12] Parsed to -> 'gather_2_1' (Subgraph: 'VisionTransformer')
...
[2024-07-11 14:51:12] 234/236. Parsing node: 'getitem_29'
[2024-07-11 14:51:12] Parsed to -> 'gather_6_1' (Subgraph: 'VisionTransformer')
[2024-07-11 14:51:12] 235/236. Parsing node: 'heads_head'
[2024-07-11 14:51:12] Parsed to -> 'heads_head' (Subgraph: 'VisionTransformer/heads')
[2024-07-11 14:51:12] 236/236. Parsing node: 'output'
- Resumes conversion to CLIKA IR; all of the operations inside the given model are mapped to CLIKA operations.
- During the conversion process, the CLIKA SDK attempts to preserve the subgraph in which the operations were originally contained.
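The per-node mapping the log reports ("Parsing node: 'getitem' ... Parsed to -> 'gather_1'") can be sketched in plain Python. This is illustrative only: clika-ace's actual parser and its exact naming scheme are internal, and the TORCH_TO_IR table below is a made-up stand-in.

```python
# Illustrative sketch only (not the real ACE parser): map torch-level node
# names to IR op names, numbering repeated ops and remembering which
# submodule (subgraph) each node came from.

# Hypothetical mapping from torch-level ops to IR op names.
TORCH_TO_IR = {
    "getitem": "gather",
    "shape": "shape",
    "linear": "linear",
}

def parse_node(op: str, subgraph: str, counters: dict) -> str:
    """Map one torch node to an IR node name, numbering duplicates."""
    ir_op = TORCH_TO_IR.get(op, op)
    counters[ir_op] = counters.get(ir_op, 0) + 1
    n = counters[ir_op]
    ir_name = ir_op if n == 1 else f"{ir_op}_{n}"
    return f"{ir_name} (Subgraph: '{subgraph}')"

counters = {}
first = parse_node("getitem", "VisionTransformer", counters)
second = parse_node("getitem", "VisionTransformer", counters)
```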
[2024-07-11 14:51:13] Removed 72 unnecessary nodes
[2024-07-11 14:51:13] Removed 37 Dropout layers
[2024-07-11 14:51:13] Removed 1 unnecessary nodes
CLIKA: Done Compiling
- ACE does multiple passes over the given model and eliminates unnecessary nodes. The unnecessary nodes can be:
  - Unused nodes (layers)
  - Useless nodes, e.g., Add(0 + x)
  - Dropout nodes
  - & many more
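The elimination passes above can be sketched as a toy graph-cleanup routine. This is illustrative only, not the real ACE implementation: it rewires Dropout and identity-like nodes (here faked as an "add_zero" op) to their single input, then keeps only nodes reachable from the graph outputs, so unused nodes fall away.

```python
# Illustrative sketch only (not the real ACE passes): drop Dropout nodes,
# identity-like nodes such as Add(0 + x), and nodes whose output is unused.

def eliminate(nodes, edges, outputs):
    """nodes: {name: op}; edges: {consumer: [input names]}; outputs: graph outputs."""
    # Pass 1: mark Dropout and identity-like nodes as removable.
    removable = {n for n, op in nodes.items() if op in ("dropout", "add_zero")}

    def resolve(n):
        # Skip over removable nodes to their (single) input.
        while n in removable:
            n = edges[n][0]
        return n

    # Pass 2: keep only nodes reachable from the outputs.
    keep, stack = set(), [resolve(o) for o in outputs]
    while stack:
        n = stack.pop()
        if n in keep:
            continue
        keep.add(n)
        stack.extend(resolve(i) for i in edges.get(n, []))
    return keep

nodes = {"x": "input", "drop": "dropout", "lin": "linear",
         "add0": "add_zero", "unused": "linear"}
edges = {"drop": ["x"], "lin": ["drop"], "add0": ["lin"], "unused": ["x"]}
kept = eliminate(nodes, edges, ["add0"])
```

Here the Dropout node, the Add(0 + x) node, and the unused linear layer are all removed, leaving only the input and the live linear layer.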
Calibrating Model: 100%|##########| 20/20 [00:06<00:00, 3.31it/s]
[2024-07-11 14:51:20] Processing Quantization statistics 1/116: 'add'
[2024-07-11 14:51:20] Processing Quantization statistics 2/116: 'add_1'
[2024-07-11 14:51:20] Processing Quantization statistics 3/116: 'add_10'
...
[2024-07-11 14:51:24] Processing Quantization statistics 113/116: 'multihead_attention_9_1'
[2024-07-11 14:51:24] Processing Quantization statistics 114/116: 'permute'
[2024-07-11 14:51:24] Processing Quantization statistics 115/116: 'reshape'
[2024-07-11 14:51:24] Processing Quantization statistics 116/116: 'x'
[2024-07-11 14:51:24] Automatic Quantization: Skipping Quantization for node: 'constant_3_1'
- Assuming calibration data was passed to torch.compile(..., backend="clika"), ACE will perform an initial quantization calibration.
- ACE processes the quantization statistics into model modifications and optimizations.
- ACE performs automatic quantization, skipping nodes which are highly sensitive to quantization, or nodes for which quantization is irrelevant.
- ACE will then transform the CLIKA IR internally into dedicated nodes for quantization relevant to the selected deployment framework:
[2024-07-11 14:51:24] Quantization: Processed 1/207 - 'class_token'
[2024-07-11 14:51:25] Quantization: Processed 3/207 - 'encoder_pos_embedding'
[2024-07-11 14:51:25] Quantization: Processed 6/207 - 'x_0_q'
[2024-07-11 14:51:25] Quantization: Processed 10/207 - 'conv_proj'
[2024-07-11 14:51:25] Quantization: Processed 13/207 - 'conv_proj_0_q'
...
[2024-07-11 14:51:39] Quantization: Processed 204/207 - 'encoder_ln'
[2024-07-11 14:51:39] Quantization: Processed 205/207 - 'encoder_ln_0_q'
[2024-07-11 14:51:39] Quantization: Processed 206/207 - 'gather_6_1'
[2024-07-11 14:51:39] Quantization: Processed 207/207 - 'heads_head'
[2024-07-11 14:51:39] Quantization: done
- After the process above, the torch.compile call has finished. If we decide to deploy the model now, the result is equivalent to post-training quantization (PTQ).
- If we continue with fine-tuning, it is equivalent to quantization-aware training (QAT).
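The calibration arithmetic behind this step can be sketched as standard asymmetric 8-bit quantization. This is a generic PTQ-style sketch, not ACE's internal algorithm: observe min/max over the calibration batches, derive a scale and zero-point, and use quantize-then-dequantize (the "fake quant" op that QAT keeps in the forward pass during fine-tuning).

```python
# Generic asymmetric 8-bit quantization sketch (illustrative; ACE's actual
# calibration is internal to the SDK).

def calibrate(batches):
    """Track the min/max of activations over the calibration data."""
    lo = min(min(b) for b in batches)
    hi = max(max(b) for b in batches)
    return lo, hi

def qparams(lo, hi, num_bits=8):
    """Derive scale and zero-point for an asymmetric integer range."""
    qmin, qmax = 0, 2 ** num_bits - 1
    scale = (hi - lo) / (qmax - qmin)
    zero_point = round(qmin - lo / scale)
    return scale, zero_point

def fake_quant(x, scale, zp, num_bits=8):
    """Quantize then dequantize -- the op QAT keeps in the forward pass."""
    qmin, qmax = 0, 2 ** num_bits - 1
    q = max(qmin, min(qmax, round(x / scale) + zp))
    return (q - zp) * scale

lo, hi = calibrate([[-1.0, 0.5], [0.2, 3.0]])
scale, zp = qparams(lo, hi)
y = fake_quant(0.5, scale, zp)  # 0.5 reconstructed to within one scale step
```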
Visualization
- If we call clika_module.clika_visualize(...), the CLIKA SDK will create a .svg file that displays the model.
[2024-07-11 14:51:40] Creating a visualization of the Graph. It may take few minutes.
[2024-07-11 14:51:40] Visualization: file exists: 'vit_b_16.svg.svg', Overwriting.
[2024-07-11 14:51:40] Saved graph visualization to: vit_b_16.svg.svg
Deployment
- The following log is produced when we deploy the model using torch.onnx.export.
- It deploys any ClikaModule submodule components into a separate directory, then merges them into one.
- The shape inference warnings can be ignored.
[2024-07-11 14:51:47] Saving submodules of the model under: 2024_07_11_14_51_47_clika_submodules_m7djj9r3
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_begin[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[W shape_type_inference.cpp:1968] Warning: The shape inference of CLIKA::Placeholder_end[2713b7cb93a54832898a60ef9078e7af] type is missing, so it may result in wrong shape inference for the exported graph. Please consider adding it in symbolic function. (function UpdateReliable)
[2024-07-11 14:51:53] Saved model at: 2024_07_11_14_51_47_clika_submodules_m7djj9r3/clika_0.onnx
[2024-07-11 14:51:56] Saved final model: vit_b_16_init.onnx