
Output log breakdown

Here we will use the two primary functionalities of the clika-compression Python package:
  • clika_compress
  • clika_deploy

We will analyze and explain the output log generated by these two operations step-by-step, using a toy example.

info

This is a simple toy example used to showcase the output log. For more complete examples, see our Examples GitHub Repository.

Compression and deployment script

The requirements for this example are:
  • torch
  • torchvision
  • torchmetrics
  • clika-compression

from functools import partial

import torch
from torch.utils.data import DataLoader
import torchmetrics
import torchvision

from clika_compression import DeploymentSettings_TensorRT_ONNX, QATQuantizationSettings, DistributedTrainingSettings, \
    Settings, get_path_to_best_clika_state_result, clika_compress, clika_deploy


class BasicMNIST(torch.nn.Module):
    """
    Simple MNIST model

    REFERENCE:
    https://github.com/pytorch/examples/blob/7f7c222b355abd19ba03a7d4ba90f1092973cdbc/mnist/main.py#L11
    """

    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 32, 3, 1)
        self.conv2 = torch.nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = torch.nn.Dropout(0.25)
        self.dropout2 = torch.nn.Dropout(0.5)
        self.flatten = torch.nn.Flatten()
        self.fc1 = torch.nn.Linear(9216, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.nn.functional.relu(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = torch.nn.functional.max_pool2d(
            x, kernel_size=(2, 2), stride=None, padding=(0, 0), dilation=(1, 1)
        )
        x = self.dropout1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = torch.nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return x


def loss_fn(output, target):
    return torch.nn.functional.nll_loss(
        torch.nn.functional.log_softmax(output, dim=1), target
    )


def main():
    # Model
    model = BasicMNIST()

    # Loss
    compute_loss_fn = loss_fn

    # DataLoader
    train_loader = torchvision.datasets.MNIST(
        root=".", train=True, transform=torchvision.transforms.ToTensor(), download=True
    )
    eval_loader = torchvision.datasets.MNIST(
        root=".",
        train=False,
        transform=torchvision.transforms.ToTensor(),
        download=True,
    )
    get_train_loader = partial(DataLoader, dataset=train_loader, batch_size=32)
    get_eval_loader = partial(DataLoader, dataset=eval_loader, batch_size=32)

    # Metric
    metric_fn = torchmetrics.classification.MulticlassAccuracy(num_classes=10)

    # CLIKA_COMPRESSION
    settings = Settings()  # Default settings
    settings.deployment_settings = DeploymentSettings_TensorRT_ONNX()
    settings.global_quantization_settings = QATQuantizationSettings()
    settings.distributed_training_settings = DistributedTrainingSettings(
        multi_gpu=False, use_sharding=False
    )
    settings.training_settings.steps_per_epoch = 1000
    settings.training_settings.num_epochs = 3

    # Force output adjacent nodes in fp32
    settings.global_quantization_settings.skip_tail_quantization = True

    output_dir = "outputs"
    clika_compress(
        output_path=output_dir,
        settings=settings,
        model=model,
        init_training_dataset_fn=get_train_loader,
        init_evaluation_dataset_fn=get_eval_loader,
        optimizer=torch.optim.SGD(params=list(model.parameters()), lr=1e-2),
        training_losses={"NLL_loss": compute_loss_fn},
        training_metrics={"acc": metric_fn},
        evaluation_losses={"NLL_loss": compute_loss_fn},
        evaluation_metrics={"acc": metric_fn},
    )
    best_clika_state_file = get_path_to_best_clika_state_result(
        output_dir=output_dir, key_name="acc"
    )

    # .pompom -> .onnx
    deployed_path = clika_deploy(
        clika_state_path=best_clika_state_file,
        output_dir_path=output_dir,
        input_shapes=[(None, 1, 28, 28)],
    )
    print(f"Deployed model saved at {deployed_path}")


if __name__ == "__main__":
    main()

Output log

The log file contains the following sections:
  • CCO setup
  • Evaluation of the original, non-quantized model
  • Preparation of the quantized model
  • Learning rate warm-up
  • Training
  • Deployment

CCO setup


Information about CCO initialization and setup, as well as the working environment:

Created log at: outputs/logs/clika_optimize_2023-11-08_19:53:30.980863.log
[2023-11-08 19:53:30] Initializing output directory at: 'outputs/logs'
CLIKA Version: 0.3.0
'torch' Version: 2.1.2+cu121
Python Version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
...

Training settings for the CCO, specified in the TrainingSettings instance stored in the Settings.training_settings attribute:

Training Settings:
+ num_epochs = 3
+ steps_per_epoch = 1000
+ evaluation_steps = -1
...
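
These values mirror what the script sets through settings.training_settings. Below is a minimal, self-contained sketch of adjusting them; the meaning of evaluation_steps = -1 is our assumption, not something stated in the log.

from clika_compression import Settings

settings = Settings()
# Field names match the "Training Settings" block in the log above.
settings.training_settings.num_epochs = 3
settings.training_settings.steps_per_epoch = 1000
settings.training_settings.evaluation_steps = -1  # assumption: -1 means "evaluate on the full evaluation set"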

Distributed training settings for the CCO, specified in the DistributedTrainingSettings instance stored in the Settings.distributed_training_settings attribute:

Distributed Training Settings:
+ multi_gpu = False
+ use_sharding = False
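
The flags match the DistributedTrainingSettings constructor call in the script; as an illustrative sketch, enabling multi-GPU training only changes the arguments:

from clika_compression import DistributedTrainingSettings, Settings

settings = Settings()
# Illustrative only: enable data-parallel training across the available GPUs.
settings.distributed_training_settings = DistributedTrainingSettings(
    multi_gpu=True, use_sharding=False
)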

Global quantization settings for the CCO, specified in the QATQuantizationSettings instance stored in the Settings.global_quantization_settings attribute:

Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
...
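
These fields can also be overridden. The sketch below assumes they are plain attributes of the QATQuantizationSettings instance; the attribute names are taken from the log above, not from a verified API reference.

from clika_compression import QATQuantizationSettings, Settings

settings = Settings()
quant_settings = QATQuantizationSettings()
# Attribute names taken from the "Global Quantization Settings" log block; attribute-style
# assignment is an assumption made for illustration.
quant_settings.weights_num_bits = 8
quant_settings.activations_num_bits = 8
settings.global_quantization_settings = quant_settings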

Deployment settings for the CCO, specified in the DeploymentSettings_TensorRT_ONNX instance stored in the Settings.deployment_settings attribute:

Deployment Settings:
+ target_framework = TensorRT (NVIDIA)

Parsed layers, named as they will appear in the graph visualization model_init.svg and as they are referenced when customizing quantization skipping:

 1/13. Parsing node: 'x'
Parsed to -> 'x' (Subgraph: 'BasicMNIST')
2/13. Parsing node: 'conv1'
Parsed to -> 'conv' (Subgraph: 'BasicMNIST')
3/13. Parsing node: 'relu'
...
Visualization: file exists: outputs/model_init.svg

For more information, see graph visualization.


Evaluation of the original, non-quantized model

If the is_training_from_scratch parameter of TrainingSettings is set to False, the initial, non-quantized model is evaluated first. These evaluation metrics (which can include losses or other custom metrics) are used to compare the original, non-quantized model with the quantized model over the course of the CCO:

Evaluating the Model
Evaluating [100/313] eta: 0:00:00 loss - 2.3036 (2.3060) | NLL_loss - 2.3036 (2.3060)
...
  • eval_loss - the sum of all losses set by the user
  • eval_NLL_loss - the user-provided loss function passed to clika_compress via the evaluation_losses argument

The number inside the parentheses, e.g., (2.3060), is the running mean loss.
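
As noted above, this baseline evaluation only runs when is_training_from_scratch is False; a minimal sketch of toggling it (the attribute path mirrors the training settings used in the script):

from clika_compression import Settings

settings = Settings()
# False: evaluate the original model first (as in this example); True: skip the baseline evaluation.
settings.training_settings.is_training_from_scratch = False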

tip

You can change the logging output frequency by setting the TrainingSettings.print_interval parameter.

info

One-step log line breakdown:

This is a breakdown of a single training/evaluation log line. The numbers in parentheses are running means.

[<date and time>] Epoch <epoch #> [<steps completed>/<total steps>] eta: <time left for epoch> loss - <total loss value> | <loss-name-1> - <loss-1 value> | data-time-sec - <time to fetch data> | iter-time-sec - <iteration time elapsed> | mem-GiB - <CPU memory> | vmem-GiB - <GPU memory>

[2024-03-20 20:12:15] Epoch 1 [ 200 / 1000 ] eta: 0:00:04 loss - 1.8831 (1.8862) | NLL_loss - 1.8831 (1.8862) | data-time-sec:0.0009 (0.0010)| iter-time-sec - 0.0057 (0.0057)| mem-GiB - 1.6386 (1.6386)| vmem-GiB - 0.0796 (0.0796)

Performance summary of the original, non-quantized model on the evaluation dataset:

Original Model Base Results (Evaluation Dataset):
Evaluation
├── NLL_loss = 2.308
├── acc = 0.05008
└── loss = 2.308

Preparation of the quantized model

Model pre-processing and statistics collection for the initial quantization parameters (the calibration process):

Preprocessing Model
Removing 2 Dropout layers
Collecting Statistics from Model
Running Statistics and Profiling the Model for at most: 20 steps.
Collecting [20/20] eta: 0:00:00

Quantization skipping information:

Skipped Quantization automatically for the Tail of the model:
'linear_1'
Automatic Quantization: Skipping Quantization for node: 'linear_1'

In this example, the last layers of the model, i.e., the "model tail" (here, only the linear_1 layer), were skipped and not quantized. This is the default behavior, which can be customized.
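
For example, tail skipping is controlled by the same skip_tail_quantization flag the script sets explicitly; a small sketch of disabling it:

from clika_compression import QATQuantizationSettings, Settings

settings = Settings()
settings.global_quantization_settings = QATQuantizationSettings()
# The script sets this to True; setting it to False would quantize the tail (here, 'linear_1') as well.
settings.global_quantization_settings.skip_tail_quantization = False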

For more information, see quantization guide.


Learning rate warm-up

Starting training.
Starting Warmup.
Warmup [100/500] eta: 0:00:02 loss - 2.3108 (2.2999) | NLL_loss - 2.3108 (2.2999) | data-time-sec - 0.0009 (0.0009) | iter-time-sec - 0.0053 (0.0056) | mem-GiB - 1.6301 (1.6299) | vmem-GiB - 0.0796 (0.0796)
Warmup [200/500] eta: 0:00:01 loss - 2.2815 (2.2862) | NLL_loss - 2.2815 (2.2862) | data-time-sec - 0.0010 (0.0010) | iter-time-sec - 0.0050 (0.0050) | mem-GiB - 1.6325 (1.6324) | vmem-GiB - 0.0796 (0.0796)

In this example, we apply a linear learning rate warm-up.


Training

Starting Epoch 1
Epoch 1 [ 100/1000] eta: 0:00:04 loss - 1.3927 (1.3980) | NLL_loss - 1.3927 (1.3980) | data-time-sec - 0.0009 (0.0009) | iter-time-sec - 0.0049 (0.0049) | mem-GiB - 1.6399 (1.6399) | vmem-GiB - 0.0796 (0.0796)

Performance on the evaluation dataset after each training epoch:

Epoch 1 Total time: 0:00:05
Epoch 1 - Evaluating [100/313] eta: 0:00:00 loss - 0.2798 (0.3647) | NLL_loss - 0.2798 (0.3647) | data-time-sec - 0.0009 (0.0009) | iter-time-sec - 0.0028 (0.0029) | mem-GiB - 1.6515 (1.6515) | vmem-GiB - 0.0478 (0.0478)

Summary of epoch 1 for all losses and metrics:

Summary of Epoch 1
Evaluation
├── NLL_loss = 0.40983
│ ├── Lowest so far = 0.40983 (Epoch: 1)
│ ├── Highest so far = 0.40983 (Epoch: 1)
│ └── Original value = 2.3014
│ └── Difference = ↘ -1.89157,-82.192%
├── acc = 0.90062
│ ├── Lowest so far = 0.90062 (Epoch: 1)
│ ├── Highest so far = 0.90062 (Epoch: 1)
│ └── Original value = 0.10011
│ └── Difference = ↗ +0.80051,+799.659%
└── loss = 0.40983
...
  • Lowest so far - The lowest value achieved among all epochs so far.
  • Highest so far - The highest value achieved among all epochs so far.
  • Original value - The Loss/Metric value obtained from the original model before compression was initiated.
  • Difference - The change or difference in performance compared to the original model.

Finally, checkpoints and the current state are written to a .pompom file and compression finishes; this is the last phase of the CCO:

Saved checkpoint at: 'outputs/epoch_3/model.pompom'
Finished Compression.
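
The output directory now holds per-epoch .pompom checkpoints (e.g., outputs/epoch_3/model.pompom). As in the script above, the best one according to a recorded metric can then be located:

from clika_compression import get_path_to_best_clika_state_result

# Resolves the checkpoint with the best "acc" value recorded during the CCO run.
best_clika_state_file = get_path_to_best_clika_state_result(
    output_dir="outputs", key_name="acc"
)
print(best_clika_state_file)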

Deployment

Model from: outputs/epoch_1/model.pompom
model at: outputs/BasicMNIST.onnx

Since we used clika_deploy in our script, the model is deployed to an .onnx file.
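
As a quick sanity check (not part of the original example), the deployed .onnx file can typically be loaded with onnxruntime, assuming it is installed and supports the exported operators:

import numpy as np
import onnxruntime as ort

# Run a dummy MNIST-shaped input through the deployed model.
session = ort.InferenceSession("outputs/BasicMNIST.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 1, 28, 28).astype(np.float32)
logits = session.run(None, {input_name: dummy_input})[0]
print(logits.shape)  # expected: (1, 10)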