
Output log breakdown

Here we will use the two primary functionalities of the clika-compression Python package:
  • clika_compress
  • clika_deploy

We will analyze and explain the output log generated by these two operations step-by-step, using a toy example.

info

This is a simple toy example used to showcase the output log. For more complete examples, see our Examples GitHub Repository.

Compression and deployment script

The requirements for this example are:
  • torch
  • torchvision
  • torchmetrics
  • clika-compression

from functools import partial

import torch
from torch.utils.data import DataLoader
import torchmetrics
import torchvision

from clika_compression import DeploymentSettings_TensorRT_ONNX, QATQuantizationSettings, DistributedTrainingSettings, \
    Settings, get_path_to_best_clika_state_result, clika_compress, clika_deploy


class BasicMNIST(torch.nn.Module):
    """
    Simple MNIST model

    REFERENCE:
    https://github.com/pytorch/examples/blob/7f7c222b355abd19ba03a7d4ba90f1092973cdbc/mnist/main.py#L11
    """

    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 32, 3, 1)
        self.conv2 = torch.nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = torch.nn.Dropout(0.25)
        self.dropout2 = torch.nn.Dropout(0.5)
        self.flatten = torch.nn.Flatten()
        self.fc1 = torch.nn.Linear(9216, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.nn.functional.relu(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = torch.nn.functional.max_pool2d(
            x, kernel_size=(2, 2), stride=None, padding=(0, 0), dilation=(1, 1)
        )
        x = self.dropout1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = torch.nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return x


def loss_fn(output, target):
    return torch.nn.functional.nll_loss(
        torch.nn.functional.log_softmax(output, dim=1), target
    )


def main():
    # Model
    model = BasicMNIST()

    # Loss
    compute_loss_fn = loss_fn

    # DataLoader
    train_loader = torchvision.datasets.MNIST(
        root=".", train=True, transform=torchvision.transforms.ToTensor(), download=True
    )
    eval_loader = torchvision.datasets.MNIST(
        root=".",
        train=False,
        transform=torchvision.transforms.ToTensor(),
        download=True,
    )
    get_train_loader = partial(DataLoader, dataset=train_loader, batch_size=32)
    get_eval_loader = partial(DataLoader, dataset=eval_loader, batch_size=32)

    # Metric
    metric_fn = torchmetrics.classification.MulticlassAccuracy(num_classes=10)

    # CLIKA_COMPRESSION
    settings = Settings()  # Default settings
    settings.deployment_settings = DeploymentSettings_TensorRT_ONNX()
    settings.global_quantization_settings = QATQuantizationSettings()
    settings.distributed_training_settings = DistributedTrainingSettings(
        multi_gpu=False, use_sharding=False
    )
    settings.training_settings.steps_per_epoch = 1000
    settings.training_settings.num_epochs = 3

    # Force output adjacent nodes in fp32
    settings.global_quantization_settings.skip_tail_quantization = True

    output_dir = "outputs"
    clika_compress(
        output_path=output_dir,
        settings=settings,
        model=model,
        init_training_dataset_fn=get_train_loader,
        init_evaluation_dataset_fn=get_eval_loader,
        optimizer=torch.optim.SGD(params=list(model.parameters()), lr=1e-2),
        training_losses={"NLL_loss": compute_loss_fn},
        training_metrics={"acc": metric_fn},
        evaluation_losses={"NLL_loss": compute_loss_fn},
        evaluation_metrics={"acc": metric_fn},
    )
    best_clika_state_file = get_path_to_best_clika_state_result(
        output_dir=output_dir, key_name="acc"
    )

    # .pompom -> .onnx
    deployed_path = clika_deploy(
        clika_state_path=best_clika_state_file,
        output_dir_path=output_dir,
        input_shapes=[(None, 1, 28, 28)],
    )
    print(f"Deployed model saved at {deployed_path}")


if __name__ == "__main__":
    main()

Output log

The log file contains the following sections:
  • CCO setup
  • Evaluation of the original, non-quantized model
  • Preparation of the quantized model
  • Learning rate warm-up
  • Training
  • Deployment

CCO setup


Information about CCO initialization and setup, as well as the working environment:

Created log at: outputs/logs/clika_optimize_2023-11-08_19:53:30.980863.log
[2023-11-08 19:53:30] Initializing output directory at: 'outputs/logs'
CLIKA Version: 0.3.0
'torch' Version: 2.1.2+cu121
Python Version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
...

Training settings for the CCO, specified in the TrainingSettings instance stored in the Settings.training_settings attribute:

Training Settings:
+ num_epochs = 3
+ steps_per_epoch = 1000
+ evaluation_steps = -1
...
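
These values mirror what the script sets through settings.training_settings. Below is a minimal, self-contained sketch of adjusting them; the meaning of evaluation_steps = -1 is our assumption, not something stated in the log.

from clika_compression import Settings

settings = Settings()
# Field names match the "Training Settings" block in the log above.
settings.training_settings.num_epochs = 3
settings.training_settings.steps_per_epoch = 1000
settings.training_settings.evaluation_steps = -1  # assumption: -1 means "evaluate on the full evaluation set"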

Distributed training settings for the CCO, specified in the DistributedTrainingSettings instance stored in the Settings.distributed_training_settings attribute:

Distributed Training Settings:
+ multi_gpu = False
+ use_sharding = False
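
The flags match the DistributedTrainingSettings constructor call in the script; as an illustrative sketch, enabling multi-GPU training only changes the arguments:

from clika_compression import DistributedTrainingSettings, Settings

settings = Settings()
# Illustrative only: enable data-parallel training across the available GPUs.
settings.distributed_training_settings = DistributedTrainingSettings(
    multi_gpu=True, use_sharding=False
)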

Global quantization settings for the CCO, specified in the QATQuantizationSettings instance stored in the Settings.global_quantization_settings attribute:

Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
...
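
These fields can also be overridden. The sketch below assumes they are plain attributes of the QATQuantizationSettings instance; the attribute names are taken from the log above, not from a verified API reference.

from clika_compression import QATQuantizationSettings, Settings

settings = Settings()
quant_settings = QATQuantizationSettings()
# Attribute names taken from the "Global Quantization Settings" log block; attribute-style
# assignment is an assumption made for illustration.
quant_settings.weights_num_bits = 8
quant_settings.activations_num_bits = 8
settings.global_quantization_settings = quant_settings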

Deployment settings for the CCO, specified in the DeploymentSettings_TensorRT_ONNX instance stored in the Settings.deployment_settings attribute:

Deployment Settings:
+ target_framework = TensorRT (NVIDIA)

Parsed layers, named as they will appear in the graph visualization model_init.svg and as they are referenced when customizing quantization skipping:

 1/13. Parsing node: 'x'
Parsed to -> 'x' (Subgraph: 'BasicMNIST')
2/13. Parsing node: 'conv1'
Parsed to -> 'conv' (Subgraph: 'BasicMNIST')
3/13. Parsing node: 'relu'
...
Visualization: file exists: outputs/model_init.svg

For more information, see graph visualization.


Evaluation of the original, non-quantized model

If the is_training_from_scratch parameter of TrainingSettings is set to False, the initial, non-quantized model is evaluated first. These evaluation metrics (which can include losses or other custom metrics) are used to compare the original, non-quantized model with the quantized model over the course of the CCO:

Evaluating the Model
Evaluating [100/313] eta: 0:00:00 loss - 2.3036 (2.3060) | NLL_loss - 2.3036 (2.3060)
...
  • eval_loss - the sum of all losses set by the user
  • eval_NLL_loss - the user-provided loss function passed to clika_compress via the evaluation_losses argument

The number inside the parentheses, e.g., (2.3060), is the running mean loss.
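
As noted above, this baseline evaluation only runs when is_training_from_scratch is False; a minimal sketch of toggling it (the attribute path mirrors the training settings used in the script):

from clika_compression import Settings

settings = Settings()
# False: evaluate the original model first (as in this example); True: skip the baseline evaluation.
settings.training_settings.is_training_from_scratch = False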

tip

You can change the logging output frequency by setting the TrainingSettings.print_interval parameter.

info

One-step log line breakdown:

This is a breakdown of a single training/evaluation log line. The numbers in parentheses are running means.

[<date and time>] Epoch <epoch #> [<steps completed>/<total steps>] eta: <time left for epoch> loss - <total loss value> | <loss-name-1> - <loss-1 value> | data-time-sec - <time to fetch data> | iter-time-sec - <iteration time elapsed> | mem-GiB - <CPU memory> | vmem-GiB - <GPU memory>

[2024-03-20 20:12:15] Epoch 1 [ 200 / 1000 ] eta: 0:00:04 loss - 1.8831 (1.8862) | NLL_loss - 1.8831 (1.8862) | data-time-sec:0.0009 (0.0010)| iter-time-sec - 0.0057 (0.0057)| mem-GiB - 1.6386 (1.6386)| vmem-GiB - 0.0796 (0.0796)

Performance summary of the original, non-quantized model on the evaluation dataset:

Original Model Base Results (Evaluation Dataset):
Evaluation
├── NLL_loss = 2.308
├── acc = 0.05008
└── loss = 2.308

Preparation of the quantized model

Model pre-processing and statistics collection for the initial quantization parameters (the calibration process):

Preprocessing Model
Removing 2 Dropout layers
Collecting Statistics from Model
Running Statistics and Profiling the Model for at most: 20 steps.
Collecting [20/20] eta: 0:00:00

Quantization skipping information:

Skipped Quantization automatically for the Tail of the model:
'linear_1'
Automatic Quantization: Skipping Quantization for node: 'linear_1'

In this example, the last layers of the model, i.e., the "model tail" (here, only the linear_1 layer), were skipped and not quantized. This is the default behavior, which can be customized.
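
For example, tail skipping is controlled by the same skip_tail_quantization flag the script sets explicitly; a small sketch of disabling it:

from clika_compression import QATQuantizationSettings, Settings

settings = Settings()
settings.global_quantization_settings = QATQuantizationSettings()
# The script sets this to True; setting it to False would quantize the tail (here, 'linear_1') as well.
settings.global_quantization_settings.skip_tail_quantization = False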

For more information, see quantization guide.


Learning rate warm-up

Starting training.
Starting Warmup.
Warmup [100/500] eta: 0:00:02 loss - 2.3108 (2.2999) | NLL_loss - 2.3108 (2.2999) | data-time-sec - 0.0009 (0.0009) | iter-time-sec - 0.0053 (0.0056) | mem-GiB - 1.6301 (1.6299) | vmem-GiB - 0.0796 (0.0796)
Warmup [200/500] eta: 0:00:01 loss - 2.2815 (2.2862) | NLL_loss - 2.2815 (2.2862) | data-time-sec - 0.0010 (0.0010) | iter-time-sec - 0.0050 (0.0050) | mem-GiB - 1.6325 (1.6324) | vmem-GiB - 0.0796 (0.0796)

In this example, we apply a linear learning rate warm-up.


Training

Starting Epoch 1
Epoch 1 [ 100/1000] eta: 0:00:04 loss - 1.3927 (1.3980) | NLL_loss - 1.3927 (1.3980) | data-time-sec - 0.0009 (0.0009) | iter-time-sec - 0.0049 (0.0049) | mem-GiB - 1.6399 (1.6399) | vmem-GiB - 0.0796 (0.0796)

Performance on the evaluation dataset after each training epoch:

Epoch 1 Total time: 0:00:05
Epoch 1 - Evaluating [100/313] eta: 0:00:00 loss - 0.2798 (0.3647) | NLL_loss - 0.2798 (0.3647) | data-time-sec - 0.0009 (0.0009) | iter-time-sec - 0.0028 (0.0029) | mem-GiB - 1.6515 (1.6515) | vmem-GiB - 0.0478 (0.0478)

Summary of epoch 1 for all losses and metrics:

Summary of Epoch 1
Evaluation
├── NLL_loss = 0.40983
│ ├── Lowest so far = 0.40983 (Epoch: 1)
│ ├── Highest so far = 0.40983 (Epoch: 1)
│ └── Original value = 2.3014
│ └── Difference = ↘ -1.89157,-82.192%
├── acc = 0.90062
│ ├── Lowest so far = 0.90062 (Epoch: 1)
│ ├── Highest so far = 0.90062 (Epoch: 1)
│ └── Original value = 0.10011
│ └── Difference = ↗ +0.80051,+799.659%
└── loss = 0.40983
...
  • Lowest so far - The lowest value achieved among all epochs so far.
  • Highest so far - The highest value achieved among all epochs so far.
  • Original value - The Loss/Metric value obtained from the original model before compression was initiated.
  • Difference - The change or difference in performance compared to the original model.

Finally, checkpoints and the current state are written to a .pompom file and compression finishes; this is the last phase of the CCO:

Saved checkpoint at: 'outputs/epoch_3/model.pompom'
Finished Compression.
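
The output directory now holds per-epoch .pompom checkpoints (e.g., outputs/epoch_3/model.pompom). As in the script above, the best one according to a recorded metric can then be located:

from clika_compression import get_path_to_best_clika_state_result

# Resolves the checkpoint with the best "acc" value recorded during the CCO run.
best_clika_state_file = get_path_to_best_clika_state_result(
    output_dir="outputs", key_name="acc"
)
print(best_clika_state_file)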

Deployment

Model from: outputs/epoch_1/model.pompom
model at: outputs/BasicMNIST.onnx

Since we used clika_deploy in our script, the model is deployed to an .onnx file.
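
As a quick sanity check (not part of the original example), the deployed .onnx file can typically be loaded with onnxruntime, assuming it is installed and supports the exported operators:

import numpy as np
import onnxruntime as ort

# Run a dummy MNIST-shaped input through the deployed model.
session = ort.InferenceSession("outputs/BasicMNIST.onnx", providers=["CPUExecutionProvider"])
input_name = session.get_inputs()[0].name
dummy_input = np.random.rand(1, 1, 28, 28).astype(np.float32)
logits = session.run(None, {input_name: dummy_input})[0]
print(logits.shape)  # expected: (1, 10)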