Version: 0.2

Output Log Breakdown

Here we will use the two primary methods of the clika-compression (cc) Python package:

Start CCO - execute compression of a model initialized from an existing torch.nn.Module.
Deploy - deploy the compressed model to a chosen framework.

We will analyze and explain the output log generated by these two operations step-by-step, using a toy example.

caution

This is a simple toy example used to showcase the output log. For more complete examples, see our Examples GitHub Repository.

Compression and Deployment Script

import torch
import torchmetrics
import torchvision
from clika_compression import PyTorchCompressionEngine, DeploymentSettings_TensorRT_ONNX
from clika_compression.settings import generate_default_settings, ModelCompileSettings

# https://github.com/pytorch/examples/blob/7f7c222b355abd19ba03a7d4ba90f1092973cdbc/mnist/main.py#L11
class BasicMNIST(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 32, 3, 1)
        self.conv2 = torch.nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = torch.nn.Dropout(0.25)
        self.dropout2 = torch.nn.Dropout(0.5)
        self.flatten = torch.nn.Flatten()
        self.fc1 = torch.nn.Linear(9216, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.nn.functional.relu(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = torch.nn.functional.max_pool2d(x, kernel_size=(2, 2), stride=None, padding=(0, 0), dilation=(1, 1))
        x = self.dropout1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = torch.nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return x


def main():
    model = BasicMNIST()
    compute_loss_fn = lambda a, b: torch.nn.functional.nll_loss(torch.nn.functional.log_softmax(a, dim=1), b)
    # These are datasets; the DataLoaders are built lazily by the factory functions below.
    train_dataset = torchvision.datasets.MNIST(root='.', train=True, transform=torchvision.transforms.ToTensor(), download=True)
    eval_dataset = torchvision.datasets.MNIST(root='.', train=False, transform=torchvision.transforms.ToTensor(), download=True)
    get_train_loader = lambda: torch.utils.data.DataLoader(train_dataset, batch_size=32)
    get_eval_loader = lambda: torch.utils.data.DataLoader(eval_dataset, batch_size=32)
    metric_fn = torchmetrics.classification.MulticlassAccuracy(num_classes=10)
    settings = generate_default_settings()
    settings.deployment_settings = DeploymentSettings_TensorRT_ONNX(graph_author="CLIKA",
                                                                    graph_description=None,
                                                                    input_shapes_for_deployment=[(None, 1, 28, 28)])
    settings.training_settings.steps_per_epoch = 1000

    engine = PyTorchCompressionEngine()
    mcs = ModelCompileSettings(
        optimizer=torch.optim.SGD(params=list(model.parameters()), lr=1e-2),
        training_losses={"NLL_loss": compute_loss_fn},
        training_metrics={"acc": metric_fn},
        evaluation_losses={"NLL_loss": compute_loss_fn},
        evaluation_metrics={"acc": metric_fn}
    )
    final = engine.optimize(
        output_path='outputs',
        settings=settings,
        model=model,
        model_compile_settings=mcs,
        init_training_dataset_fn=get_train_loader,
        init_evaluation_dataset_fn=get_eval_loader,
    )
    engine.deploy(
        clika_state_path=final,
        output_dir_path='outputs',
    )


if __name__ == '__main__':
    main()

Output Log

The log file contains the following sections:

CCO Setup

Information pertaining to CCO initialization and setup, as well as information about the working environment:

Created log at: outputs/logs/clika_optimize_2023-11-08_19:53:30.980863.log
[2023-11-08 19:53:30] Initializing output directory at: 'outputs/logs'
[2023-11-08 19:53:30] CLIKA Version: 0.2.2
[2023-11-08 19:53:30] 'torch' Version: 2.1.0+cu121
[2023-11-08 19:53:30] Python Version: 3.8.16 (default, Mar  2 2023, 03:21:46) [GCC 11.2.0]
...

Training settings for the CCO specified in the cc.TrainingSettings instance, and used in the initialization of the cc.Settings.training_settings variable:

Training Settings:
    +                              num_epochs = 1
    +                         steps_per_epoch = 1000
    +                        evaluation_steps = -1
...

Global quantization settings for the CCO specified in the cc.QATQuantizationSettings instance, and used in the initialization of the cc.Settings.global_quantization_settings variable:

Global Quantization Settings:
    +      quantization_algorithm = Quantization Aware Training
    +            weights_num_bits = 8
    +        activations_num_bits = 8
...

Deployment settings for the CCO specified in the DeploymentSettings_TensorRT_ONNX instance, and used in the initialization of the cc.Settings.deployment_settings variable:

Deployment Settings:
    + target_framework = TensorRT
    +     Graph Author = CLIKA

Parsed layers, named as they appear in the graph visualization model_init.png and as they are referenced when customizing quantization skipping:

[2023-11-10 12:51:38]  1/13. Parsing node: 'x'
[2023-11-10 12:51:38]  2/13. Parsing node: 'conv1'
[2023-11-10 12:51:38]   Parsed to -> 'conv'
[2023-11-10 12:51:38]  3/13. Parsing node: 'relu'
...
[2023-11-10 12:51:39] Saved graph visualization to: outputs/model_init.png

For more information, see Graph Visualization.

Evaluation of the Original, Non-quantized Model

If the is_training_from_scratch parameter of cc.PyTorchCompressionEngine.optimize() is set to False, the original, non-quantized model is evaluated first. The resulting evaluation values (losses and any custom metrics) serve as the baseline against which the quantized model is compared over the course of CCO:

[2023-11-10 12:51:40] Evaluating the Model
[2023-11-10 12:51:40] [100/313] eta: 0:00:00    eval_loss - 2.2999 (2.3027) | eval_NLL_loss - 2.2999 (2.3027)   iter-time: 0.001s   data-time: 0.001s   sys-mem: 12996MiB   vmem: 28MB
...

eval_loss - the sum of all evaluation losses set by the user
eval_NLL_loss - the loss function the user passed to cc.ModelCompileSettings under the name NLL_loss, logged with an eval_ prefix

The number inside the parentheses, e.g., (2.3027), is the running mean loss.
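As a rough illustration of how the two numbers in a log entry relate, the sketch below sums the named losses of one step into a total and tracks both the latest value and its running mean. The RunningMean class and total_loss helper are hypothetical names, not part of clika-compression, and the sketch assumes the mean is taken over all steps seen so far (the package may instead use a windowed average). The input values are made up for the example.

```python
from typing import Dict


class RunningMean:
    """Tracks the latest value and the mean of all values seen so far.

    Hypothetical helper for illustration; not part of clika-compression.
    """

    def __init__(self) -> None:
        self.count = 0
        self.total = 0.0
        self.latest = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.latest = value

    @property
    def mean(self) -> float:
        # Assumption: plain mean over every step so far, not a sliding window.
        return self.total / self.count if self.count else 0.0

    def __str__(self) -> str:
        # Same "<current> (<mean>)" layout as the log entries.
        return f"{self.latest:.4f} ({self.mean:.4f})"


def total_loss(named_losses: Dict[str, float]) -> float:
    """Sum of all user-provided losses for a single step."""
    return sum(named_losses.values())


tracker = RunningMean()
# Made-up per-step loss values, chosen only to keep the arithmetic obvious.
for step_losses in ({"NLL_loss": 3.0}, {"NLL_loss": 1.0}):
    tracker.update(total_loss(step_losses))

print(f"eval_loss - {tracker}")  # latest value first, running mean in parentheses
```

With a single loss configured, as in the script above, the total and the named loss are identical, which is why eval_loss and eval_NLL_loss show the same values in the log.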

tip

You can change how often these lines are logged by setting the parameter cc.TrainingSettings.print_interval.

info

One-step log line Breakdown:

This is a breakdown of a single training/evaluation step.

[<date and time>    ] [<steps completed>/<total steps>] eta: 0:00:00    <total evaluation loss> - <total loss value> (<average loss value>) | <loss-name-1>  - <current-loss-1-value> (<average-loss-1>)    <iteration time elapsed>: 0.001s  <time it took to fetch data>: 0.001s   <CPU memory>: 12996MiB  <GPU memory>: 28MB

[2023-11-10 12:51:40] [      100       /     313      ] eta: 0:00:00     eval_loss              -  2.2999            (2.3027)               |  eval_NLL_loss -  2.2999                (2.3027)               iter-time:              0.001s   data-time:                   0.001s    sys-mem:     12996MiB   vmem:        28MB
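If you want to post-process these lines (for example, to plot losses), a small parser is enough. The sketch below is inferred only from the sample lines on this page; the regular expressions and the parse_step_line function are our own assumptions, not an official format specification, and may need adjusting for other versions of the log.

```python
import re
from typing import Optional

# Patterns inferred from the sample log lines on this page (assumptions).
STEP_RE = re.compile(r"\[\s*(\d+)\s*/\s*(\d+)\s*\]")
LOSS_RE = re.compile(r"(\w+)\s+-\s+(-?\d+(?:\.\d+)?)\s+\((-?\d+(?:\.\d+)?)\)")
STAT_RE = re.compile(r"([\w-]+):\s+(\d+(?:\.\d+)?)(s|MiB|MB)")


def parse_step_line(line: str) -> Optional[dict]:
    """Parse one step line; return None if the line has no [step/total] field."""
    step = STEP_RE.search(line)
    if step is None:
        return None
    return {
        "step": int(step.group(1)),
        "total_steps": int(step.group(2)),
        # name -> (current value, running mean)
        "losses": {
            name: (float(current), float(mean))
            for name, current, mean in LOSS_RE.findall(line)
        },
        # name -> (value, unit), e.g. "vmem" -> (28.0, "MB")
        "stats": {
            name: (float(value), unit)
            for name, value, unit in STAT_RE.findall(line)
        },
    }


sample = (
    "[2023-11-10 12:51:40] [100/313] eta: 0:00:00    "
    "eval_loss - 2.2999 (2.3027) | eval_NLL_loss - 2.2999 (2.3027)   "
    "iter-time: 0.001s   data-time: 0.001s   sys-mem: 12996MiB   vmem: 28MB"
)
parsed = parse_step_line(sample)
print(parsed["step"], parsed["total_steps"], parsed["losses"]["eval_loss"])
```

The eta field is deliberately left unparsed here; it has no unit suffix, so STAT_RE skips it.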

Performance summary of the original, non-quantized model on the evaluation dataset:

Original Model Base Results (Evaluation Dataset):
Evaluation
├── eval_NLL_loss = 2.30182
├── eval_acc = 0.1
└── eval_loss = 2.30182

Preparation of the Quantized Model

Model pre-processing, followed by the collection of statistics used to initialize the quantization parameters:

[2023-11-10 12:51:40] Preprocessing Model
[2023-11-10 12:51:40] Removing 2 Dropout layers
[2023-11-10 12:51:41] Collecting Statistics from Model
[2023-11-10 12:51:41] Running Statistics and Profiling the Model for at most: 50 steps.
[2023-11-10 12:51:42] Collecting    [50/50] eta: 0:00:00        iter-time: 0.006s   data-time: 0.001s   sys-mem: 12992MiB   vmem: 59MB
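To give an intuition for why statistics are collected before quantization, the sketch below shows one common, generic approach: record the minimum and maximum values seen over a few calibration batches, then derive an 8-bit scale and zero point from that range. This is not a description of CLIKA's internal method, which this page does not specify; all function names are hypothetical and the input values are made up.

```python
from typing import Iterable, Sequence, Tuple


def collect_min_max(batches: Iterable[Sequence[float]]) -> Tuple[float, float]:
    """Running min/max over calibration batches (generic min/max calibration)."""
    lo, hi = float("inf"), float("-inf")
    for batch in batches:
        lo = min(lo, min(batch))
        hi = max(hi, max(batch))
    return lo, hi


def affine_params(lo: float, hi: float, num_bits: int = 8) -> Tuple[float, int]:
    """Derive scale and zero point for unsigned affine quantization."""
    qmax = 2 ** num_bits - 1
    # The range must contain zero so that 0.0 maps exactly to an integer.
    lo, hi = min(lo, 0.0), max(hi, 0.0)
    scale = (hi - lo) / qmax or 1.0  # fall back to 1.0 for a degenerate range
    zero_point = int(round(-lo / scale))
    return scale, max(0, min(qmax, zero_point))


def quantize(x: float, scale: float, zero_point: int, num_bits: int = 8) -> int:
    """Map a float to an integer level, clamped to the representable range."""
    q = round(x / scale) + zero_point
    return max(0, min(2 ** num_bits - 1, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map an integer level back to its float value."""
    return (q - zero_point) * scale


# Made-up calibration data standing in for observed activations.
lo, hi = collect_min_max([[-1.0, 0.5], [0.0, 1.0]])
scale, zero_point = affine_params(lo, hi)
q = quantize(0.7, scale, zero_point)
print(q, dequantize(q, scale, zero_point))
```

For values inside the calibrated range, the round trip through quantize and dequantize changes a value by at most half a quantization step (scale / 2).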

Quantization skipping information:

[2023-11-10 12:51:42] Skipped Quantization automatically for the Tail of the model:
    'linear_1'
[2023-11-10 12:51:42] Automatic Quantization: Skipping Quantization for node: 'linear_1'
[2023-11-10 12:51:42] 
# of all Layers      : 10
# of Quantized Layers: 9

In this example, quantization was skipped for the last layers of the model, i.e., the "model tail" (here, only the layer linear_1). This is the default behavior, which can be customized.

For more information, see the quantization guide.

Transformation of the layers to their quantized form:

[2023-11-10 12:51:42] 'conv': using 8 bits for Weight Quantization
[2023-11-10 12:51:42] 'conv_1': using 8 bits for Weight Quantization
[2023-11-10 12:51:42] 'linear': using 8 bits for Weight Quantization
[2023-11-10 12:51:42] Dequantizing 'relu_2' using 'relu_2_dequantized'.
[2023-11-10 12:51:43] Saved graph visualization to: outputs/model_post_preprocessing.png

The layer names above (conv, conv_1, linear, ...) are assigned during the parsing step shown under CCO Setup and can be seen in the generated file model_post_preprocessing.png in the outputs folder.

For more information, see graph visualization.

Learning Rate Warmup

[2023-11-10 12:51:43] Starting Warmup.
[2023-11-10 12:51:44] Warmup    [100/500]   eta: 0:00:03    loss - 2.2898 (2.2964) | NLL_loss - 2.2898 (2.2964) iter-time: 0.006s   data-time: 0.001s   sys-mem: 13166MiB   vmem: 94MB
[2023-11-10 12:51:45] Warmup    [200/500]   eta: 0:00:02    loss - 2.2742 (2.2704) | NLL_loss - 2.2742 (2.2704) iter-time: 0.006s   data-time: 0.001s   sys-mem: 13166MiB   vmem: 94MB
...
[2023-11-10 12:51:46] Warmup Total time: 0:00:03

In this example, we apply a linear learning rate warmup.
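For readers unfamiliar with the term, a linear warmup ramps the learning rate from a small starting value up to the optimizer's configured value over a fixed number of steps. The sketch below is a generic version of that idea, using the 500 warmup steps seen in the log and the lr=1e-2 from the script. The linear_warmup_lr function, its start_factor parameter, and the choice to start from zero are assumptions for illustration; the exact schedule CLIKA applies is not shown on this page.

```python
def linear_warmup_lr(step: int, warmup_steps: int, base_lr: float,
                     start_factor: float = 0.0) -> float:
    """Learning rate at `step` under a linear warmup (generic sketch).

    The rate grows linearly from base_lr * start_factor at step 0
    to base_lr at step `warmup_steps`, and stays at base_lr afterwards.
    """
    if step >= warmup_steps:
        return base_lr
    progress = step / warmup_steps
    factor = start_factor + (1.0 - start_factor) * progress
    return base_lr * factor


# 500 warmup steps (from the log) and lr=1e-2 (from the script).
for step in (0, 100, 250, 500):
    print(step, linear_warmup_lr(step, warmup_steps=500, base_lr=1e-2))
```

In plain PyTorch, a comparable ramp can be obtained with torch.optim.lr_scheduler.LinearLR; whether CLIKA uses it internally is not stated here.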

Training

[2023-11-10 12:51:47] Starting Epoch 1
[2023-11-10 12:51:47] Epoch 1   [ 100/1000] eta: 0:00:05    loss - 1.8659 (1.8459) | NLL_loss - 1.8659 (1.8459) iter-time: 0.005s   data-time: 0.001s   sys-mem: 13167MiB   vmem: 94MB

Performance on the evaluation dataset after each training epoch:

[2023-11-10 12:51:53] Epoch 1 Total time: 0:00:05
[2023-11-10 12:51:53] Epoch 1 - Evaluating  [100/313]   eta: 0:00:00    eval_loss - 0.9491 (1.0149) | eval_NLL_loss - 0.9491 (1.0149)   iter-time: 0.003s   data-time: 0.001s   sys-mem: 13166MiB   vmem: 94MB
...

Summary of epoch 1 for all losses and metrics:

[2023-11-10 12:51:54] Summary of Epoch 1
Evaluation
├── eval_NLL_loss = 1.01522
│   ├── Lowest so far = 1.01522 (Epoch: 1)
│   ├── Highest so far = 1.01522 (Epoch: 1)
│   └── Original value = 2.30182
│       └── Difference = ↘ -1.2866,-55.895%
├── eval_acc = 0.9002
│   ├── Lowest so far = 0.9002 (Epoch: 1)
│   ├── Highest so far = 0.9002 (Epoch: 1)
│   └── Original value = 0.1
│       └── Difference = ↗ +0.8002,+800.195%
└── eval_loss = 1.01522
    ├── Lowest so far = 1.01522 (Epoch: 1)
    ├── Highest so far = 1.01522 (Epoch: 1)
    └── Original value = 2.30182
        └── Difference = ↘ -1.2866,-55.895%
Training
├── NLL_loss = 1.0226
│   ├── Lowest so far = 1.0226 (Epoch: 1)
│   └── Highest so far = 1.0226 (Epoch: 1)
├── loss = 1.0226
│   ├── Lowest so far = 1.0226 (Epoch: 1)
│   └── Highest so far = 1.0226 (Epoch: 1)
└── train_acc = 0.82179
    ├── Lowest so far = 0.82179 (Epoch: 1)
    └── Highest so far = 0.82179 (Epoch: 1)

Lowest so far - The lowest value achieved among all epochs so far.
Highest so far - The highest value achieved among all epochs so far.
Original value - The loss/metric value obtained from the original model before compression was initiated.
Difference - The change or difference in performance compared to the original model.
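The two numbers on a Difference line can be reproduced from the values in the summary, assuming the first is the current value minus the original value and the second is that change expressed as a percentage of the original. This interpretation is inferred from the numbers in the log above rather than from the package source, and the difference function is a hypothetical helper.

```python
from typing import Tuple


def difference(current: float, original: float) -> Tuple[float, float]:
    """Return (absolute change, change as a percentage of the original)."""
    if original == 0:
        raise ValueError("relative difference is undefined for an original value of 0")
    absolute = current - original
    return absolute, absolute / original * 100.0


# eval_NLL_loss after epoch 1 versus the original model (values from the summary).
abs_change, rel_change = difference(current=1.01522, original=2.30182)
print(f"{abs_change:+.4f},{rel_change:+.3f}%")
```

For eval_acc, the same calculation on the displayed values (0.9002 versus 0.1) gives +0.8002 and roughly +800.2%, slightly off from the +800.195% in the log, presumably because the log computes the percentage from unrounded values.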

Output of checkpoints and the current state to a .pompom file; this step is the last phase of CCO.

[2023-11-10 12:51:54] Saved checkpoint at: 'outputs/epoch_1/model.pompom'

Deployment

[2023-11-10 12:51:54] Deploying Model from: outputs/epoch_1/model.pompom
[2023-11-10 12:51:55] Saved model at: outputs/BasicMNIST.onnx

Since we used cc.PyTorchCompressionEngine.deploy() in our script, the model is deployed to an .onnx file.
