Output Log Breakdown
Here we will use the two primary methods of the clika-compression (cc) Python package:
- Start CCO - execute compression of a model initialized from an existing torch.nn.Module.
- Deploy - deploy the compressed model to a chosen framework.
We will walk through the output log generated by these two operations step by step, using a toy example.
The example exists only to showcase the output log. For more complete examples, see our Examples GitHub Repository.
Compression and Deployment Script
import torch
import torchmetrics
import torchvision
from clika_compression import PyTorchCompressionEngine, DeploymentSettings_TensorRT_ONNX
from clika_compression.settings import generate_default_settings, ModelCompileSettings


# https://github.com/pytorch/examples/blob/7f7c222b355abd19ba03a7d4ba90f1092973cdbc/mnist/main.py#L11
class BasicMNIST(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = torch.nn.Conv2d(1, 32, 3, 1)
        self.conv2 = torch.nn.Conv2d(32, 64, 3, 1)
        self.dropout1 = torch.nn.Dropout(0.25)
        self.dropout2 = torch.nn.Dropout(0.5)
        self.flatten = torch.nn.Flatten()
        self.fc1 = torch.nn.Linear(9216, 128)
        self.fc2 = torch.nn.Linear(128, 10)

    def forward(self, x):
        x = self.conv1(x)
        x = torch.nn.functional.relu(x)
        x = self.conv2(x)
        x = torch.nn.functional.relu(x)
        x = torch.nn.functional.max_pool2d(x, kernel_size=(2, 2), stride=None, padding=(0, 0), dilation=(1, 1))
        x = self.dropout1(x)
        x = self.flatten(x)
        x = self.fc1(x)
        x = torch.nn.functional.relu(x)
        x = self.dropout2(x)
        x = self.fc2(x)
        return x


def main():
    model = BasicMNIST()
    compute_loss_fn = lambda a, b: torch.nn.functional.nll_loss(torch.nn.functional.log_softmax(a, dim=1), b)
    train_loader = torchvision.datasets.MNIST(root='.', train=True, transform=torchvision.transforms.ToTensor(), download=True)
    eval_loader = torchvision.datasets.MNIST(root='.', train=False, transform=torchvision.transforms.ToTensor(), download=True)
    get_train_loader = lambda: torch.utils.data.DataLoader(train_loader, batch_size=32)
    get_eval_loader = lambda: torch.utils.data.DataLoader(eval_loader, batch_size=32)
    metric_fn = torchmetrics.classification.MulticlassAccuracy(num_classes=10)

    settings = generate_default_settings()
    settings.deployment_settings = DeploymentSettings_TensorRT_ONNX(
        graph_author="CLIKA",
        graph_description=None,
        input_shapes_for_deployment=[(None, 1, 28, 28)],
    )
    settings.training_settings.steps_per_epoch = 1000

    engine = PyTorchCompressionEngine()
    mcs = ModelCompileSettings(
        optimizer=torch.optim.SGD(params=list(model.parameters()), lr=1e-2),
        training_losses={"NLL_loss": compute_loss_fn},
        training_metrics={"acc": metric_fn},
        evaluation_losses={"NLL_loss": compute_loss_fn},
        evaluation_metrics={"acc": metric_fn}
    )
    final = engine.optimize(
        output_path='outputs',
        settings=settings,
        model=model,
        model_compile_settings=mcs,
        init_training_dataset_fn=get_train_loader,
        init_evaluation_dataset_fn=get_eval_loader,
    )
    engine.deploy(
        clika_state_path=final,
        output_dir_path='outputs',
    )


if __name__ == '__main__':
    main()
Output Log
The log file contains the following sections:
CCO Setup
Information pertaining to CCO initialization and setup, as well as information about the working environment:
Created log at: outputs/logs/clika_optimize_2023-11-08_19:53:30.980863.log
[2023-11-08 19:53:30] Initializing output directory at: 'outputs/logs'
[2023-11-08 19:53:30] CLIKA Version: 0.2.2
[2023-11-08 19:53:30] 'torch' Version: 2.1.0+cu121
[2023-11-08 19:53:30] Python Version: 3.8.16 (default, Mar 2 2023, 03:21:46) [GCC 11.2.0]
...
Training settings for the CCO, specified in the cc.TrainingSettings instance and used to initialize the cc.Settings.training_settings variable:
Training Settings:
+ num_epochs = 1
+ steps_per_epoch = 1000
+ evaluation_steps = -1
...
Global quantization settings for the CCO, specified in the cc.QATQuantizationSettings instance and used to initialize the cc.Settings.global_quantization_settings variable:
Global Quantization Settings:
+ quantization_algorithm = Quantization Aware Training
+ weights_num_bits = 8
+ activations_num_bits = 8
...
Deployment settings for the CCO, specified in the DeploymentSettings_TensorRT_ONNX instance and used to initialize the cc.Settings.deployment_settings variable:
Deployment Settings:
+ target_framework = TensorRT
+ Graph Author = CLIKA
Parsed layers, named as they will appear in the graph visualization model_init.png, or as they will be used when customizing quantization skipping:
[2023-11-10 12:51:38] 1/13. Parsing node: 'x'
[2023-11-10 12:51:38] 2/13. Parsing node: 'conv1'
[2023-11-10 12:51:38] Parsed to -> 'conv'
[2023-11-10 12:51:38] 3/13. Parsing node: 'relu'
...
[2023-11-10 12:51:39] Saved graph visualization to: outputs/model_init.png
For more information, see Graph Visualization.
Evaluation of the Original, Non-quantized Model
If the is_training_from_scratch parameter of cc.PyTorchCompressionEngine.optimize() is set to False, the initial, non-quantized model will be evaluated.
The resulting evaluation metrics (which can include losses or other custom metrics) are used to compare the original, non-quantized model with the quantized model over the course of CCO:
[2023-11-10 12:51:40] Evaluating the Model
[2023-11-10 12:51:40] [100/313] eta: 0:00:00 eval_loss - 2.2999 (2.3027) | eval_NLL_loss - 2.2999 (2.3027) iter-time: 0.001s data-time: 0.001s sys-mem: 12996MiB vmem: 28MB
...
- eval_loss - the summation of all losses set by the user
- eval_NLL_loss - a user-provided loss function passed to the cc.ModelCompileSettings
The number inside the parentheses, e.g., (2.3027), is the running mean loss.
You may change the logging output frequency by setting the parameter cc.TrainingSettings.print_interval.
One-step log line Breakdown:
This is a breakdown of a single training/evaluation step.
[<date and time>] [<steps completed>/<total steps>] eta: 0:00:00 <total evaluation loss> - <current total loss value> (<average total loss value>) | <loss-name-1> - <current-loss-1-value> (<average-loss-1>) <iteration time elapsed>: 0.001s <time it took to fetch data>: 0.001s <CPU memory>: 12996MiB <GPU memory>: 28MB
[2023-11-10 12:51:40] [ 100 / 313 ] eta: 0:00:00 eval_loss - 2.2999 (2.3027) | eval_NLL_loss - 2.2999 (2.3027) iter-time: 0.001s data-time: 0.001s sys-mem: 12996MiB vmem: 28MB
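The breakdown above can be turned into a small parser, which is handy when plotting losses from a saved log. The sketch below is not part of the cc package: the function name parse_step_line and the regular expressions are hypothetical, and the line format is inferred only from the sample lines shown on this page, so it may need adjusting for other log lines.

```python
import re

# Step counter, e.g. "[100/313]", "[ 100/1000]" or "[ 100 / 313 ]".
STEP_RE = re.compile(r"\[\s*(\d+)\s*/\s*(\d+)\s*\]")
# "<name> - <current value> (<running mean>)" pairs.
LOSS_RE = re.compile(r"([A-Za-z_]\w*) - (-?\d+(?:\.\d+)?) \((-?\d+(?:\.\d+)?)\)")
# Trailing "<key>: <number><unit>" fields such as "iter-time: 0.001s".
STAT_RE = re.compile(r"([a-z-]+): (\d+(?:\.\d+)?)(s|MiB|MB)")


def parse_step_line(line):
    """Split one training/evaluation log line into its fields."""
    step = STEP_RE.search(line)
    if step is None:
        raise ValueError("no [<steps completed>/<total steps>] counter found")
    losses = {
        name: {"current": float(cur), "running_mean": float(mean)}
        for name, cur, mean in LOSS_RE.findall(line)
    }
    stats = {key: (float(val), unit) for key, val, unit in STAT_RE.findall(line)}
    return {
        "steps_completed": int(step.group(1)),
        "total_steps": int(step.group(2)),
        "losses": losses,
        "stats": stats,
    }


sample = (
    "[2023-11-10 12:51:40] [100/313] eta: 0:00:00 "
    "eval_loss - 2.2999 (2.3027) | eval_NLL_loss - 2.2999 (2.3027) "
    "iter-time: 0.001s data-time: 0.001s sys-mem: 12996MiB vmem: 28MB"
)
parsed = parse_step_line(sample)
print(parsed["steps_completed"], parsed["total_steps"], parsed["losses"])
```

Each loss entry keeps both the current value and the running mean, mirroring the two numbers printed per loss in the log.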
Performance summary of the original, non-quantized model on the evaluation dataset:
Original Model Base Results (Evaluation Dataset):
Evaluation
├── eval_NLL_loss = 2.30182
├── eval_acc = 0.1
└── eval_loss = 2.30182
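These base results are what one would expect from an untrained classifier: the script constructs BasicMNIST() without loading trained weights, so the original model is close to guessing among 10 classes. The short check below computes the chance-level values for comparison. This reading is an interpretation of the numbers, not something the log itself states.

```python
import math

NUM_CLASSES = 10  # MNIST digits 0-9

# A classifier that spreads probability evenly assigns 1/10 to the true class,
# so its negative log-likelihood is -ln(1/10) = ln(10).
chance_nll = -math.log(1.0 / NUM_CLASSES)
# Guessing uniformly at random is right one time in ten.
chance_accuracy = 1.0 / NUM_CLASSES

print(f"chance-level NLL      = {chance_nll:.5f}")
print(f"chance-level accuracy = {chance_accuracy:.1f}")
```

The chance-level loss is about 2.3026, close to the logged eval_NLL_loss of 2.30182, and the chance-level accuracy matches the logged eval_acc of 0.1.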
Preparation of the Quantized Model
Model pre-processing and running statistics for the initial quantization parameters:
[2023-11-10 12:51:40] Preprocessing Model
[2023-11-10 12:51:40] Removing 2 Dropout layers
[2023-11-10 12:51:41] Collecting Statistics from Model
[2023-11-10 12:51:41] Running Statistics and Profiling the Model for at most: 50 steps.
[2023-11-10 12:51:42] Collecting [50/50] eta: 0:00:00 iter-time: 0.006s data-time: 0.001s sys-mem: 12992MiB vmem: 59MB
Quantization skipping information:
[2023-11-10 12:51:42] Skipped Quantization automatically for the Tail of the model:
'linear_1'
[2023-11-10 12:51:42] Automatic Quantization: Skipping Quantization for node: 'linear_1'
[2023-11-10 12:51:42]
# of all Layers : 10
# of Quantized Layers: 9
In this example, the last layers of the model, i.e., the "model tail" (here, only the layer linear_1), were skipped and left unquantized. This is the default behavior, which can be customized.
For more information, see the quantization guide.
Transformation of the layers to their quantized form:
[2023-11-10 12:51:42] 'conv': using 8 bits for Weight Quantization
[2023-11-10 12:51:42] 'conv_1': using 8 bits for Weight Quantization
[2023-11-10 12:51:42] 'linear': using 8 bits for Weight Quantization
[2023-11-10 12:51:42] Dequantizing 'relu_2' using 'relu_2_dequantized'.
[2023-11-10 12:51:43] Saved graph visualization to: outputs/model_post_preprocessing.png
The layer names above (conv, conv_1, linear, ...) are set in the "parsing" section above and can be observed in the generated file model_post_preprocessing.png in the outputs folder.
For more information, see Graph Visualization.
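To make the "using 8 bits for Weight Quantization" log lines more concrete, here is a generic illustration of what rounding weights onto an 8-bit grid means. It is a minimal sketch of symmetric, per-tensor quantization in plain Python. The scheme CLIKA actually applies (symmetric or asymmetric, per-tensor or per-channel, how scales are learned) is not described on this page, so the function fake_quantize and every choice in it are assumptions made for illustration only.

```python
def fake_quantize(values, num_bits=8):
    """Round values onto a symmetric signed integer grid, then map back to floats."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    # One scale for the whole list (per-tensor); guard against all-zero input.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    quantized = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    dequantized = [q * scale for q in quantized]
    return quantized, dequantized, scale


weights = [-0.50, -0.12, 0.0, 0.07, 0.31]  # made-up example weights
q, dq, scale = fake_quantize(weights)
for w, qi, di in zip(weights, q, dq):
    print(f"{w:+.4f} -> {qi:+4d} -> {di:+.4f}")
```

Every dequantized value differs from its original by at most half a quantization step (scale / 2), which is the rounding error that quantization aware training lets the model adapt to.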
Learning Rate Warmup
[2023-11-10 12:51:43] Starting Warmup.
[2023-11-10 12:51:44] Warmup [100/500] eta: 0:00:03 loss - 2.2898 (2.2964) | NLL_loss - 2.2898 (2.2964) iter-time: 0.006s data-time: 0.001s sys-mem: 13166MiB vmem: 94MB
[2023-11-10 12:51:45] Warmup [200/500] eta: 0:00:02 loss - 2.2742 (2.2704) | NLL_loss - 2.2742 (2.2704) iter-time: 0.006s data-time: 0.001s sys-mem: 13166MiB vmem: 94MB
...
[2023-11-10 12:51:46] Warmup Total time: 0:00:03
In this example, we apply a linear learning rate warmup.
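As a rough picture of what a linear warmup does, the sketch below ramps the learning rate from a starting fraction up to the optimizer's base value over the warmup steps. The base rate of 1e-2 comes from the SGD optimizer in the script and the 500 steps from the warmup log. The function linear_warmup_lr, the start_factor parameter, and its default of 0.0 are assumptions, since the exact schedule CLIKA uses is not stated here.

```python
def linear_warmup_lr(step, base_lr=1e-2, warmup_steps=500, start_factor=0.0):
    """Learning rate at a 1-based warmup step, rising linearly to base_lr."""
    if step >= warmup_steps:
        return base_lr
    fraction = step / warmup_steps
    return base_lr * (start_factor + (1.0 - start_factor) * fraction)


# Learning rate at the steps that appear in the warmup log above.
for step in (100, 200, 500):
    print(step, linear_warmup_lr(step))
```

Starting with small updates gives the freshly inserted quantization parameters time to settle before training proceeds at the full learning rate.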
Training
[2023-11-10 12:51:47] Starting Epoch 1
[2023-11-10 12:51:47] Epoch 1 [ 100/1000] eta: 0:00:05 loss - 1.8659 (1.8459) | NLL_loss - 1.8659 (1.8459) iter-time: 0.005s data-time: 0.001s sys-mem: 13167MiB vmem: 94MB
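The step totals in the log, [ 100/1000] for training and [100/313] for evaluation, follow from the script's settings and the size of MNIST. The arithmetic below assumes that evaluation_steps = -1 means "run over the whole evaluation dataset"; that interpretation is consistent with the 313 steps in the log but is not stated explicitly on this page.

```python
import math

BATCH_SIZE = 32            # from the DataLoader calls in the script
MNIST_TRAIN_SIZE = 60_000  # size of the standard MNIST training split
MNIST_TEST_SIZE = 10_000   # size of the standard MNIST test split
STEPS_PER_EPOCH = 1000     # settings.training_settings.steps_per_epoch

# A full pass over the evaluation data; the last batch is smaller than 32.
eval_steps = math.ceil(MNIST_TEST_SIZE / BATCH_SIZE)
# A full pass over the training data would need this many steps.
full_train_pass = math.ceil(MNIST_TRAIN_SIZE / BATCH_SIZE)
# With steps_per_epoch = 1000, one "epoch" sees only part of the training set.
samples_per_epoch = STEPS_PER_EPOCH * BATCH_SIZE

print(eval_steps, full_train_pass, samples_per_epoch)
```

So each logged epoch covers 32,000 of the 60,000 training images, while every evaluation covers all 10,000 test images.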
Performance on the evaluation dataset after each training epoch:
[2023-11-10 12:51:53] Epoch 1 Total time: 0:00:05
[2023-11-10 12:51:53] Epoch 1 - Evaluating [100/313] eta: 0:00:00 eval_loss - 0.9491 (1.0149) | eval_NLL_loss - 0.9491 (1.0149) iter-time: 0.003s data-time: 0.001s sys-mem: 13166MiB vmem: 94MB
...
Summary of epoch 1 for all losses and metrics:
[2023-11-10 12:51:54] Summary of Epoch 1
Evaluation
├── eval_NLL_loss = 1.01522
│ ├── Lowest so far = 1.01522 (Epoch: 1)
│ ├── Highest so far = 1.01522 (Epoch: 1)
│ └── Original value = 2.30182
│ └── Difference = ↘ -1.2866,-55.895%
├── eval_acc = 0.9002
│ ├── Lowest so far = 0.9002 (Epoch: 1)
│ ├── Highest so far = 0.9002 (Epoch: 1)
│ └── Original value = 0.1
│ └── Difference = ↗ +0.8002,+800.195%
└── eval_loss = 1.01522
├── Lowest so far = 1.01522 (Epoch: 1)
├── Highest so far = 1.01522 (Epoch: 1)
└── Original value = 2.30182
└── Difference = ↘ -1.2866,-55.895%
Training
├── NLL_loss = 1.0226
│ ├── Lowest so far = 1.0226 (Epoch: 1)
│ └── Highest so far = 1.0226 (Epoch: 1)
├── loss = 1.0226
│ ├── Lowest so far = 1.0226 (Epoch: 1)
│ └── Highest so far = 1.0226 (Epoch: 1)
└── train_acc = 0.82179
├── Lowest so far = 0.82179 (Epoch: 1)
└── Highest so far = 0.82179 (Epoch: 1)
- Lowest so far - The lowest value achieved among all epochs so far.
- Highest so far - The highest value achieved among all epochs so far.
- Original value - The Loss/Metric value obtained from the original model before compression was initiated.
- Difference - The change or difference in performance compared to the original model.
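The Difference line reports both the absolute change and the change relative to the original value. The sketch below reproduces the eval_NLL_loss figures from the summary above; the helper name difference is hypothetical and simply mirrors the arithmetic implied by the log.

```python
def difference(current, original):
    """Return (absolute change, relative change in percent) versus the original."""
    absolute = current - original
    percent = absolute / original * 100.0
    return absolute, percent


abs_change, pct_change = difference(current=1.01522, original=2.30182)
print(f"{abs_change:+.4f}, {pct_change:+.3f}%")  # -1.2866, -55.895%
```

Applying the same formula to the rounded eval_acc values (0.9002 and 0.1) gives +800.2% rather than the logged +800.195%, presumably because the log computes the difference from unrounded values.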
Checkpoints and the current state are written to a .pompom file; this is the last phase of CCO:
[2023-11-10 12:51:54] Saved checkpoint at: 'outputs/epoch_1/model.pompom'
Deployment
[2023-11-10 12:51:54] Deploying Model from: outputs/epoch_1/model.pompom
[2023-11-10 12:51:55] Saved model at: outputs/BasicMNIST.onnx
Since we used cc.PyTorchCompressionEngine.deploy() in our script, the model is deployed to an .onnx file.