Version: 0.2

How To Use CLIKA Compression

Tutorial Video

info

The code in this video can be found in the CLIKA Compression examples repository via this link.

Background

The clika-compression Python package implements CLIKA's unique compression engine, called CLIKA Compression Operation (CCO).

CCO is the process in which the model is compressed with CLIKA's compression engine.
CCO is hardware-aware: it compresses the model optimally for the selected framework.
CCO can be applied to fine-tune a pretrained model or to train a model from scratch.

The clika-compression package has three main uses:

Start CCO that is initialized from an existing torch.nn.Module.
Resume CCO that is initialized from an existing compressed model (.pompom file).
Deploy the compressed model (.pompom file) to a chosen framework.

info

Note that we will use cc (clika-compression) as the name of the package, as if we used:

import clika_compression as cc

info

.pompom files are the outputs of a CCO; they contain the compressed model's checkpoint state.

For more information, see model.pompom.

Start a CLIKA Compression Operation

You can start a CCO from any torch.nn.Module by using cc.PyTorchCompressionEngine.optimize().

In order to start the CCO you MUST provide the following items:

Model to compress (torch.nn.Module)
Optimizer (torch.optim.Optimizer)
Loss function (Callable -> tuple|dict)
Function that returns the training dataset dataloader (Callable -> torch.utils.data.DataLoader)

There are several optional inputs that you MAY provide as well:

Function that returns an evaluation dataset dataloader (in case you require evaluation performance to be calculated during the compression process)
Training metrics (if you require metrics on the training dataset during CCO)
Evaluation metrics (if you require metrics on the evaluation dataset during CCO)

caution

The provided loss functions, metrics and dataloaders must conform to certain requirements.

See CCO Input Requirements for more details.

Example

import torch
from clika_compression import PyTorchCompressionEngine
from clika_compression.settings import generate_default_settings, ModelCompileSettings
from torch.utils.data import DataLoader

...
model: torch.nn.Module = my_model  # Set your model


def get_train_loader() -> DataLoader:
    # Create your dataset and dataloader here
    return train_dataloader


optimizer = torch.optim.AdamW(params=list(model.parameters()), lr=0.0001)
settings = generate_default_settings()  # Can be configured after creation

engine = PyTorchCompressionEngine()

mcs = ModelCompileSettings(
    optimizer=optimizer,
    training_losses={'cross_entropy_loss': torch.nn.CrossEntropyLoss()},
)

# Start the CCO: compress the model and generate a `.pompom` file
engine.optimize(  # returns the path to the `.pompom` file from the latest epoch
    output_path='outputs',  # the path to save the compressed models and other outputs
    settings=settings,  # the CCO settings
    model=model,  # the model to compress
    model_compile_settings=mcs,
    init_training_dataset_fn=get_train_loader,  # a function that returns the training dataloader
    multi_gpu=True  # use Multi-GPU Distributed Compression
)
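
In the example above, init_training_dataset_fn receives a function that takes no arguments and returns the dataloader. If your loader-building code needs parameters (a dataset size, a batch size), one possible way to keep that zero-argument shape is functools.partial. The sketch below is only an illustration: build_loader, its parameters, and the plain list of index batches standing in for a real torch.utils.data.DataLoader are hypothetical, chosen so the snippet runs without PyTorch installed.

```python
from functools import partial
from typing import Callable, List


def build_loader(num_samples: int, batch_size: int) -> List[List[int]]:
    """Hypothetical stand-in for code that builds a DataLoader.

    Returns a list of batches of sample indices; in real use this
    function would return a torch.utils.data.DataLoader instead.
    """
    indices = list(range(num_samples))
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


# Bind the arguments now; the result can be called with no arguments,
# which matches the shape of get_train_loader() in the example above.
get_train_loader: Callable[[], List[List[int]]] = partial(
    build_loader, num_samples=10, batch_size=4
)

batches = get_train_loader()
print(batches)
```

The bound callable can then be passed wherever a plain zero-argument function such as get_train_loader is expected.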

Resume a CLIKA Compression Operation

To resume a previous CCO session from the point where it left off, call cc.PyTorchCompressionEngine.resume() with the relevant .pompom file.

This feature is beneficial in the following scenarios:

Interruption of the training process:
- If a CCO crashes or is interrupted mid-run, you can resume the operation from the last checkpoint.
Introduction of new data:
- When new data becomes available, you can resume the CCO with the new dataset and keep the compression progress made so far.
Additional fine-tuning:
- To fine-tune the model further, you can continue the CCO from the previous checkpoint and run additional epochs without starting from scratch.

Example

import torch
from clika_compression import PyTorchCompressionEngine
from clika_compression.settings import generate_default_settings, ModelCompileSettings
from torch.utils.data import DataLoader

...

settings = generate_default_settings()
optimizer = torch.optim.AdamW(params=list(model.parameters()), lr=0.0001)
mcs = ModelCompileSettings(
    optimizer=optimizer,
    training_losses={'ce_loss': torch.nn.CrossEntropyLoss()},
)
engine = PyTorchCompressionEngine()


def get_train_loader() -> DataLoader:
    # Create your dataset and dataloader here
    return train_dataloader


# Resume the CCO: continue compressing the model and generate a `.pompom` file
engine.resume(  # returns the path to the `.pompom` file from the latest epoch
    clika_state_path='path/to/.pompom',
    model_compile_settings=mcs,
    init_training_dataset_fn=get_train_loader,  # a function that returns the training dataloader
    settings=settings,  # the CCO settings
    multi_gpu=True  # use Multi-GPU Distributed Compression
)
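
To pick the file to pass as clika_state_path, you may want the checkpoint from the most recent epoch. The sketch below assumes the output layout described under "CCO Outputs" (one epoch_<index>/ folder per saved epoch, each holding a model.pompom). The helper name find_latest_pompom is hypothetical and not part of clika-compression; the demo builds a throwaway directory so it can run on its own.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_pompom(output_dir: Path) -> Optional[Path]:
    """Return model.pompom from the highest-numbered epoch_<n>/ folder, or None."""
    best_epoch = -1
    best_path = None
    for folder in output_dir.glob("epoch_*"):
        match = re.fullmatch(r"epoch_(\d+)", folder.name)
        candidate = folder / "model.pompom"
        # Compare epoch numbers as integers so that epoch_10 beats epoch_2.
        if match and candidate.is_file() and int(match.group(1)) > best_epoch:
            best_epoch = int(match.group(1))
            best_path = candidate
    return best_path


# Demo on a temporary directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for epoch in (1, 2, 10):
        (root / f"epoch_{epoch}").mkdir()
        (root / f"epoch_{epoch}" / "model.pompom").touch()
    latest = find_latest_pompom(root)
    latest_rel = latest.relative_to(root).as_posix()
    print(latest_rel)
```

Sorting folder names as strings would rank epoch_2 after epoch_10, which is why the sketch parses the numeric suffix.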

Deploy a Model

caution

Do not deploy a .onnx file that was generated by CCO with a certain deployment setting (for example, ONNX Runtime) to another framework (for example, TensorRT). Each .onnx file is tailored to the deployment framework set in Settings.deployment_settings and should only be deployed to that framework.

Once a CCO has finished, you can deploy the resulting .pompom file to a framework of your choice with the cc.PyTorchCompressionEngine.deploy() method.

The current clika-compression version supports TensorRT and ONNX Runtime deployment (see DeploymentSettings).

CLIKA Deployment Example

from clika_compression import PyTorchCompressionEngine, DeploymentSettings_TensorRT_ONNX
from clika_compression.settings import generate_default_settings

engine = PyTorchCompressionEngine()
settings = generate_default_settings()
# Note: no trailing comma after the closing parenthesis, otherwise the value becomes a tuple
settings.deployment_settings = DeploymentSettings_TensorRT_ONNX(
    graph_author="CLIKA",
    input_shapes_for_deployment=[(None, 1, 28, 28)],
)
# Generate the `.onnx` file and save it to 'output'
engine.deploy(clika_state_path='path/to/.pompom', output_dir_path='output')

TensorRT Deployment

After running a CCO and cc.PyTorchCompressionEngine.deploy(), a MyModel.onnx file is generated. Pass this file to the trtexec command as shown below to build a .engine file for TensorRT.

To build the .engine file, install TensorRT on your local machine or use a Docker container provided by NVIDIA.

Here, we use the NVIDIA Docker container and build the engine by running the following command (the Docker image is pulled automatically from NVIDIA's nvcr.io registry):

docker run --gpus all --rm -v .:/workspace/ nvcr.io/nvidia/tensorrt:23.07-py3 -c trtexec \
        --onnx=outputs/MyModel.onnx \
        --saveEngine=outputs/MyModel.engine \
        --shapes=input_0:1x3x224x224 \
        --int8 --workspace=1024 

# --onnx - relative path to your model `.onnx` file
# --saveEngine - relative path to deployed model
# --shapes - input shape (make sure it is compatible with the shapes selected during CLIKA deployment)

You may also provide minimum or maximum shape bounds instead of specifying the exact shapes:

docker run --gpus all --rm -v .:/workspace/ nvcr.io/nvidia/tensorrt:23.07-py3 -c trtexec \
        --onnx=outputs/MyModel.onnx \
        --saveEngine=MyModel.engine \
        --minShapes=input_0:1x3x224x224 \
        --optShapes=input_0:1x3x640x640 \
        --maxShapes=input_0:1x3x1080x1080 \
        --int8 --workspace=1024

# --onnx - relative path to your model `.onnx` file
# --saveEngine - relative path to deployed model

# make sure the non-constant dimensions are `None` when using `PyTorchCompressionEngine.deploy`,
# in this example: `input_shapes_for_deployment=[(1, 3, None, None)]` in the deployment settings
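
The shape flags above all use the same name:AxBxCxD format, and each concrete shape has to agree with the shapes chosen at CLIKA deployment time, where None marks a dynamic dimension. As a rough sketch of that check, the snippet below formats the flag values and verifies them against a deployment shape. The helper names trtexec_shape_arg and is_compatible are hypothetical and belong to neither trtexec nor clika-compression; the input name input_0 and the shapes are taken from the commands above.

```python
from typing import Optional, Sequence, Tuple

Shape = Tuple[Optional[int], ...]


def trtexec_shape_arg(input_name: str, shape: Sequence[int]) -> str:
    """Format a concrete shape the way the commands above do, e.g. input_0:1x3x224x224."""
    return f"{input_name}:" + "x".join(str(dim) for dim in shape)


def is_compatible(deploy_shape: Shape, concrete: Sequence[int]) -> bool:
    """True if `concrete` matches `deploy_shape`, where None means "any size"."""
    if len(deploy_shape) != len(concrete):
        return False
    return all(d is None or d == c for d, c in zip(deploy_shape, concrete))


deploy_shape: Shape = (1, 3, None, None)  # dynamic height and width
for flag, shape in [("--minShapes", (1, 3, 224, 224)),
                    ("--optShapes", (1, 3, 640, 640)),
                    ("--maxShapes", (1, 3, 1080, 1080))]:
    # Fail early if a shape contradicts a fixed dimension of the deployment shape.
    assert is_compatible(deploy_shape, shape)
    print(f"{flag}={trtexec_shape_arg('input_0', shape)}")
```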

For more information, see CLIKA examples and TensorRT documentation.

Multi-GPU Distributed Compression

To use "Multi-GPU Distributed Compression" with CCO, set the multi_gpu argument to True in the cc.PyTorchCompressionEngine.optimize() call (used to start a CCO) or the cc.PyTorchCompressionEngine.resume() call (used to resume a CCO).

The clika-compression "Multi-GPU Distributed Compression" technique is similar to PyTorch's DistributedDataParallel paradigm, which copies the model to each GPU and splits the dataset between them.

For more information, see Multi-GPU input restrictions.
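
To make the "splits the dataset between them" idea concrete, here is a minimal sketch of how a DistributedDataParallel-style setup can assign sample indices to each GPU process. It mirrors the general strided scheme with wrap-around padding used by PyTorch's DistributedSampler when shuffling is off; it is not CLIKA's implementation, and the function name shard_indices is hypothetical.

```python
import math
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Indices of the samples one process would see in a single epoch.

    Pads by wrapping around so every rank gets the same number of samples,
    then takes every world_size-th index starting at `rank`.
    """
    per_rank = math.ceil(num_samples / world_size)
    total = per_rank * world_size
    # Wrap-around padding: indices past the end repeat from the start.
    indices = [i % num_samples for i in range(total)]
    return indices[rank:total:world_size]


# 10 samples split across 4 processes: each rank receives 3 indices.
for rank in range(4):
    print(rank, shard_indices(num_samples=10, world_size=4, rank=rank))
```

Because of the padding, a few samples may be seen by more than one process in an epoch, which is one reason metrics computed per process can differ slightly from a single-GPU run.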

CCO Outputs

The output directory tree structure will be as follows:

<output-dir>/ 
├── epoch_1/
│ ├── model.pompom
│ └── summary.json
├── epoch_2/
│ ├── model.pompom
│ └── summary.json
...
├── logs/ 
│ └── clika_optimize_<timestamp>.log
...  
└── <model-name>.onnx # only generated when using `cc.PyTorchCompressionEngine.deploy()`

During a CCO, a new folder named epoch_<current-epoch-index>/ is created after each epoch, unless cc.TrainingSettings.save_interval specifies a different interval.

Inside the folder, the following two files will be generated:

model.pompom
summary.json

model.pompom

The outputs of a CCO are stored in .pompom files. Each file contains the compressed model and everything required to resume the CCO or deploy the compressed model.

They have two main uses:

as input to the cc.PyTorchCompressionEngine.resume() method to resume CCO.
as input to the cc.PyTorchCompressionEngine.deploy() method to create a deployable model for a specific framework.

summary.json

We will break down the summary.json structure by using an example.

The script:

import torch
from clika_compression import PyTorchCompressionEngine
from clika_compression.settings import generate_default_settings, ModelCompileSettings
from torchmetrics.classification.accuracy import MulticlassAccuracy

...
optimizer = torch.optim.AdamW(params=list(model.parameters()), lr=0.0001)
mcs = ModelCompileSettings(
    optimizer=optimizer,
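    training_losses={"ce_loss": torch.nn.CrossEntropyLoss()},
    # MulticlassAccuracy requires num_classes; `num_classes` is a placeholder for your model's class count
    training_metrics={'multiclass_accuracy': MulticlassAccuracy(num_classes=num_classes)},
    evaluation_losses={"ce_loss": torch.nn.CrossEntropyLoss()},
    evaluation_metrics={'multiclass_accuracy': MulticlassAccuracy(num_classes=num_classes)},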
)

engine = PyTorchCompressionEngine()
settings = generate_default_settings()  # Can be configured after creation

# Start the CCO: compress the model and generate a `.pompom` file
engine.optimize(  # returns the path to the `.pompom` file from the latest epoch
    output_path='outputs',
    settings=settings,  # the CCO settings
    model=model,  # the model to compress
    model_compile_settings=mcs,
    init_training_dataset_fn=get_train_loader,  # a function that returns the training dataloader
    init_evaluation_dataset_fn=get_evaluation_loader,  # a function that returns the evaluation dataloader
)

Will generate the following summary.json file:

{
  "epoch": 1, # index of epoch
  "training": {
    "ce_loss": 9.7, # the cross-entropy loss
    "loss": 9.7, # the total loss: the sum of all losses (with a single loss, it is identical to the cross-entropy loss)
    "multiclass_accuracy": 0.0 # the metric supplied by the user
  },
  "evaluation": { # same as in "training", but for the evaluation dataset
    "eval_ce_loss": 8.7,
    "eval_loss": 8.7,
    "eval_multiclass_accuracy": 0.9
  },
  "time_elapsed": "0:00:00" # the duration of the CCO
}
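
Since every saved epoch gets its own summary.json, the files can be compared after a run, for example to find the epoch with the lowest evaluation loss. The sketch below assumes the real files are plain JSON (without the explanatory comments shown above) and use the "epoch", "evaluation", and "eval_loss" keys from the example. The helper names load_summaries and best_epoch are hypothetical, and the loss values in the demo are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def load_summaries(output_dir: Path) -> List[Dict]:
    """Load every epoch_*/summary.json under output_dir, sorted by epoch index."""
    summaries = [
        json.loads(path.read_text())
        for path in output_dir.glob("epoch_*/summary.json")
    ]
    return sorted(summaries, key=lambda s: s["epoch"])


def best_epoch(summaries: List[Dict], key: str = "eval_loss") -> int:
    """Epoch index with the lowest value of `key` in the "evaluation" block."""
    return min(summaries, key=lambda s: s["evaluation"][key])["epoch"]


# Demo with made-up numbers written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for epoch, eval_loss in [(1, 8.7), (2, 6.1), (3, 6.4)]:
        folder = root / f"epoch_{epoch}"
        folder.mkdir()
        summary = {"epoch": epoch, "evaluation": {"eval_loss": eval_loss}}
        (folder / "summary.json").write_text(json.dumps(summary))
    summaries = load_summaries(root)
    best = best_epoch(summaries)
    print(best)
```

For a metric where higher is better, such as eval_multiclass_accuracy, the same idea applies with max in place of min.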

Logs

During every run of CCO (start, resume or deploy), logs printed to the terminal are saved to a file in the outputs folder. The logs contain information about the CCO run and can be helpful for monitoring the process and for debugging purposes.

See Output Log Breakdown for more details.

Next Steps

And that's it! Both the CCO and the deployment are flexible and highly configurable to best fit your needs.

For further details, please refer to the Python API documentation.
For more examples and use cases, see our Examples GitHub Repository.
