Version: 0.3

How to use CLIKA Compression


The clika-compression Python package implements CLIKA's unique compression engine, called CLIKA Compression Operation (CCO).

  • CCO is the process in which the model is compressed with CLIKA's compression engine.
  • CCO is "hardware-aware" and compresses the model optimally for the selected framework.
  • CCO can be applied to fine-tune a pretrained model or to train a model from scratch.

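The clika-compression package has three main usages:

  • Start a CCO with clika_compress.
  • Resume a CCO with clika_resume.
  • Deploy a compressed model with clika_deploy.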


.pompom files are the outputs of the CCO; they contain the compressed model's checkpoint state.

For more information, see model.pompom.

Start a CLIKA Compression Operation

You can start a CCO from any torch.nn.Module by calling clika_compress.

In order to start the CCO you MUST provide the following items:

There are several optional inputs that you MAY provide as well:

  • Function that returns an evaluation dataloader (a Callable returning a DataLoader), if you require evaluation performance to be calculated during the compression process
  • Training metrics (if you require metrics on the training dataset during CCO)
  • Evaluation metrics (if you require metrics on the evaluation dataset during CCO)

The provided loss functions, metrics and dataloaders must conform to certain requirements.

See CCO input requirements for more details.


import torch
from clika_compression import Settings, clika_compress
from torch.utils.data import DataLoader

model: torch.nn.Module = ...  # Set your model

# Function that returns the training dataloader
def get_train_loader() -> DataLoader:
    # TODO: Create your dataset and dataloader here
    return train_dataloader

optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.0001)
settings = Settings()  # Default settings

# Start the CCO to compress the model and generate a `.pompom` file
# Returns the path to the latest epoch's `.pompom` file
final: str = clika_compress(
    output_path='outputs',  # Path to save the compressed models and other outputs
    settings=settings,  # CCO settings
    model=model,  # Model to compress
    init_training_dataset_fn=get_train_loader,  # Function that returns the training dataloader
    training_losses={'cross_entropy_loss': torch.nn.CrossEntropyLoss()},
    # ... other args
)

Resume a CLIKA Compression Operation

If a CCO session was previously executed, and you wish to resume from the point where you left off, you can utilize the clika_resume function along with the relevant .pompom file.

This feature is beneficial in the following scenarios:

  1. Interruption of the training process:

    • In the event of a mid-run crash or interruption during a CCO, you can resume the operation from the last checkpoint.
  2. Introduction of new data:

    • When new data becomes available, you can resume the CCO with the new dataset, ensuring continuity in the compression process.
  3. Additional fine-tuning:

    • If you wish to further fine-tune the model by running more epochs, you may continue the CCO from the previous checkpoint, enabling you to run additional epochs without starting from scratch.


import torch
from clika_compression import Settings, clika_resume
from torch.utils.data import DataLoader

model: torch.nn.Module = ...  # Set your model

settings = Settings()  # Default settings
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.0001)

# Function that returns the training dataloader
def get_train_loader() -> DataLoader:
    # TODO: Create your dataset and dataloader here
    return train_dataloader

# Resume the CCO compression process
# Returns the path to the latest epoch's `.pompom` file
final: str = clika_resume(
    training_losses={'ce_loss': torch.nn.CrossEntropyLoss()},
    new_settings=settings,  # The new CCO settings
    # ... other args
)

Deploy a model


Do not deploy an .onnx file that was generated by CCO with one deployment setting (for example, ONNX Runtime) to another framework (for example, TensorRT). Each .onnx file is tailored to the deployment framework set in Settings.deployment_settings during the CCO and should only be deployed to that framework.

Once the CCO has been executed, you have the option to deploy the resulting .pompom file to a framework of your choice by utilizing the clika_deploy method. This method enables a straightforward deployment for the compressed model obtained from the CCO.

The current clika-compression version supports deployment to TensorRT, ONNX Runtime, OpenVINO, and TFLite.

For more information, see DeploymentSettings.

CLIKA deployment example

from clika_compression import DeploymentSettings_TensorRT_ONNX, Settings, clika_compress, clika_deploy

output_dir = "outputs"
settings = Settings()  # Default settings
settings.deployment_settings = DeploymentSettings_TensorRT_ONNX()  # Select the deployment framework

clika_state_path = clika_compress(  # type: str
    output_path=output_dir,
    settings=settings,
    # ... other args
)

# Generate an `.onnx` file and save it to the 'outputs' folder
deployed_file_path = clika_deploy(  # type: str
    # ... other args
    input_shapes=[(None, 1, 28, 28)],
)

TensorRT deployment

After running CCO and clika_deploy, a <model_name>.onnx file will be generated. Use this file with the trtexec command, as shown below, to create a .engine file that can be deployed to TensorRT.

To deploy your model to a .engine file, install TensorRT on your local machine OR use a docker container by NVIDIA.

Here, we use a docker container by NVIDIA to deploy our model; the docker image is pulled automatically on first use.

For example, the following command sets these parameters:

  • TensorRT version - 23.07
  • Path to CLIKA deployed.onnx file - outputs/MyModel.onnx
  • Path to save TensorRT compiled model - MyModel.engine
  • Input shape - 1x3x640x640
docker run --gpus all --rm -v .:/workspace/ nvcr.io/nvidia/tensorrt:23.07-py3 /bin/bash -c "trtexec --onnx=outputs/MyModel.onnx --saveEngine=MyModel.engine --shapes=input_0:1x3x640x640 --int8 --workspace=1024"
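
When the model has several inputs, or when the shapes come from the same list that was passed to clika_deploy, assembling the trtexec arguments by hand is error-prone. The following is a minimal sketch of building that argument string in Python. It assumes that the inputs are named input_0, input_1, and so on (as in the command above) and that a None dimension is a dynamic axis to be filled with a concrete batch size; shapes_flag and trtexec_command are hypothetical helper names, not part of clika-compression.

```python
import shlex
from typing import Optional, Sequence


def shapes_flag(input_shapes: Sequence[Sequence[Optional[int]]], batch_size: int = 1) -> str:
    """Build a trtexec --shapes value such as 'input_0:1x3x640x640'.

    Assumptions: inputs are named input_0, input_1, ... and a None
    dimension is a dynamic axis that is filled with `batch_size`.
    """
    parts = []
    for index, shape in enumerate(input_shapes):
        dims = "x".join(str(batch_size if dim is None else dim) for dim in shape)
        parts.append(f"input_{index}:{dims}")
    return ",".join(parts)


def trtexec_command(onnx_path: str, engine_path: str,
                    input_shapes: Sequence[Sequence[Optional[int]]],
                    batch_size: int = 1) -> str:
    """Return a shell-quoted trtexec command line for an INT8 engine build."""
    args = [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--shapes={shapes_flag(input_shapes, batch_size)}",
        "--int8",
    ]
    return shlex.join(args)


command = trtexec_command("outputs/MyModel.onnx", "MyModel.engine", [(None, 3, 640, 640)])
print(command)
```

The resulting string can be pasted after /bin/bash -c in the docker command, and further trtexec flags can be appended to the args list in the same way.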

For more information, see CLIKA examples and TensorRT documentation.

Multi-GPU distributed compression

To use multi-GPU distributed compression with CCO, create a DistributedTrainingSettings object and assign it to Settings.distributed_training_settings before passing the Settings object to clika_compress or clika_resume.

In DistributedTrainingSettings:

  • Set the multi_gpu argument to True to use distributed compression.
  • If use_sharding is True, the model will be sharded across the available GPUs, similar to PyTorch FSDP and DeepSpeed ZeRO-3, which reduces GPU memory usage but may increase latency.
  • If use_sharding is False, a technique similar to PyTorch's DistributedDataParallel paradigm (or DDP) will be used, in which the model will be copied to each GPU and the dataset split between them. This will reduce latency but will not reduce GPU memory usage.
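
The memory trade-off between the two strategies can be illustrated with a back-of-the-envelope calculation. The sketch below is a rough approximation only: it counts parameter weights alone (ignoring gradients, optimizer state, activations and communication buffers), assumes 4 bytes per parameter, and does not describe how CCO actually allocates memory. The function name is hypothetical.

```python
def parameter_memory_per_gpu_gib(num_parameters: int, num_gpus: int,
                                 use_sharding: bool,
                                 bytes_per_parameter: int = 4) -> float:
    """Approximate GiB of parameter weights held by each GPU.

    With sharding, the weights are split evenly across the GPUs.
    Without it (DDP-like), every GPU holds a full copy.
    """
    total_bytes = num_parameters * bytes_per_parameter
    per_gpu_bytes = total_bytes / num_gpus if use_sharding else total_bytes
    return per_gpu_bytes / 2**30


# A model with 2**30 (about 1.07 billion) float32 parameters on 4 GPUs.
replicated = parameter_memory_per_gpu_gib(2**30, num_gpus=4, use_sharding=False)
sharded = parameter_memory_per_gpu_gib(2**30, num_gpus=4, use_sharding=True)
print(replicated, sharded)  # 4.0 1.0
```

Under these assumptions each GPU holds a quarter of the weights when sharding is enabled, at the cost of the extra communication needed to gather shards during computation.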

Usage example with multi-GPU and sharding training strategy:

from clika_compression import DistributedTrainingSettings, Settings, clika_compress

settings = Settings()  # Default settings
settings.distributed_training_settings = DistributedTrainingSettings(multi_gpu=True, use_sharding=True)

clika_compress(
    settings=settings,
    # ... other args
)

For more information, see multi-GPU input restrictions.

Configuration file

You may use a YAML file to save and load CCO configuration.

For more information, see Configuration file.

Monitoring integration

If you wish to monitor your training using TensorBoard, you may pass a TensorBoardCallback instance to the clika_compress or clika_resume functions, as shown here:

from clika_compression.utils.callbacks import TensorBoardCallback
from clika_compression import clika_compress

# ... other args

CCO outputs

The output directory tree structure will be as follows:

├── epoch_1/
│ ├── model.pompom
│ └── summary.json
├── epoch_2/
│ ├── model.pompom
│ └── summary.json
├── logs/
│ └── clika_optimize_<timestamp>.log
└── <model-name>.onnx/tflite # only generated when deploying the model

During CCO, a new folder named epoch_<current-epoch-index>/ is created for each epoch (unless TrainingSettings.save_interval specifies otherwise).

Inside the folder, the following two files will be generated:

  • model.pompom
  • summary.json


The outputs of CCO are stored in .pompom files. They contain the compressed model and everything required to resume CCO or deploy the compressed model. These .pompom files have two main usages:

  • as input to the clika_resume method to resume CCO.
  • as input to the clika_deploy method to create a deployable model for a specific framework.
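
Because both usages start from a .pompom file, it can be handy to locate the most recent one in an output folder, for example after an interrupted run. The following is a minimal sketch that relies only on the directory layout shown above (epoch_<n>/model.pompom); latest_pompom is a hypothetical helper, not part of clika-compression.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

EPOCH_DIR_PATTERN = re.compile(r"^epoch_(\d+)$")


def latest_pompom(output_dir: str) -> Optional[Path]:
    """Return the model.pompom of the highest-numbered epoch folder, if any."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    best_epoch, best_path = -1, None
    for child in root.iterdir():
        match = EPOCH_DIR_PATTERN.match(child.name)
        candidate = child / "model.pompom"
        if match and candidate.is_file():
            epoch = int(match.group(1))
            # Compare numerically so that epoch_10 ranks above epoch_2.
            if epoch > best_epoch:
                best_epoch, best_path = epoch, candidate
    return best_path


# Demo on a temporary folder that mimics the layout with empty files.
with tempfile.TemporaryDirectory() as tmp:
    for epoch in (1, 2, 10):
        epoch_dir = Path(tmp) / f"epoch_{epoch}"
        epoch_dir.mkdir()
        (epoch_dir / "model.pompom").write_bytes(b"")
    demo_result = latest_pompom(tmp).relative_to(tmp).as_posix()

print(demo_result)  # epoch_10/model.pompom
```

The epoch index is compared as an integer rather than as text, since a plain alphabetical sort would place epoch_10 before epoch_2.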


We will break down the summary.json structure by using an example. The script:

import torch
from clika_compression import Settings, clika_compress
from torchmetrics.classification.accuracy import MulticlassAccuracy

# `model`, `get_train_loader` and `get_evaluation_loader` are defined elsewhere
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.0001)
settings = Settings()  # Default settings

# Start the CCO to compress the model and generate `.pompom` files
final = clika_compress(  # type: str
    # ... other args
    init_training_dataset_fn=get_train_loader,  # type: Callable
    init_evaluation_dataset_fn=get_evaluation_loader,  # type: Callable
    training_losses={"ce_loss": torch.nn.CrossEntropyLoss()},
    training_metrics={"multiclass_accuracy": MulticlassAccuracy(num_classes=10)},
    evaluation_losses={
        "ce_loss(sum)": torch.nn.CrossEntropyLoss(reduction="sum"),
        "ce_loss(mean)": torch.nn.CrossEntropyLoss(reduction="mean"),
    },
    evaluation_metrics={"multiclass_accuracy": MulticlassAccuracy(num_classes=10)},
)


will generate the following summary.json file:

{
  "epoch": 1,
  "training": {
    // `training_losses` names and corresponding values
    "ce_loss": 9.7,

    // total loss - a summation of all losses
    // since we have provided just one loss function, it is identical to "ce_loss"
    "loss": 9.7,

    // `training_metrics` supplied by users
    "multiclass_accuracy": 0.0
  },
  "evaluation": {
    // `evaluation_losses` names and corresponding values
    "ce_loss(sum)": 87.0,
    "ce_loss(mean)": 8.7,

    // `evaluation_metrics` supplied by users
    "eval_multiclass_accuracy": 0.9
  },
  "time_elapsed": "0:00:00"
}


During every run of CCO (start, resume or deploy), logs printed to the terminal are saved to a file in the outputs folder. The logs contain information about the CCO run and can be helpful for monitoring the process and for debugging purposes. See Output log breakdown for more details.
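
Alongside the logs, the per-epoch summary.json files offer a simple way to compare epochs after a run. The sketch below collects them and picks the epoch with the best evaluation value. It assumes that the files on disk are plain JSON (the // annotations in the example above being explanatory only) and that the keys match that example; load_summaries and best_epoch are hypothetical helpers, and the numbers in the demo are made up.

```python
import json
import tempfile
from pathlib import Path


def load_summaries(output_dir):
    """Read every epoch_*/summary.json under output_dir, ordered by epoch."""
    summaries = []
    for path in Path(output_dir).glob("epoch_*/summary.json"):
        with path.open(encoding="utf-8") as handle:
            summaries.append(json.load(handle))
    return sorted(summaries, key=lambda summary: summary["epoch"])


def best_epoch(summaries, metric, higher_is_better=True):
    """Return the epoch whose "evaluation" section has the best value for metric."""
    scored = [s for s in summaries if metric in s.get("evaluation", {})]
    if not scored:
        return None
    pick = max if higher_is_better else min
    return pick(scored, key=lambda s: s["evaluation"][metric])["epoch"]


# Demo with made-up numbers written to a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    demo = {
        1: {"ce_loss(mean)": 8.7, "eval_multiclass_accuracy": 0.9},
        2: {"ce_loss(mean)": 7.5, "eval_multiclass_accuracy": 0.8},
    }
    for epoch, evaluation in demo.items():
        epoch_dir = Path(tmp) / f"epoch_{epoch}"
        epoch_dir.mkdir()
        summary = {"epoch": epoch, "evaluation": evaluation}
        (epoch_dir / "summary.json").write_text(json.dumps(summary), encoding="utf-8")
    summaries = load_summaries(tmp)

best_by_accuracy = best_epoch(summaries, "eval_multiclass_accuracy")
best_by_loss = best_epoch(summaries, "ce_loss(mean)", higher_is_better=False)
print(best_by_accuracy, best_by_loss)  # 1 2
```

In this demo the two criteria disagree, which is a reminder to choose the metric that matters for the deployment target before selecting a .pompom file.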

Next steps

That's it! Both the CCO and the deployment are flexible and highly configurable to best fit your needs.