How to use CLIKA Compression
Background
The clika-compression Python package implements CLIKA's unique compression engine called
CLIKA Compression Operation (CCO).
- CCO is the process in which the model is compressed with CLIKA's compression engine.
- CCO is "hardware-aware" and compresses the model optimally for the selected framework.
- CCO can be applied to fine-tune a pretrained model or to train a model from scratch.
The clika-compression package has three main usages:
- Start a CCO that is initialized from an existing torch.nn.Module.
- Resume a CCO that is initialized from an existing compressed model (.pompom file).
- Deploy the compressed model (.pompom file) to a chosen framework.
.pompom files are the outputs of the CCO and contain the compressed model's
checkpoint state.
For more information, see model.pompom.
Start a CLIKA Compression Operation
You can start a CCO from any torch.nn.Module by calling clika_compress.
To start the CCO, you MUST provide the following items:
- Model to compress (torch.nn.Module)
- Optimizer (torch.optim.Optimizer)
- Loss function (Callable -> Union[tuple,dict])
- Function that returns the training dataloader (Callable -> torch.utils.data.DataLoader)
- Target deployment framework, one of TensorRT, ONNX Runtime, OpenVINO, or TFLite (Experimental)
There are several optional inputs that you MAY provide as well:
- Function that returns an evaluation dataloader (Callable -> torch.utils.data.DataLoader), if you require evaluation performance to be calculated during the compression process
- Training metrics (if you require metrics on the training dataset during CCO)
- Evaluation metrics (if you require metrics on the evaluation dataset during CCO)
The provided loss functions, metrics and dataloaders must conform to certain requirements.
See CCO input requirements for more details.
Example
import torch
from clika_compression import Settings, clika_compress, get_path_to_best_clika_state_result
from torch.utils.data import DataLoader
model: torch.nn.Module = ... # Set your model
# Function that returns the training dataloader
def get_train_loader() -> DataLoader:
    # TODO: Create your dataset and dataloader here
    return train_dataloader
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.0001)
settings = Settings() # Default settings
# Start the CCO to compress the model and generate `.pompom` files
clika_compress(
    output_path='outputs', # Path to save the compressed models and other outputs
    settings=settings, # CCO settings
    model=model, # Model to compress
    optimizer=optimizer,
    init_training_dataset_fn=get_train_loader,
    init_evaluation_dataset_fn=None,
    training_losses={'cross_entropy_loss': torch.nn.CrossEntropyLoss()},
)
best_clika_state_file = get_path_to_best_clika_state_result('outputs')
Resume a CLIKA Compression Operation
If a CCO session was previously executed and you wish to resume from
the point where you left off, you can use the clika_resume function
along with the relevant .pompom file.
This feature is beneficial in the following scenarios:
Interruption of the training process:
- In the event of a mid-run crash or interruption during a CCO, you can resume the operation from the last checkpoint.
Introduction of new data:
- When new data is introduced to the training process, you can resume the CCO with the new dataset, ensuring continuity in the compression process.
Additional fine-tuning:
- If you wish to further fine-tune the model by running more epochs, you may continue the CCO from the previous checkpoint, enabling you to run additional epochs without starting from scratch.
Example
import torch
from clika_compression import Settings, clika_resume
from torch.utils.data import DataLoader
settings = Settings() # Default settings
model: torch.nn.Module = ... # Set your model
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.0001)
# Function that returns the training dataloader
def get_train_loader() -> DataLoader:
    # TODO: Create your dataset and dataloader here
    return train_dataloader
# Resume the CCO compression process
# Returns the path to the latest epoch's `.pompom` file
final: str = clika_resume(
    clika_state_path='path/to/.pompom',
    init_training_dataset_fn=get_train_loader,
    optimizer=optimizer,
    training_losses={'ce_loss': torch.nn.CrossEntropyLoss()},
    new_settings=settings, # the new CCO settings
)
Deploy a model
Do not deploy a .onnx file that was generated by CCO with a certain deployment setting
(for example, ONNX Runtime) to another framework (for example, TensorRT).
Each .pompom file is tailored to the deployment framework set in Settings.deployment_settings
during the CCO and should only be deployed and used in that framework.
Once the CCO has been executed, you can deploy the resulting
.pompom file to the target framework it was compressed for using one of:
- Python API - clika_deploy
- CLI Tool - clika-deploy
Both methods provide a straightforward way to deploy the compressed model .pompom
file obtained from the CCO to a .onnx/.tflite file.
For more information, see DeploymentSettings.
CLIKA deployment examples
CLI Tool
# for a model with one input and a dynamic first dimension (batch)
clika-deploy <path_to_onnx> --shape "?x3x224x224" # use "?" to represent a dynamic dimension
# for a model that takes two inputs with the same shape [1,512]
clika-deploy <path_to_onnx> --shape "1x512 1x512" # use whitespace (" ") to separate multiple inputs
For more information, run clika-deploy -h.
To inspect the model's input shapes and order, you may run clika-deploy
without the --shape argument. This produces a .onnx file with the shapes used during training.
You can then examine the input shapes and order in the deployed .onnx file using
VEGA by CLIKA or any other tool, and rerun the deployment with the required shapes.
Python API
from clika_compression import DeploymentSettings_TensorRT_ONNX, Settings, clika_compress, clika_deploy
output_dir = "outputs"
settings = Settings() # Default settings
settings.deployment_settings = DeploymentSettings_TensorRT_ONNX() # select deployment framework
clika_state_path = clika_compress( # type: str
    output_path=output_dir,
    settings=settings,
    # ... other args
)
# Generates a `.onnx` file and saves it to the 'outputs' folder
deployed_file_path = clika_deploy( # type: str
    clika_state_path=clika_state_path,
    output_dir_path=output_dir,
    input_shapes=[(None, 1, 28, 28)],
)
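The CLI and the Python API describe input shapes differently: the CLI takes a string such as "?x3x224x224", while the clika_deploy call above takes a list of tuples such as [(None, 1, 28, 28)]. The sketch below converts the former into the latter. It assumes that None in the Python API plays the same role as "?" on the command line; parse_shape_string is a hypothetical helper and not part of clika-compression.

```python
from typing import List, Optional, Tuple

Shape = Tuple[Optional[int], ...]


def parse_shape_string(shape_arg: str) -> List[Shape]:
    """Convert a CLI-style shape string into a list of shape tuples.

    Hypothetical helper (not part of clika-compression). Assumes "?" marks a
    dynamic dimension, "x" separates dimensions and whitespace separates inputs.
    """
    shapes: List[Shape] = []
    for one_input in shape_arg.split():
        # Map "?" to None and every other token to an integer dimension
        dims = tuple(None if d == "?" else int(d) for d in one_input.split("x"))
        shapes.append(dims)
    return shapes


print(parse_shape_string("?x3x224x224"))  # one input, dynamic batch dimension
print(parse_shape_string("1x512 1x512"))  # two inputs with the same shape
```

Keeping a single shape string in a script and deriving the tuple form from it avoids the two representations drifting apart when you switch between the CLI and the Python API.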
TensorRT deployment
After running CCO and clika_deploy, a <model_name>.onnx file will be generated.
Use this file with the trtexec command, as shown below, to create a .engine
file that can be deployed to TensorRT.
To build the .engine file, install TensorRT on your local machine or use a Docker container provided by NVIDIA.
Here, we use NVIDIA's Docker container (the image will be pulled automatically). The example command below sets the following parameters:
- TensorRT container version - 23.07
- Path to the CLIKA-deployed .onnx file - outputs/MyModel.onnx
- Path to save the TensorRT compiled model - MyModel.engine
- Input shape - 1x3x640x640
docker run --gpus all --rm -v .:/workspace/ nvcr.io/nvidia/tensorrt:23.07-py3 -c trtexec --onnx=outputs/MyModel.onnx --saveEngine=MyModel.engine --shapes=input_0:1x3x640x640 --int8 --workspace=1024
For more information, see CLIKA examples and TensorRT documentation.
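If you build engines for several models, it can help to assemble the trtexec arguments programmatically. The following is a minimal sketch that reproduces the trtexec part of the command above; build_trtexec_args is a hypothetical helper, and the input name input_0 is taken from the example command, so replace it with the actual input names of your model.

```python
import shlex
from typing import Dict, List


def build_trtexec_args(onnx_path: str, engine_path: str,
                       input_shapes: Dict[str, str],
                       workspace_mb: int = 1024) -> List[str]:
    """Assemble the trtexec argument list used in the example command.

    Hypothetical helper. `input_shapes` maps an input name to a shape string
    such as "1x3x640x640"; multiple inputs are joined with commas.
    """
    shapes = ",".join(f"{name}:{shape}" for name, shape in input_shapes.items())
    return [
        "trtexec",
        f"--onnx={onnx_path}",
        f"--saveEngine={engine_path}",
        f"--shapes={shapes}",
        "--int8",
        f"--workspace={workspace_mb}",
    ]


args = build_trtexec_args("outputs/MyModel.onnx", "MyModel.engine",
                          {"input_0": "1x3x640x640"})
command = shlex.join(args)  # shell-safe string form of the argument list
print(command)
```

The resulting string can be pasted after the container image name in the docker run command, or the list can be passed to subprocess.run when TensorRT is installed locally.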
Multi-GPU distributed compression
To use multi-GPU distributed compression with CCO, create a DistributedTrainingSettings
object and assign it to the Settings.distributed_training_settings attribute
before passing the Settings object to the clika_compress or clika_resume function.
In DistributedTrainingSettings:
- Set the multi_gpu argument to True to use distributed compression.
- If use_sharding is True, the model will be sharded across the available GPUs, similar to PyTorch FSDP and DeepSpeed ZeRO-3, which reduces GPU memory usage but may increase latency.
- If use_sharding is False, a technique similar to PyTorch's DistributedDataParallel (DDP) paradigm will be used, in which the model is copied to each GPU and the dataset is split between them. This reduces latency but does not reduce GPU memory usage.
Usage example with multi-GPU and sharding training strategy:
from clika_compression import DistributedTrainingSettings, Settings, clika_compress
settings = Settings() # Default settings
settings.distributed_training_settings = DistributedTrainingSettings(multi_gpu=True, use_sharding=True)
...
clika_compress(
    output_path="outputs",
    settings=settings,
    # ... other args
)
For more information, see multi-GPU input restrictions.
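To make the memory trade-off between the two strategies concrete, the sketch below gives a back-of-the-envelope estimate of the memory needed for the model weights on each GPU. It is an illustration only: it ignores activations, gradients, optimizer state and framework overhead, assumes the weights are split evenly when sharded, and does not describe how CCO actually allocates memory. The function name and the example numbers are made up for this illustration.

```python
def weight_memory_per_gpu_gib(num_params: int, bytes_per_param: int,
                              num_gpus: int, use_sharding: bool) -> float:
    """Rough per-GPU memory (GiB) for model weights only.

    Replication (DDP-like): every GPU holds a full copy of the weights.
    Sharding (FSDP/ZeRO-3-like): weights are assumed to be split evenly.
    """
    total_bytes = num_params * bytes_per_param
    per_gpu_bytes = total_bytes / num_gpus if use_sharding else total_bytes
    return per_gpu_bytes / 2**30


# Example: 1 billion float32 parameters (4 bytes each) on 4 GPUs
replicated = weight_memory_per_gpu_gib(1_000_000_000, 4, 4, use_sharding=False)
sharded = weight_memory_per_gpu_gib(1_000_000_000, 4, 4, use_sharding=True)
print(f"replicated: {replicated:.2f} GiB per GPU, sharded: {sharded:.2f} GiB per GPU")
```

Under these simplifying assumptions, sharding divides the weight footprint by the number of GPUs, which is why it is the option to consider when the model does not fit on a single device.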
Configuration file
You may use a YAML file to save and load CCO configuration.
For more information, see Configuration file.
Monitoring integration
If you wish to monitor your training using TensorBoard, you
may pass a TensorBoardCallback instance to the
clika_compress or clika_resume functions, as shown here:
from clika_compression.utils.callbacks import TensorBoardCallback
from clika_compression import clika_compress
clika_compress(
    # ... other args
    callbacks=[TensorBoardCallback(output_path='tensorboard_runs')]
)
CCO outputs
The output directory tree structure will be as follows:
<output-dir>/
├── epoch_1/
│   ├── model.pompom
│   └── summary.json
├── epoch_2/
│   ├── model.pompom
│   └── summary.json
...
├── logs/
│   └── clika_optimize_<timestamp>.log
...
└── <model-name>.onnx/tflite # only generated when deploying the model
During CCO, a new folder named epoch_<current-epoch-index>/ is created for each epoch
(unless stated otherwise in TrainingSettings.save_interval).
Inside the folder, the following two files will be generated:
model.pompom
The outputs of CCO are stored in .pompom files.
They contain the compressed model and
everything that is required to resume CCO or deploy the compressed model.
These .pompom files have two main usages:
- As input to the clika_resume method to resume CCO.
- As input to the clika_deploy method to create a deployable model for a specific framework.
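When resuming, you need the path to a specific .pompom file. The sketch below locates the model.pompom of the highest-numbered epoch_<n>/ folder, assuming the directory layout shown earlier. find_latest_pompom is a hypothetical helper and not part of clika-compression; to get the best rather than the latest checkpoint, the examples in this guide use get_path_to_best_clika_state_result instead.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_pompom(output_dir: str) -> Optional[Path]:
    """Return model.pompom from the highest-numbered epoch_<n>/ folder, or None."""
    best_epoch, best_path = -1, None
    for folder in Path(output_dir).glob("epoch_*"):
        match = re.fullmatch(r"epoch_(\d+)", folder.name)
        candidate = folder / "model.pompom"
        # Compare epochs numerically so that epoch_10 ranks above epoch_2
        if match and candidate.is_file() and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), candidate
    return best_path


# Demo on a throwaway directory that mimics the CCO output layout
with tempfile.TemporaryDirectory() as tmp:
    for epoch in (1, 2, 10):
        epoch_dir = Path(tmp, f"epoch_{epoch}")
        epoch_dir.mkdir()
        (epoch_dir / "model.pompom").touch()
    latest = find_latest_pompom(tmp)
    print(latest.parent.name)
```

The returned path can then be passed as clika_state_path to clika_resume or clika_deploy.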
summary.json
We will break down the summary.json
structure by using an example.
The script:
import torch
from clika_compression import Settings, clika_compress, get_path_to_best_clika_state_result
from torchmetrics.classification.accuracy import MulticlassAccuracy
...
optimizer = torch.optim.AdamW(params=model.parameters(), lr=0.0001)
settings = Settings() # Default settings
# Start the "CCO" to compress model and generate `.pompom` files
clika_compress( # type: str
    output_path='outputs',
    settings=settings,
    model=model,
    init_training_dataset_fn=get_train_loader, # type: Callable
    init_evaluation_dataset_fn=get_evaluation_loader, # type: Callable
    optimizer=optimizer,
    training_losses={"ce_loss": torch.nn.CrossEntropyLoss()},
    training_metrics={'multiclass_accuracy': MulticlassAccuracy(num_classes=10)},
    evaluation_losses={
        "ce_loss(sum)": torch.nn.CrossEntropyLoss(reduction="sum"),
        "ce_loss(mean)": torch.nn.CrossEntropyLoss(reduction="mean")
    },
    evaluation_metrics={'multiclass_accuracy': MulticlassAccuracy(num_classes=10)},
)
best_clika_state_file = get_path_to_best_clika_state_result('outputs')
will generate the following summary.json
file:
{
    "epoch": 1,
    "training": {
        // `training_losses` names and corresponding values
        "ce_loss": 9.7,
        // total loss - a summation of all losses
        // since we have provided just one loss function, it is identical to "ce_loss"
        "loss": 9.7,
        // `training_metrics` supplied by users
        "multiclass_accuracy": 0.0,
    },
    "evaluation": {
        // `evaluation_losses` names and corresponding values
        "ce_loss(sum)": 87.0,
        "ce_loss(mean)": 8.7,
        // `evaluation_metrics` supplied by users
        "eval_multiclass_accuracy": 0.9,
    },
    "time_elapsed": "0:00:00"
}
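Because every epoch folder contains its own summary.json, the summaries can be compared across epochs with a few lines of standard-library Python. The sketch below collects one evaluation value per epoch and picks the epoch with the lowest value. It assumes that the files on disk are plain JSON (the // comments above are explanatory annotations) and that the keys match the example; collect_eval_values is a hypothetical helper and the numbers in the demo are made up.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict


def collect_eval_values(output_dir: str, key: str) -> Dict[int, float]:
    """Map epoch number -> summary["evaluation"][key] for each epoch_<n>/summary.json."""
    values: Dict[int, float] = {}
    for summary_path in Path(output_dir).glob("epoch_*/summary.json"):
        summary = json.loads(summary_path.read_text())
        values[int(summary["epoch"])] = float(summary["evaluation"][key])
    return dict(sorted(values.items()))


# Demo on a throwaway directory with made-up loss values
with tempfile.TemporaryDirectory() as tmp:
    for epoch, loss in ((1, 8.7), (2, 6.1)):
        epoch_dir = Path(tmp, f"epoch_{epoch}")
        epoch_dir.mkdir()
        demo_summary = {"epoch": epoch, "evaluation": {"ce_loss(mean)": loss}}
        (epoch_dir / "summary.json").write_text(json.dumps(demo_summary))
    losses = collect_eval_values(tmp, "ce_loss(mean)")

best_epoch = min(losses, key=losses.get)  # epoch with the lowest evaluation loss
print(losses, best_epoch)
```

For a metric where higher is better, such as accuracy, use max instead of min.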
Logs
During every run of CCO (start, resume or deploy),
logs printed to the terminal are saved to a file in the outputs folder.
The logs contain information about the CCO
run and can be helpful for monitoring the process and for debugging purposes.
See Output log breakdown for more details.
Next steps
That's it! Both the CCO and the deployment are flexible and highly configurable to best fit your needs.
- For further details, please refer to the Python API documentation.
- For more examples and use cases, see our Examples GitHub Repository.