How To Use CLIKA Compression
Tutorial Video
The code in this video can be found in the CLIKA Compression examples repository.
Background
The clika-compression Python package implements CLIKA's unique compression engine, called CLIKA Compression Operation (CCO).
- CCO is the process in which the model is compressed with CLIKA's compression engine.
- CCO is "hardware-aware" and compresses the model optimally for the selected framework.
- CCO can be applied to fine-tune a pretrained model or to train a model from scratch.
The clika-compression package has three main uses:
- Start a CCO that is initialized from an existing torch.nn.Module.
- Resume a CCO that is initialized from an existing compressed model (.pompom file).
- Deploy the compressed model (.pompom file) to a chosen framework.
Note that we will use cc (clika-compression) as the name of the package, as if we used:
import clika_compression as cc
.pompom files are the outputs of the CCO and contain the compressed model's checkpoint state.
For more information, see model.pompom.
Start a CLIKA Compression Operation
You can start a CCO from any torch.nn.Module by using cc.PyTorchCompressionEngine.optimize().
In order to start the CCO you MUST provide the following items:
- Model to compress (torch.nn.Module)
- Optimizer (torch.optim.Optimizer)
- Loss function (Callable -> tuple | dict)
- Function that returns the training dataset dataloader (Callable -> torch.utils.data.DataLoader)
There are several optional inputs that you MAY provide as well:
- Function that returns an evaluation dataset dataloader (in case you require evaluation performance to be calculated during the compression process)
- Training metrics (if you require metrics on the training dataset during CCO)
- Evaluation metrics (if you require metrics on the evaluation dataset during CCO)
The provided loss functions, metrics and dataloaders must conform to certain requirements.
See CCO Input Requirements for more details.
Example
import torch
from clika_compression import PyTorchCompressionEngine
from clika_compression.settings import generate_default_settings, ModelCompileSettings
from torch.utils.data import DataLoader
...
model: torch.nn.Module = my_model # Set your model
def get_train_loader() -> DataLoader:
    # Create your dataset and dataloader here
    return train_dataloader
optimizer = torch.optim.AdamW(params=list(model.parameters()), lr=0.0001)
settings = generate_default_settings()  # Can be configured after creation
engine = PyTorchCompressionEngine()
mcs = ModelCompileSettings(
    optimizer=optimizer,
    training_losses={'cross_entropy_loss': torch.nn.CrossEntropyLoss()},
)
# Start the "CCO" to compress the model and generate a `.pompom` file
engine.optimize(  # final is the path to the `.pompom` file from the latest epoch
    output_path='outputs',  # the path to save the compressed models and other outputs
    settings=settings,  # the CCO settings
    model=model,  # the model to compress
    model_compile_settings=mcs,
    init_training_dataset_fn=get_train_loader,  # a function that returns the training dataloader
    multi_gpu=True,  # use Multi-GPU Distributed Compression
)
Resume a CLIKA Compression Operation
If a CCO session was previously executed and you wish to resume from the point where you left off, you can use the cc.PyTorchCompressionEngine.resume() function along with the relevant .pompom file.
This feature is beneficial in the following scenarios:
Interruption of the training process:
- In the event of a mid-run crash or interruption during a CCO, you can resume the operation from the last checkpoint.
Introduction of new data:
- When new data becomes available, you can resume the CCO with the new dataset, ensuring continuity in the compression process.
Additional fine-tuning:
- If you wish to fine-tune the model further, you can continue the CCO from the previous checkpoint and run additional epochs without starting from scratch.
Example
import torch
from clika_compression import PyTorchCompressionEngine
from clika_compression.settings import generate_default_settings, ModelCompileSettings
from torch.utils.data import DataLoader
...
settings = generate_default_settings()
optimizer = torch.optim.AdamW(params=list(model.parameters()), lr=0.0001)
mcs = ModelCompileSettings(
    optimizer=optimizer,
    training_losses={'ce_loss': torch.nn.CrossEntropyLoss()},
)
engine = PyTorchCompressionEngine()
def get_train_loader() -> DataLoader:
    # Create your dataset and dataloader here
    return train_dataloader
# Resume the "CCO" to continue compressing the model and generate a `.pompom` file
engine.resume(  # final is the path to the `.pompom` file from the latest epoch
    clika_state_path='path/to/.pompom',
    model_compile_settings=mcs,
    init_training_dataset_fn=get_train_loader,  # a function that returns the training dataloader
    settings=settings,  # the CCO settings
    multi_gpu=True,  # use Multi-GPU Distributed Compression
)
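When resuming after an interruption, the checkpoint to pass as clika_state_path is usually the most recent one. The sketch below is one way to locate it, assuming the epoch_<n>/model.pompom layout described under CCO Outputs. The helper name find_latest_pompom is hypothetical and is not part of clika-compression; it uses only the Python standard library.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional


def find_latest_pompom(output_dir: Path) -> Optional[Path]:
    """Return model.pompom from the highest-numbered epoch_<n>/ folder, if any.

    Hypothetical helper: it only assumes the documented output layout.
    """
    best_epoch, best_path = -1, None
    for folder in output_dir.glob("epoch_*"):
        match = re.fullmatch(r"epoch_(\d+)", folder.name)
        candidate = folder / "model.pompom"
        # Compare epochs numerically so that epoch_10 ranks above epoch_2.
        if match and candidate.is_file() and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), candidate
    return best_path


# Demo on a throwaway directory that mimics the documented layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for epoch in (1, 2, 10):
        (root / f"epoch_{epoch}").mkdir()
        (root / f"epoch_{epoch}" / "model.pompom").write_bytes(b"")
    latest = find_latest_pompom(root)
    latest_epoch_folder = latest.parent.name
    print(latest_epoch_folder)
```

The returned path, converted with str(), could then be supplied as clika_state_path in the resume() call shown above.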
Deploy a Model
Do not deploy a .onnx file that was generated by CCO with a certain deployment setting (for example, ONNX Runtime) to another framework (for example, TensorRT).
Each .onnx file is tailored to the deployment framework set in Settings.deployment_settings and should only be deployed to that framework.
Once the CCO has been executed, you can deploy the resulting .pompom file to a framework of your choice by using the cc.PyTorchCompressionEngine.deploy() method.
This method enables straightforward deployment of the compressed model obtained from the CCO.
The current clika-compression version supports TensorRT and ONNX Runtime deployment (see DeploymentSettings).
CLIKA Deployment Example
from clika_compression import PyTorchCompressionEngine, DeploymentSettings_TensorRT_ONNX
from clika_compression.settings import generate_default_settings
engine = PyTorchCompressionEngine()
settings = generate_default_settings()
settings.deployment_settings = DeploymentSettings_TensorRT_ONNX(
    graph_author="CLIKA",
    input_shapes_for_deployment=[(None, 1, 28, 28)],
)  # no trailing comma after the closing parenthesis: it would turn the value into a tuple
# Generates the `.onnx` file and saves it to 'output'
engine.deploy(clika_state_path='path/to/.pompom', output_dir_path='output')
TensorRT Deployment
After running CCO and cc.PyTorchCompressionEngine.deploy(), a MyModel.onnx file will be generated. Feed this file to the trtexec command as shown below to create a .engine file for deployment with TensorRT.
To build the .engine file, install TensorRT on your local machine or use a Docker container provided by NVIDIA.
Here, we use the NVIDIA Docker container and deploy our model by running the following command (the Docker image will be pulled automatically from NVIDIA's nvcr.io registry):
docker run --gpus all --rm -v .:/workspace/ nvcr.io/nvidia/tensorrt:23.07-py3 -c trtexec \
    --onnx=outputs/MyModel.onnx \
    --saveEngine=outputs/MyModel.engine \
    --shapes=input_0:1x3x224x224 \
    --int8 --workspace=1024
# --onnx - relative path to your model `.onnx` file
# --saveEngine - relative path to the deployed model
# --shapes - input shape (make sure it is compatible with the shapes selected during CLIKA deployment)
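The compatibility requirement on --shapes can be checked before launching trtexec. The following is a minimal sketch, assuming that None marks a dynamic dimension in the shape given at CLIKA deployment time. Both helper names are hypothetical and are not part of clika-compression or TensorRT.

```python
from typing import Optional, Sequence, Tuple


def parse_trtexec_shape(spec: str) -> Tuple[str, Tuple[int, ...]]:
    """Split 'input_0:1x3x224x224' into ('input_0', (1, 3, 224, 224))."""
    name, _, dims = spec.rpartition(":")
    return name, tuple(int(d) for d in dims.split("x"))


def is_compatible(deploy_shape: Sequence[Optional[int]], concrete: Sequence[int]) -> bool:
    """True when ranks match and every fixed (non-None) dimension agrees."""
    if len(deploy_shape) != len(concrete):
        return False
    return all(d is None or d == c for d, c in zip(deploy_shape, concrete))


name, dims = parse_trtexec_shape("input_0:1x3x224x224")
# A deployment shape with a dynamic batch dimension accepts batch size 1.
ok = is_compatible((None, 3, 224, 224), dims)
# A deployment shape fixed to 1x28x28 images does not match a 3x224x224 input.
bad = is_compatible((None, 1, 28, 28), dims)
print(name, dims, ok, bad)
```

A False result indicates that either the --shapes value or the deployment shapes should be revisited before building the engine.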
You may also provide minimum or maximum shape bounds instead of specifying the exact shapes:
docker run --gpus all --rm -v .:/workspace/ nvcr.io/nvidia/tensorrt:23.07-py3 -c trtexec \
    --onnx=outputs/MyModel.onnx \
    --saveEngine=MyModel.engine \
    --minShapes=input_0:1x3x224x224 \
    --optShapes=input_0:1x3x640x640 \
    --maxShapes=input_0:1x3x1080x1080 \
    --int8 --workspace=1024
# --onnx - relative path to your model `.onnx` file
# --saveEngine - relative path to the deployed model
# make sure the non-constant dimensions are `None` in the deployment settings used with `PyTorchCompressionEngine.deploy`,
# in this example: `input_shapes_for_deployment=[(1, 3, None, None)]`
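The relationship between the deployment shape and the three trtexec bounds can be sketched as a small validation step. This is an illustrative helper under stated assumptions: the names format_shape and dynamic_shape_flags are hypothetical, None marks a dynamic dimension, and the only rules enforced are min <= opt <= max on every axis and equality on fixed axes.

```python
from typing import List, Optional, Sequence


def format_shape(name: str, dims: Sequence[int]) -> str:
    """Render a shape in trtexec notation, e.g. 'input_0:1x3x224x224'."""
    return f"{name}:" + "x".join(str(d) for d in dims)


def dynamic_shape_flags(
    name: str,
    template: Sequence[Optional[int]],
    min_dims: Sequence[int],
    opt_dims: Sequence[int],
    max_dims: Sequence[int],
) -> List[str]:
    """Validate min/opt/max against a template (None = dynamic) and build the flags."""
    for dims in (min_dims, opt_dims, max_dims):
        if len(dims) != len(template):
            raise ValueError("every shape must have the same rank as the template")
    for axis, fixed in enumerate(template):
        lo, opt, hi = min_dims[axis], opt_dims[axis], max_dims[axis]
        if not lo <= opt <= hi:
            raise ValueError(f"axis {axis}: expected min <= opt <= max")
        if fixed is not None and not (lo == opt == hi == fixed):
            raise ValueError(f"axis {axis} is fixed to {fixed} in the template")
    return [
        f"--minShapes={format_shape(name, min_dims)}",
        f"--optShapes={format_shape(name, opt_dims)}",
        f"--maxShapes={format_shape(name, max_dims)}",
    ]


flags = dynamic_shape_flags(
    "input_0",
    template=(1, 3, None, None),
    min_dims=(1, 3, 224, 224),
    opt_dims=(1, 3, 640, 640),
    max_dims=(1, 3, 1080, 1080),
)
print(" ".join(flags))
```

The resulting strings match the three shape flags used in the command above, so a mismatch is caught before the container is started.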
For more information, see CLIKA examples and TensorRT documentation.
Multi-GPU Distributed Compression
To use "Multi-GPU Distributed Compression" with CCO, the argument multi_gpu must be set to True in the cc.PyTorchCompressionEngine.optimize() function call (used to start a CCO) or the cc.PyTorchCompressionEngine.resume() function call (used to resume a CCO).
The clika-compression "Multi-GPU Distributed Compression" technique is similar to PyTorch's DistributedDataParallel paradigm, which copies the model to each GPU and splits the dataset between them.
For more information, see Multi-GPU input restrictions.
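To make the "splits the dataset between them" idea concrete, the sketch below shows how sample indices are commonly divided among replicas in the DistributedDataParallel style (strided assignment with wrap-around padding, without shuffling). It illustrates the general paradigm only and is not CLIKA's implementation; shard_indices is a hypothetical name.

```python
import math
from typing import List


def shard_indices(num_samples: int, world_size: int, rank: int) -> List[int]:
    """Indices that one replica would process, given `world_size` replicas."""
    if num_samples <= 0 or not 0 <= rank < world_size:
        raise ValueError("need num_samples > 0 and 0 <= rank < world_size")
    per_replica = math.ceil(num_samples / world_size)
    total = per_replica * world_size
    indices = list(range(num_samples))
    pad = total - num_samples
    if pad:
        # Wrap around so that every replica receives the same number of samples.
        indices += (indices * math.ceil(pad / num_samples))[:pad]
    # Replica `rank` takes every world_size-th index, starting at its own rank.
    return indices[rank:total:world_size]


shards = [shard_indices(10, world_size=4, rank=r) for r in range(4)]
for rank, shard in enumerate(shards):
    print(rank, shard)
```

With 10 samples and 4 replicas, each replica receives 3 indices, and two samples are repeated to even out the shards.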
CCO Outputs
The output directory tree structure will be as follows:
<output-dir>/
├── epoch_1/
│   ├── model.pompom
│   └── summary.json
├── epoch_2/
│   ├── model.pompom
│   └── summary.json
...
├── logs/
│   └── clika_optimize_<timestamp>.log
...
└── <model-name>.onnx  # only generated when using `cc.PyTorchCompressionEngine.deploy()`
During CCO, a new folder named epoch_<current-epoch-index>/ is created at every epoch (unless stated otherwise in cc.TrainingSettings.save_interval).
Inside the folder, the following two files will be generated:
model.pompom
The outputs of CCO are stored in .pompom files; they contain the compressed model and everything that is required to resume CCO or deploy the compressed model.
They have two main uses:
- as input to the cc.PyTorchCompressionEngine.resume() method to resume CCO.
- as input to the cc.PyTorchCompressionEngine.deploy() method to create a deployable model for a specific framework.
summary.json
We will break down the summary.json structure by using an example.
The script:
import torch
from clika_compression import PyTorchCompressionEngine
from clika_compression.settings import generate_default_settings, ModelCompileSettings
from torchmetrics.classification.accuracy import MulticlassAccuracy
...
optimizer = torch.optim.AdamW(params=list(model.parameters()), lr=0.0001)
num_classes = 10  # placeholder: set this to the number of classes in your dataset
mcs = ModelCompileSettings(
    optimizer=optimizer,
    training_losses={"ce_loss": torch.nn.CrossEntropyLoss()},
    # torchmetrics' MulticlassAccuracy requires num_classes
    training_metrics={'multiclass_accuracy': MulticlassAccuracy(num_classes=num_classes)},
    evaluation_losses={"ce_loss": torch.nn.CrossEntropyLoss()},
    evaluation_metrics={'multiclass_accuracy': MulticlassAccuracy(num_classes=num_classes)},
)
engine = PyTorchCompressionEngine()
settings = generate_default_settings()  # Can be configured after creation
# Start the "CCO" to compress the model and generate a `.pompom` file
engine.optimize(  # final is the path to the `.pompom` file from the latest epoch
    output_path='outputs',
    settings=settings,  # the CCO settings
    model=model,  # the model to compress
    model_compile_settings=mcs,
    init_training_dataset_fn=get_train_loader,  # a function that returns the training dataloader
    init_evaluation_dataset_fn=get_evaluation_loader,  # a function that returns the evaluation dataloader
)
This will generate the following summary.json file:
{
  "epoch": 1,  # index of epoch
  "training": {
    "ce_loss": 9.7,  # the cross-entropy loss
    "loss": 9.7,  # the total loss: the sum of all losses (here there is only one loss, so it equals the cross-entropy loss)
    "multiclass_accuracy": 0.0  # the metric supplied by the user
  },
  "evaluation": {  # same as in "training", but for the evaluation dataset
    "eval_ce_loss": 8.7,
    "eval_loss": 8.7,
    "eval_multiclass_accuracy": 0.9
  },
  "time_elapsed": "0:00:00"  # the duration of the CCO
}
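Because each epoch folder carries its own summary.json, the summaries can be collected to compare epochs. The sketch below is one possible approach, assuming the files on disk are plain JSON (without the `#` annotations used above for explanation) and use the "epoch", "evaluation" and "eval_loss" keys from the example. The helper names and the demo numbers are made up for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def load_summaries(output_dir: Path) -> List[dict]:
    """Read every epoch_*/summary.json under output_dir, ordered by epoch index."""
    summaries = []
    for path in output_dir.glob("epoch_*/summary.json"):
        with path.open(encoding="utf-8") as fh:
            summaries.append(json.load(fh))
    return sorted(summaries, key=lambda s: s["epoch"])


def best_epoch(summaries: List[dict], key: str = "eval_loss") -> int:
    """Return the epoch index with the lowest evaluation value for `key`."""
    scored = [s for s in summaries if key in s.get("evaluation", {})]
    if not scored:
        raise ValueError(f"no summary contains evaluation['{key}']")
    return min(scored, key=lambda s: s["evaluation"][key])["epoch"]


# Demo with made-up numbers written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for epoch, loss in ((1, 8.7), (2, 6.1), (3, 7.0)):
        folder = root / f"epoch_{epoch}"
        folder.mkdir()
        payload = {"epoch": epoch, "evaluation": {"eval_loss": loss}}
        (folder / "summary.json").write_text(json.dumps(payload), encoding="utf-8")
    summaries = load_summaries(root)
    epochs_found = [s["epoch"] for s in summaries]
    best = best_epoch(summaries)
    print(epochs_found, best)
```

The model.pompom file in the selected epoch folder would then be a natural candidate to deploy or to resume from.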
Logs
During every run of CCO (start, resume or deploy), the logs printed to the terminal are also saved to a file in the outputs folder.
The logs contain information about the CCO run and can be helpful for monitoring the process and for debugging.
See Output Log Breakdown for more details.
Next Steps
And that's it! Both the CCO and the deployment are flexible and highly configurable to best fit your needs.
- For further details, please refer to the Python API documentation.
- For more examples and use cases, see our Examples GitHub Repository.