Version: 25.12.0

How to use CLIKA ACE

Background

The clika-ace Python package provides CLIKA's unique Automatic Compression Engine (ACE).

CLIKA ACE

  • ACE is a hardware-aware engine that compresses a model for a specific target framework, such as Microsoft's ONNX Runtime, NVIDIA's TensorRT, or Intel's OpenVINO.
  • ACE can be used to fine-tune a pre-trained model or train a model from scratch. However, we recommend using it for fine-tuning or quantization.
  • The primary entrypoint to ACE is the clika_compile function, which can also be used as a backend for torch.compile. We recommend calling clika_compile directly to benefit from IDE auto-completion.

The clika-ace package has three main usages:

  • Start an ACE session from an existing torch.nn.Module. The module is wrapped in a ClikaModule instance, which inherits from and behaves like a standard torch.nn.Module.
  • Resume an ACE session from a previously saved ClikaModule.
  • Export the compressed model to a target framework.
    Deployment of a ClikaModule instance to a particular output target framework is handled by the clika_model.clika_export() function, which has an API similar to torch.onnx.export.
info

Initializing a ClikaModule from a torch.nn.Module transforms the model into CLIKA's Intermediate Representation (IR). This IR format is also used when serializing the model.

Terminology:

  • CLIKA ACE: CLIKA's Automatic Compression Engine.
  • CLIKA IR / pompom: An intermediate representation used exclusively by the CLIKA ACE SDK. The SDK parses an input torch.nn.Module and converts it into this format.
  • ClikaModule: A wrapper object for an input model that inherits from and behaves like a torch.nn.Module.
  • Monolithic ClikaModule: A single ClikaModule that represents an entire input model. In the future, models with data-dependent control flow may be returned with multiple submodules individually wrapped by ClikaModule.

Start

To start an ACE session, wrap the torch.nn.Module in a ClikaModule with clika_compile, called either directly or as a torch.compile backend, as detailed below.

caution

Models compressed by ACE must not contain data-dependent control flow, i.e., statements such as if branches or for loops whose behavior depends on tensor values. For additional details on these restrictions, see Data-Dependent Control-Flow.

Currently, the CLIKA SDK does not support partial compilation of submodules within a user-supplied torch.nn.Module. While a future release will support partial compilation and control flow operations, torch.compile presently returns a single, monolithic ClikaModule.
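
To make the data-dependent control flow restriction above concrete, the following is a minimal plain-PyTorch sketch. It contrasts a module whose Python-level branch depends on tensor values with a branch-free rewrite that computes both results and selects between them. The class names are illustrative only, and the sketch does not claim anything about which operators ACE supports; it only shows the kind of rewrite that removes the branch.

```python
import torch


class DataDependent(torch.nn.Module):
    """The executed Python branch depends on the values inside x."""

    def forward(self, x):
        if x.sum() > 0:  # Python-level branch on tensor contents
            return x * 2.0
        return x - 1.0


class BranchFree(torch.nn.Module):
    """Computes both candidates and selects element-wise with torch.where."""

    def forward(self, x):
        # The 0-dim boolean condition broadcasts against both candidates.
        return torch.where(x.sum() > 0, x * 2.0, x - 1.0)


pos = torch.ones(2, 3)
neg = -torch.ones(2, 3)
same_on_pos = torch.equal(DataDependent()(pos), BranchFree()(pos))
same_on_neg = torch.equal(DataDependent()(neg), BranchFree()(neg))
```

Both modules return the same values for these inputs, but only the second expresses the choice as a tensor operation rather than as a Python branch.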

Example

import tempfile

import numpy as np
import onnxruntime as ort
import torch
from clika_ace import (
    ClikaModule,
    clika_compile,
    DeploymentSettings_TensorRT_ONNX,
)


class Model(torch.nn.Module):

    def forward(self, x):
        # Overwrite the first three entries along the last axis (in place)
        x[..., :3] = 1.0
        return x


xs = torch.rand(32, 3, 224)
clika_model: ClikaModule = clika_compile(
    model=Model(),
    calibration_inputs=xs,
    deployment_settings=DeploymentSettings_TensorRT_ONNX(),
)
with tempfile.NamedTemporaryFile("wb") as fp:
    clika_model.clika_export(
        file=fp.name,
        input_names=["x"]
    )
    # Load and run the exported model while the temporary file still exists
    session = ort.InferenceSession(fp.name)
    outputs = session.run(
        output_names=None,
        input_feed={
            "x": xs.numpy()
        }
    )[0]
# Compare against the original PyTorch model
ref_outputs = Model()(xs).numpy()
assert outputs.shape == ref_outputs.shape and np.all(outputs == ref_outputs)

Compile API

Please see Compile API

Save / Resume

Once the model has been wrapped in a ClikaModule, it can be saved and later reloaded:

...
clika_model: ClikaModule = clika_compile(...)

# This will serialize the model into multiple chunks if needed.
clika_model.clika_save(save_path)

# This will read the chunks
restored_clika_model = ClikaModule.clika_load(save_path)
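
The save and load calls above can be combined into a small "resume or start" pattern. The sketch below is hypothetical: the helper name resume_or_start and its parameters are not part of clika-ace, and it assumes that clika_save leaves a file or directory at exactly save_path, which may not hold if the model is split into chunks. The loader and compiler are passed in as callables, so the demonstration uses stand-in functions instead of the real API.

```python
import os
import tempfile
from pathlib import Path


def resume_or_start(save_path, load_fn, compile_fn):
    """Hypothetical helper: load a saved session if one exists, else start fresh.

    Assumes a previous save left something at exactly `save_path`.
    """
    if Path(save_path).exists():
        return load_fn(save_path)
    return compile_fn()


# Demonstration with stand-in callables (no clika-ace calls are made here).
with tempfile.TemporaryDirectory() as tmp_dir:
    checkpoint = os.path.join(tmp_dir, "session.ckpt")  # hypothetical file name
    first = resume_or_start(checkpoint, lambda p: "resumed", lambda: "fresh")
    Path(checkpoint).touch()  # simulate an earlier save
    second = resume_or_start(checkpoint, lambda p: "resumed", lambda: "fresh")
```

In an ACE script, load_fn would presumably be ClikaModule.clika_load and compile_fn a lambda wrapping clika_compile(...), with the result saved through clika_save(save_path) at each checkpoint.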

Saving is beneficial in the following scenarios:

  1. Interruption of the training process:

    • In the event of a mid-run crash or interruption during an ACE session, you can resume the operation from the last checkpoint.
  2. Introduction of new data:

    • When new data becomes available, you can resume the ACE session from a saved checkpoint with the new dataset, so the compression process continues instead of restarting.
  3. Additional fine-tuning:

    • If you wish to further fine-tune the model by running more epochs, you can continue the ACE session starting from a previous checkpoint, enabling you to run additional epochs without starting from scratch.

To do so, load the previously saved ClikaModule with ClikaModule.clika_load and continue with script execution as with a normal torch.nn.Module.
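
Since a restored ClikaModule is described as behaving like a normal torch.nn.Module, continuing a session should look like an ordinary PyTorch training loop. The following is a minimal sketch under that assumption: fine_tune is a hypothetical helper, the optimizer and loss are arbitrary choices, and a plain nn.Linear stands in for the restored module so that the example runs without clika-ace.

```python
import torch
from torch import nn


def fine_tune(model, batches, lr=1e-2, epochs=1):
    """Run a plain SGD loop over (inputs, targets) pairs; return the last loss."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    last_loss = float("nan")
    model.train()
    for _ in range(epochs):
        for inputs, targets in batches:
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
            last_loss = loss.item()
    return last_loss


torch.manual_seed(0)
stand_in = nn.Linear(4, 1)  # stand-in for a restored ClikaModule (assumption)
weights_before = stand_in.weight.detach().clone()
batches = [(torch.rand(8, 4), torch.rand(8, 1)) for _ in range(3)]
final_loss = fine_tune(stand_in, batches, epochs=2)
```

In a real session, the stand-in would be replaced by the module returned from ClikaModule.clika_load, and the checkpoint would be written again with clika_save after the extra epochs.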

Deploy

To deploy a ClikaModule instance, call its clika_export() method (e.g., clika_model.clika_export()), which has an API similar to torch.onnx.export.

There are two types of deployment:

  1. Dynamic shape deployment: dynamic_axes is provided. This ensures the compressed model works for varying input shapes. If dynamic_axes specifies a symbolic shape for an axis (e.g., None or a string), the model is deployed with dynamic shape input.
  2. Static shape deployment: dynamic_axes is not provided. This results in a compressed model that accepts a single, specific input shape, determined by the tensor passed to the args argument.
tip

Static shape deployment typically results in faster inference speed, as most inference frameworks can perform additional optimizations when all shapes are known. Additionally, some target frameworks may not support dynamically-shaped inputs.

caution

Dynamic shape deployment may fail if a model depends on specific shapes, for example when a Flatten layer is followed by a Linear layer whose input size is fixed. In this case, consider using an AdaptiveAvgPool operation with an output size of 1 instead of Flatten.
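
The shape dependence described above can be reproduced in plain PyTorch, independently of ACE. In this sketch (layer sizes chosen arbitrarily for illustration), the Flatten + Linear head only accepts the spatial size it was built for, while the AdaptiveAvgPool2d(1) head accepts any spatial size. A Flatten is kept after the pooling layer, but there it only collapses the fixed (N, 8, 1, 1) output to (N, 8).

```python
import torch
from torch import nn

# Head tied to 32x32 inputs: Linear expects exactly 8 * 32 * 32 features.
flatten_head = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.Flatten(),
    nn.Linear(8 * 32 * 32, 10),
)

# Shape-agnostic head: pooling reduces every feature map to 1x1 first.
pooled_head = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

small = torch.rand(1, 3, 32, 32)
large = torch.rand(1, 3, 64, 64)

try:
    flatten_head(large)
    flatten_accepts_large = True
except RuntimeError:  # feature count no longer matches the Linear layer
    flatten_accepts_large = False

pooled_small = pooled_head(small)
pooled_large = pooled_head(large)
```

Note that a dynamic batch axis alone does not trigger this problem in either head; it appears when spatial dimensions are allowed to vary.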

Dynamic shape deployment example

clika_model.clika_export(
    f=f,
    input_names=["x"],
    dynamic_axes={"x": {0: "batch_size"}}  # we want a dynamic batch-size
)
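
The dynamic_axes mapping used above follows the torch.onnx.export convention: each input name maps to a dictionary from axis index to a symbolic axis name. For models with several inputs, the mapping can be built programmatically. The helper below is hypothetical (it is not part of clika-ace), and passing more than one input name to clika_export is an assumption here.

```python
def batch_dynamic_axes(input_names, axis=0, label="batch_size"):
    """Hypothetical helper: mark one axis of every named input as dynamic."""
    return {name: {axis: label} for name in input_names}


single_input_axes = batch_dynamic_axes(["x"])
two_input_axes = batch_dynamic_axes(["image", "mask"])  # illustrative input names
```

The first result is identical to the dictionary written by hand in the example above.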

Static shape deployment example

clika_model.clika_export(
    model=clika_module,
    f=f,
    args=torch.rand(1, 3, 224, 224),  # will create a deployed model accepting *this* shape
    input_names=["x"],
    dynamic_axes=None
)