How to use CLIKA ACE
Background
The clika-ace Python package provides CLIKA's unique Automatic Compression Engine (ACE).
CLIKA ACE
- ACE is a hardware-aware engine that compresses a model for a specific target framework, such as Microsoft's ONNX Runtime, NVIDIA's TensorRT, or Intel's OpenVINO.
- ACE can be used to fine-tune a pre-trained model or train a model from scratch. However, we recommend using it for fine-tuning or quantization.
- The primary entrypoint to ACE is the clika_compile function, which can also be used as a backend for torch.compile. We recommend calling clika_compile directly to benefit from IDE auto-completion.
The clika-ace package has three main usages:
- Start an ACE session from an existing torch.nn.Module. The module is wrapped in a ClikaModule instance, which inherits from and behaves like a standard torch.nn.Module.
- Resume an ACE session from a saved ClikaModule.
- Export the compressed model to a target framework.
Deployment of a ClikaModule instance to a particular target framework is handled by the clika_model.clika_export() function, which has an API similar to torch.onnx.export.
Initializing a ClikaModule from a torch.nn.Module transforms the model into CLIKA's Intermediate Representation (IR). This IR format is also used when serializing the model.
Terminology:
- CLIKA ACE: CLIKA's Automatic Compression Engine.
- CLIKA IR / pompom: An intermediate representation used exclusively by the CLIKA ACE SDK. The SDK parses an input torch.nn.Module and converts it into this format.
- ClikaModule: A wrapper object for an input model that inherits from and behaves like a torch.nn.Module.
- Monolithic ClikaModule: A single ClikaModule that represents an entire input model. In the future, models with data-dependent control flow may be returned with multiple submodules individually wrapped by ClikaModule.
Start
To start an ACE session, wrap the torch.nn.Module in a ClikaModule with clika_compile, either called directly or passed as a backend to torch.compile, as detailed below.
Models compressed by ACE must not contain data-dependent control flow, that is, statements such as if or for whose execution depends on tensor values.
For additional details on these restrictions, see Data-Dependent Control-Flow.
Currently, the CLIKA SDK does not support partial compilation of submodules within a user-supplied torch.nn.Module. While a future release will support partial compilation and control flow operations, torch.compile presently returns a single, monolithic ClikaModule.
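To make the restriction concrete, the sketch below contrasts a forward pass that branches on tensor values with a branch-free rewrite. It uses plain PyTorch only; the class names are hypothetical, and whether ACE accepts a given rewrite should be checked against the Data-Dependent Control-Flow page.

```python
import torch


class BranchOnData(torch.nn.Module):
    """Data-dependent control flow: the Python `if` inspects tensor values."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:  # which branch runs depends on the input data
            return x * 2.0
        return x - 1.0


class BranchFree(torch.nn.Module):
    """Same arithmetic, expressed as a tensor op with no Python branching."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Both candidates are computed; torch.where selects between them.
        return torch.where(x.sum() > 0, x * 2.0, x - 1.0)


positive = torch.ones(2, 3)
negative = -torch.ones(2, 3)
for sample in (positive, negative):
    # The two modules agree on inputs that take either branch.
    assert torch.equal(BranchOnData()(sample), BranchFree()(sample))
```

The second form keeps the computation graph identical for every input, which is the property that graph-capturing tools generally rely on.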
Example
import tempfile

import numpy as np
import onnxruntime as ort
import torch
from clika_ace import (
    ClikaModule,
    clika_compile,
    DeploymentSettings_TensorRT_ONNX,
)


class Model(torch.nn.Module):
    def forward(self, x):
        # Overwrite the first three entries along the last axis (in place).
        x[..., :3] = 1.0
        return x


xs = torch.rand(32, 3, 224)

# Wrap the model in a ClikaModule, using xs as calibration inputs.
clika_model: ClikaModule = clika_compile(
    model=Model(),
    calibration_inputs=xs,
    deployment_settings=DeploymentSettings_TensorRT_ONNX(),
)

with tempfile.NamedTemporaryFile("wb") as fp:
    # Export to the temporary file and load it while the file still exists.
    clika_model.clika_export(
        file=fp.name,
        input_names=["x"],
    )
    session = ort.InferenceSession(fp.name)
    outputs = session.run(
        output_names=None,
        input_feed={"x": xs.numpy()},
    )[0]

# Compare the exported model's output against the original PyTorch model.
ref_outputs = Model()(xs).numpy()
assert outputs.shape == ref_outputs.shape and np.all(outputs == ref_outputs)
Compile API
Please see Compile API
Save / Resume
Once the model has been wrapped in a ClikaModule, it can be saved and later reloaded:
...
clika_model: ClikaModule = clika_compile(...)
# This will serialize the model into multiple chunks if needed.
clika_model.clika_save(save_path)
# This will read the chunks
restored_clika_model = ClikaModule.clika_load(save_path)
Saving is beneficial in the following scenarios:
- Interruption of the training process:
  - If an ACE session crashes or is interrupted mid-run, you can resume from the last checkpoint.
- Introduction of new data:
  - When new data becomes available, you can resume the ACE session with the new dataset and continue the compression process.
- Additional fine-tuning:
  - If you want to fine-tune the model further, you can continue the ACE session from a previous checkpoint and run additional epochs without starting from scratch.
To do so, load the previously saved ClikaModule with ClikaModule.clika_load and continue running your script as you would with a normal torch.nn.Module.
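The resume flow can be sketched as a small helper that loads a saved session when one exists and compiles from scratch otherwise. The name load_or_compile and its parameters are hypothetical and not part of clika-ace; the loader and compiler are passed in as callables, so the helper assumes nothing about the SDK itself. It also assumes that a finished save leaves something at save_path, which this page does not confirm for chunked saves; pass a different exists_fn if that does not hold.

```python
import os
from typing import Any, Callable, Tuple


def load_or_compile(
    save_path: str,
    load_fn: Callable[[str], Any],
    compile_fn: Callable[[], Any],
    exists_fn: Callable[[str], bool] = os.path.exists,
) -> Tuple[Any, bool]:
    """Return (model, resumed). Hypothetical helper, not part of clika-ace."""
    if exists_fn(save_path):
        # A previous session was saved: pick up where it left off.
        return load_fn(save_path), True
    # No checkpoint yet: start a fresh session.
    return compile_fn(), False
```

With the SDK installed, a call might look like load_or_compile(save_path, ClikaModule.clika_load, lambda: clika_compile(...)), followed by the usual training loop and a clika_model.clika_save(save_path) at each checkpoint.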
Deploy
To deploy a ClikaModule instance, use the clika_model.clika_export() function, which has an API similar to torch.onnx.export.
There are two types of deployment:
- Dynamic shape deployment: dynamic_axes is provided. This ensures the compressed model works for varying input shapes. If dynamic_axes specifies a symbolic shape for an axis (e.g., None or a string), the model is deployed with dynamic shape input.
- Static shape deployment: dynamic_axes is not provided. This results in a compressed model that accepts a single, specific input shape, determined by the tensor passed to the args argument.
Static shape deployment typically results in faster inference speed, as most inference frameworks can perform additional optimizations when all shapes are known. Additionally, some target frameworks may not support dynamically-shaped inputs.
Dynamic shape deployment may fail if a model depends on specific shapes, for example a model in which a Flatten layer is followed by a Linear layer. In this case, consider using an AdaptiveAvgPool operation with an output size of 1 instead of Flatten.
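A short worked calculation shows why this combination is shape-dependent. A Linear layer's input width is fixed when the model is built, while the width that Flatten produces scales with the spatial size of the feature map; pooling to 1x1 first leaves only the channel count. The function names and the example sizes below are illustrative, not taken from the SDK.

```python
def flatten_features(channels: int, height: int, width: int) -> int:
    """Per-sample feature count after Flatten on a (C, H, W) feature map."""
    return channels * height * width


def pooled_features(channels: int, height: int, width: int) -> int:
    """Per-sample feature count after adaptive average pooling to 1x1."""
    # Pooling collapses H and W to 1, so only the channel count remains.
    return channels * 1 * 1


# Input width of the Linear layer, fixed when the model is built.
LINEAR_IN_FEATURES = flatten_features(64, 7, 7)  # 64 * 7 * 7 = 3136

for h, w in [(7, 7), (10, 10)]:
    flat = flatten_features(64, h, w)
    pooled = pooled_features(64, h, w)
    print(f"{h}x{w}: flatten={flat}, matches Linear={flat == LINEAR_IN_FEATURES}, pooled={pooled}")
```

Only the 7x7 map matches the Linear layer after Flatten, whereas the pooled width is the same for both sizes.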
Dynamic shape deployment example
clika_model.clika_export(
    f=f,
    input_names=["x"],
    dynamic_axes={"x": {0: "batch_size"}},  # we want a dynamic batch size
)
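When a model has several inputs, the dynamic_axes mapping can be built programmatically. The sketch below assumes the mapping follows the same layout as the example above (input name, then axis index, then symbolic label); batch_dynamic_axes is a hypothetical helper, not part of clika-ace.

```python
from typing import Dict, Iterable


def batch_dynamic_axes(
    input_names: Iterable[str],
    axis: int = 0,
    label: str = "batch_size",
) -> Dict[str, Dict[int, str]]:
    """Mark `axis` as dynamic for every named input (hypothetical helper)."""
    return {name: {axis: label} for name in input_names}


dynamic_axes = batch_dynamic_axes(["x"])
print(dynamic_axes)  # {'x': {0: 'batch_size'}}
```

The resulting dictionary can then be passed as the dynamic_axes argument of the export call.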
Static shape deployment example
clika_model.clika_export(
    f=f,
    args=torch.rand(1, 3, 224, 224),  # the deployed model will accept exactly this shape
    input_names=["x"],
    dynamic_axes=None,
)