Configuration
Users can save and load a Settings object for the ACE using a YAML configuration file.
Example YAML file
# ================= CLIKA ACE hyperparameter configuration file ================= #
deployment_settings:
  # Choose the target framework ["tflite", "ov", "ort", "trt", "qnn"]
  # or alternatively ["tfl", "openvino", "onnxruntime", "tensorrt", "qnn"] (case insensitive)
  target_framework: trt
  # (OPTIONAL) Set true if you're planning to run the model on a CPU that supports AVX512-VNNI or on an ARM device
  # Only applicable for "ov", "ort"
  # weights_utilize_full_int_range: false

training_settings:
  # Number of steps for the initial quantization calibration
  stats_steps: 20
  # Random seed applied by CLIKA ACE
  random_seed: null
  # Indicates to CLIKA ACE that the model has untrained weights
  is_training_from_scratch: false

global_quantization_settings:
  method: qat
  # How many bits to use for the weights during quantization
  weights_num_bits: 8
  # How many bits to use for the activations during quantization
  activations_num_bits: 8
  # Whether to skip quantization for the tail of the model (keep it null if unsure)
  skip_tail_quantization: null
  # Whether to automatically skip quantization for sensitive layers (keep it true if unsure)
  automatic_skip_quantization: true
  # The threshold used to decide automatically whether to skip quantization for layers that are too sensitive.
  # This will only be applied if 'automatic_skip_quantization' is true.
  # Some tips:
  #   * For small models like MobileNet, 5000 is a good value
  #   * For big models, 10000 is a good value
  # Quantization sensitivity is measured as L2(QuantizedTensor - FloatTensor); the higher it is,
  # the more "destructive" the quantization is. This also implies that an overly sensitive model
  # can take longer to recover its performance.
  quantization_sensitivity_threshold: null

# (OPTIONAL) Uncomment if you would like to enable LoRA
# global_lora_settings:
#   rank: 2
#   alpha: 1
#   dropout_rate: 0.05

distributed_training_settings:
  # Enable multi-GPU training
  multi_gpu: false
  # Enable FSDP (use_sharding=true) if true, otherwise use DDP (use_sharding=false)
  use_sharding: false

# (OPTIONAL) Layer compression settings
# See https://docs.clika.io/docs/quantization_guide
# layer_settings:
#   conv:
#     quantization_settings:
#       weights_num_bits: 8
#       activations_num_bits: 8
Advanced layer selections
By name - regular expression:
- Syntax: {re}<pattern>
- Example: {re}matmul.*
- Notes: {re} is case sensitive and cannot be invoked with {RE}. <pattern> follows Python regular expression rules and can be tested prior to use in the config file using the built-in re Python package.
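As a quick sanity check, a pattern can be tried out with the built-in re package before it goes into the config file. A minimal sketch; the layer names below are hypothetical, and matching against the whole name is assumed only to make the example concrete:

import re

# The <pattern> part of "{re}matmul.*"
pattern = re.compile(r"matmul.*")

# Hypothetical layer names; use the names from your own model
# (see "How to discern layer types and names" below).
candidate_names = ["matmul_0", "matmul_qk", "linear_3"]

# Show which names the pattern would select (full match assumed for illustration).
print([name for name in candidate_names if pattern.fullmatch(name)])
# -> ['matmul_0', 'matmul_qk']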
By full name - regular expression:
- Syntax: {re-fn}<pattern>
- Example: {re-fn}.*/encoder/.*
- Notes: {re-fn} is case sensitive and cannot be invoked with {RE-FN}. <pattern> follows Python regular expression rules and can be tested prior to use in the config file using the built-in re Python package.
- The full name is the Subgraph+Name. This is useful when your model consists of different submodules; for example, in an Encoder-Decoder model you can select only the Encoder part of the model.
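The same kind of check works for full-name patterns; the full names below are hypothetical stand-ins for the Subgraph+Name values of your own model:

import re

# The <pattern> part of "{re-fn}.*/encoder/.*"
pattern = re.compile(r".*/encoder/.*")

# Hypothetical full names (Subgraph+Name); real ones come from your model graph.
full_names = ["model/encoder/block_0/matmul_0", "model/decoder/block_0/matmul_0"]

print([name for name in full_names if pattern.fullmatch(name)])
# -> ['model/encoder/block_0/matmul_0']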
By layer type:
- Syntax: {type}<layer_type>
- Example: {type}conv
- Notes: The {type} token is case sensitive and cannot be {TYPE}. <layer_type> is case insensitive, e.g., CONV, conv, etc.
For example, to quantize all MatMul, Linear and Embedding operations (and nothing else), you can write the configuration file as follows:
layer_settings:
  "{type}embedding":
    quantization_settings:
      method: qat
      weights_num_bits: 8
      activations_num_bits: 8
  "{type}linear":
    quantization_settings:
      method: qat
      weights_num_bits: 8
      activations_num_bits: 8
  "{type}matmul":
    quantization_settings:
      method: qat
      weights_num_bits: 8
      activations_num_bits: 8
global_quantization_settings:
  # Empty on purpose; we don't want to quantize anything other than the layers above
How to discern layer types and names
Layer types and names can be inspected by visualizing the ClikaModule with the clika_module.clika_visualize method, which generates a .svg file.
Saving and loading configuration files
To serialize an existing Settings object to a YAML file and load it back, use the Settings.save and Settings.load_from_path methods. To save:

from clika_ace import Settings

path = 'config.yml'
settings = Settings()  # Default settings

# Do some modification to the settings object
settings.training_settings.num_epochs = 10
...

settings.save(path)  # Save as a YAML file
To load a YAML file use:
from clika_ace import Settings
path = '/path/to/config.yml'
settings = Settings.load_from_path(path)
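The two calls compose naturally; for example, to load a saved configuration, tweak it, and write it back out (num_epochs is reused from the example above):

from clika_ace import Settings

# Load an existing configuration, modify it, and save it under a new name.
settings = Settings.load_from_path('/path/to/config.yml')
settings.training_settings.num_epochs = 20
settings.save('/path/to/config_updated.yml')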
Configuration schema
The following are the modifiable parameters in a CLIKA configuration file:
- training_settings
- global_quantization_settings
- global_lora_settings
- deployment_settings:
  - target_framework: OneOf ["tflite", "ov", "ort", "trt", "qnn"]
  - weights_utilize_full_int_range: OneOf [bool, "null"]
- distributed_training_settings
- layer_settings
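Assuming the Settings object mirrors the YAML keys above (as settings.training_settings.num_epochs does in the earlier example), these fields can also be set in Python before saving; the deployment_settings attribute paths below are an assumption to verify against the clika_ace API reference:

from clika_ace import Settings

settings = Settings()
# Assumed attribute paths mirroring the YAML keys; verify against the API reference.
settings.deployment_settings.target_framework = "trt"  # OneOf ["tflite", "ov", "ort", "trt", "qnn"]
settings.deployment_settings.weights_utilize_full_int_range = False
settings.save('config.yml')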