Configuration
Users can save and load a Settings object for the ACE using a YAML configuration file.
Example YAML file
# ================= CLIKA ACE hyperparameter configuration file ================= #
deployment_settings:
  # Choose the target framework ["tflite", "ov", "ort", "trt", "qnn"]
  # or alternatively ["tfl", "openvino", "onnxruntime", "tensorrt", "qnn"] (case insensitive)
  target_framework: trt
  # (OPTIONAL) Set true if you're planning to run the model on a CPU that supports AVX512-VNNI or on an ARM device
  # Only applicable for "ov", "ort"
  # weights_utilize_full_int_range: false

training_settings:
  # Number of steps for the initial quantization calibration
  stats_steps: 20
  # Random seed applied by CLIKA ACE
  random_seed: null
  # Indicates to CLIKA ACE that the model has untrained weights
  is_training_from_scratch: false

global_quantization_settings:
  method: qat
  # How many bits to use for the weights during quantization
  weights_num_bits: 8
  # How many bits to use for the activations during quantization
  activations_num_bits: 8
  # Whether to skip quantization for the tail of the model (keep it null if unsure)
  skip_tail_quantization: null
  # Whether to automatically skip quantization for sensitive layers (keep it true if unsure)
  automatic_skip_quantization: true
  # The threshold used to decide automatically whether to skip quantization for layers that are too sensitive.
  # This will only be applied if 'automatic_skip_quantization' is true.
  # Some tips:
  #   * For small models like MobileNet, 5000 is a good value
  #   * For big models, 10000 is a good value
  # Quantization sensitivity is measured as L2(QuantizedTensor - FloatTensor); the higher it is,
  # the more "destructive" the quantization is. This also implies that an overly sensitive model
  # can take longer to recover its performance.
  quantization_sensitivity_threshold: null

# (OPTIONAL) Uncomment if you would like to enable LoRA
# global_lora_settings:
#   rank: 2
#   alpha: 1
#   dropout_rate: 0.05

distributed_training_settings:
  # Enable multi-GPU training
  multi_gpu: false
  # Enable FSDP (use_sharding=true) if true, otherwise use DDP (use_sharding=false)
  use_sharding: false

# (OPTIONAL) Layer compression settings
# See https://docs.clika.io/docs/quantization_guide
# layer_settings:
#   conv:
#     quantization_settings:
#       weights_num_bits: 8
#       activations_num_bits: 8
Advanced layer selections
By name - regular expression:
- Syntax: {re}<pattern>
- Example: {re}matmul.*
- Notes: {re} is case sensitive and cannot be invoked with {RE}. <pattern> follows Python regular expression rules and can be tested prior to use in the config file using the built-in re Python package.
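As a quick sanity check, a pattern can be tried out with the built-in re package before it goes into the config file. A minimal sketch; the layer names below are hypothetical, and matching against the whole name is assumed only to make the example concrete:

import re

# The <pattern> part of "{re}matmul.*"
pattern = re.compile(r"matmul.*")

# Hypothetical layer names; use the names from your own model
# (see "How to discern layer types and names" below).
candidate_names = ["matmul_0", "matmul_qk", "linear_3"]

# Show which names the pattern would select (full match assumed for illustration).
print([name for name in candidate_names if pattern.fullmatch(name)])
# -> ['matmul_0', 'matmul_qk']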
By full name - regular expression:
- Syntax: {re-fn}<pattern>
- Example: {re-fn}.*/encoder/.*
- Notes: {re-fn} is case sensitive and cannot be invoked with {RE-FN}. <pattern> follows Python regular expression rules and can be tested prior to use in the config file using the built-in re Python package.
- The full name is the Subgraph+Name. This is useful when your model consists of different submodules; for example, in an Encoder-Decoder model you can select only the Encoder part of the model.
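The same kind of check works for full-name patterns; the full names below are hypothetical stand-ins for the Subgraph+Name values of your own model:

import re

# The <pattern> part of "{re-fn}.*/encoder/.*"
pattern = re.compile(r".*/encoder/.*")

# Hypothetical full names (Subgraph+Name); real ones come from your model graph.
full_names = ["model/encoder/block_0/matmul_0", "model/decoder/block_0/matmul_0"]

print([name for name in full_names if pattern.fullmatch(name)])
# -> ['model/encoder/block_0/matmul_0']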
By layer type:
- Syntax: {type}<layer_type>
- Example: {type}conv
- Notes: The {type} token is case sensitive and cannot be {TYPE}. <layer_type> is case insensitive, e.g., CONV, conv, etc.
For example, to quantize all MatMul, Linear and Embedding operations (and nothing else), you can write the configuration file as follows:
layer_settings:
  "{type}embedding":
    quantization_settings:
      method: qat
      weights_num_bits: 8
      activations_num_bits: 8
  "{type}linear":
    quantization_settings:
      method: qat
      weights_num_bits: 8
      activations_num_bits: 8
  "{type}matmul":
    quantization_settings:
      method: qat
      weights_num_bits: 8
      activations_num_bits: 8
global_quantization_settings:
  # Empty on purpose; we don't want to quantize anything other than the layers above
How to discern layer types and names
Layer types and names can be inspected by visualizing the ClikaModule with the clika_module.clika_visualize method, which generates a .svg file.
Saving and loading configuration files
To serialize an existing Settings object to a YAML file and load it back, use the Settings.save and Settings.load_from_path methods. To save:

from clika_ace import Settings

path = 'config.yml'
settings = Settings()  # Default settings

# Do some modification to the settings object
settings.training_settings.num_epochs = 10
...

settings.save(path)  # Save as a YAML file
To load a YAML file use:
from clika_ace import Settings
path = '/path/to/config.yml'
settings = Settings.load_from_path(path)
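The two calls compose naturally; for example, to load a saved configuration, tweak it, and write it back out (num_epochs is reused from the example above):

from clika_ace import Settings

# Load an existing configuration, modify it, and save it under a new name.
settings = Settings.load_from_path('/path/to/config.yml')
settings.training_settings.num_epochs = 20
settings.save('/path/to/config_updated.yml')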
Configuration schema
The following are the modifiable parameters in a CLIKA configuration file:
- training_settings
- global_quantization_settings
- global_lora_settings
- deployment_settings:
  - target_framework: OneOf ["tflite", "ov", "ort", "trt", "qnn"]
  - weights_utilize_full_int_range: OneOf [bool, "null"]
- distributed_training_settings
- layer_settings
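Assuming the Settings object mirrors the YAML keys above (as settings.training_settings.num_epochs does in the earlier example), these fields can also be set in Python before saving; the deployment_settings attribute paths below are an assumption to verify against the clika_ace API reference:

from clika_ace import Settings

settings = Settings()
# Assumed attribute paths mirroring the YAML keys; verify against the API reference.
settings.deployment_settings.target_framework = "trt"  # OneOf ["tflite", "ov", "ort", "trt", "qnn"]
settings.deployment_settings.weights_utilize_full_int_range = False
settings.save('config.yml')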