Configuration file
A configuration file allows the user to save and load a Settings object for the CCO.
Example YAML file
# ================= CLIKA ACE hyperparameter configuration file ================= #
deployment_settings:
  # Choose the target framework ["tflite", "ov", "ort", "trt", "qnn"]
  target_framework: trt
  # (OPTIONAL) Set true if you're planning to run the model on a CPU that supports AVX512-VNNI or on an ARM device
  # Only applicable for "ov", "ort", "qnn"
  # weights_utilize_full_int_range: false

training_settings:
  # Number of epochs to run the CCO
  num_epochs: 100
  # Gradient accumulation steps (useful for larger models that must run with a smaller batch size)
  grads_acc_steps: 1
  # Number of steps to take per epoch
  steps_per_epoch: 10000
  evaluation_steps: null
  # Number of warm-up epochs to take
  lr_warmup_epochs: 1
  # Number of steps in each epoch of the learning-rate warm-up stage
  lr_warmup_steps_per_epoch: 500
  # Use Automatic Mixed Precision (reduces VRAM usage): half precision is used automatically for the weights, FP32 for the gradients.
  # AMP dtype: [float16, bfloat16, null]
  amp_dtype: null
  # Specify the weight dtype of the model: [float16, bfloat16, null]
  # If null, the default (float32) is used
  weights_dtype: null
  # Number of steps for initial quantization calibration
  stats_steps: 20
  # Use activation checkpointing, which offloads the activations to the CPU.
  # This helps reduce the memory requirement of the model but may increase iteration time during compression
  activations_offloading: false
  # Offload the model parameters to the CPU.
  # This helps reduce the memory requirement of the model but may increase iteration time during compression
  params_offloading: false
  # Enable gradient clipping; use null or comment out to disable
  clip_grad_norm_val: null
  clip_grad_norm_type: 2.0
  # .pompom files save interval in epochs
  save_interval: null
  # Print a log line every x steps
  print_interval: 100
  # Moving-average window size used for printed metrics
  print_ma_window_size: 50
  # Reset the train-loader/eval-loader between epochs
  reset_train_data: false
  reset_eval_data: true
  # Skip the initial evaluation before compression
  skip_initial_eval: false
  # Random seed applied by the CLIKA SDK
  random_seed: null
  # Ignore `--ckpt` (if given) and indicate to the CLIKA SDK that the model has untrained weights
  is_training_from_scratch: false

global_quantization_settings:
  method: qat
  # How many bits to use for quantizing the weights
  weights_num_bits: 8
  # How many bits to use for quantizing the activations
  activations_num_bits: 8
  # Whether to skip quantization for the tail of the model (keep it null if unsure)
  skip_tail_quantization: null
  # Whether to automatically skip quantization for sensitive layers (keep it true if unsure)
  automatic_skip_quantization: true
  # The threshold used to decide automatically whether to skip quantization for layers that are too sensitive.
  # Only applied if 'automatic_skip_quantization' is true.
  # Some tips:
  #   * For small models like MobileNet, 5000 is a good value
  #   * For big models, 10000 is a good value
  # Quantization sensitivity is measured as L2(QuantizedTensor - FloatTensor); the higher it is, the more "destructive" the quantization is.
  # This also implies that it can take longer for a model to recover its performance if it is overly sensitive.
  quantization_sensitivity_threshold: null

# (OPTIONAL) Uncomment if you would like to enable LoRA
# global_lora_settings:
#   rank: 2
#   alpha: 1
#   dropout_rate: 0.05

distributed_training_settings:
  # Enable multi-GPU training
  multi_gpu: false
  # Enable FSDP if true (use_sharding: true), otherwise use DDP (use_sharding: false)
  use_sharding: false

# (OPTIONAL) Layer compression settings
# See https://docs.clika.io/docs/quantization_guide
# layer_settings:
#   conv:
#     quantization_settings:
#       weights_num_bits: 8
#       activations_num_bits: 8
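The same values can also be set programmatically on a Settings object before saving it. The following is a minimal sketch: training_settings.num_epochs is used later on this page, while the other attribute paths are an assumption that the object mirrors the YAML keys above.
from clika_compression import Settings

settings = Settings()  # Default settings
settings.training_settings.num_epochs = 100
# Assumed attribute paths, mirroring the YAML keys above
settings.training_settings.grads_acc_steps = 1
settings.deployment_settings.target_framework = 'trt'
settings.save('config.yml')  # Writes a YAML file equivalent to the example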
Configuration schema
You can set the following parameters in the configuration file:
training_settings:
    num_epochs: int
    stats_steps: int
    steps_per_epoch: OneOf[int, "null"]
    evaluation_steps: OneOf[int, "null"]
    print_interval: int
    print_ma_window_size: int
    save_interval: OneOf[int, "null"]
    reset_train_data: bool
    reset_eval_data: bool
    grads_acc_steps: OneOf[int, "null"]
    amp_dtype: OneOf["bfloat16", "float16", "null"]
    weights_dtype: OneOf["bfloat16", "float16", "null"]
    activations_offloading: bool
    params_offloading: bool
    lr_warmup_epochs: OneOf[int, "null"]
    lr_warmup_steps_per_epoch: int
    random_seed: OneOf[int, "null"]
    skip_initial_eval: bool
    clip_grad_norm_val: OneOf[int, float, "null"]
    clip_grad_norm_type: OneOf[int, float, "null"]
    is_training_from_scratch: bool
global_quantization_settings
global_lora_settings
deployment_settings:
    target_framework: OneOf["tflite", "ov", "ort", "trt", "qnn"]
    weights_utilize_full_int_range: OneOf[bool, "null"]
distributed_training_settings
layer_settings
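Fields typed OneOf[..., "null"] accept YAML null, which corresponds to None on the Settings object. A minimal sketch, assuming the attribute paths mirror the schema keys above:
from clika_compression import Settings

settings = Settings()
settings.training_settings.evaluation_steps = None   # saved as `null` in the YAML
settings.training_settings.clip_grad_norm_val = 1.0  # int or float both satisfy OneOf[int, float, "null"]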
Saving and loading configuration files
To save a YAML file from an existing Settings object, use the Settings.save
method; to load one, use the Settings.load_from_path
method. For example, to save:
from clika_compression import Settings
path = 'config.yml'
settings = Settings() # Default settings
# Do some modification to the settings object
settings.training_settings.num_epochs = 10
...
settings.save(path) # Save as a yaml file
To load a YAML file, use:
from clika_compression import Settings
path = '/path/to/config.yml'
settings = Settings.load_from_path(path)
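Loading and saving can be combined so that an existing configuration file serves as a template. A minimal round-trip sketch using only the methods shown above; the output filename is just an example:
from clika_compression import Settings

settings = Settings.load_from_path('config.yml')
settings.training_settings.num_epochs = 20  # Override a single field
settings.save('config_modified.yml')  # Write the modified settings to a new file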