Configuration file

A configuration file lets you save and load a Settings object for the CCO. You can set the following parameters in the configuration file:

Example YAML file

# ================= CLIKA ACE hyperparameter configuration file ================= #

# Choose the target framework ["tflite", "ov", "ort", "trt", "qnn"]
target_framework: trt

# (OPTIONAL) Set True if you're planning to run the model on a CPU that supports AVX512-VNNI or on an ARM Device
# Only applicable for "ov", "ort", "qnn"
# weights_utilize_full_int_range: false

# Number of epochs to run the CCO
num_epochs: 100

# Gradient accumulation steps (useful for larger models that must run with a smaller batch size)
grads_acc_steps: 1

# Number of steps to take per epoch
steps_per_epoch: 10000
evaluation_steps: null

# Number of warm-up epochs/steps to take
lr_warmup_epochs: 1

# Number of steps in each epoch of the learning-rate warm-up stage
lr_warmup_steps_per_epoch: 500

# Use Automatic Mixed Precision to reduce VRAM usage: half precision is used automatically for the weights and FP32 for the gradients.
# AMP dtype: [float16, bfloat16, null]
amp_dtype: null

# Specify weight dtype of the model: [float16, bfloat16, null]
# if null use default (float32)
weights_dtype: null

# Number of steps for initial quantization calibration
stats_steps: 20

# Use activation checkpointing that offloads the activations to the CPU.
# This helps reduce the memory requirement of the model but may increase the iteration time of compression.
activations_offloading: false

# Use parameter offloading, which offloads the model parameters to the CPU.
# This helps reduce the memory requirement of the model but may increase the iteration time of compression.
params_offloading: false

# Enable gradient clipping, use null or comment to disable
clip_grad_norm_val: null
clip_grad_norm_type: 2.0

# .pompom files save interval in epochs
save_interval: null

# Print log every x steps
print_interval: 100

# Printing moving average window size
print_ma_window_size: 50

# Reset train-loader/eval-loader between epochs
reset_train_data: false
reset_eval_data: true

# Skip initial evaluation before compression
skip_initial_eval: false

# Random seed applied on CLIKA SDK
random_seed: null

# Ignore `--ckpt` (if given) and indicate to the CLIKA SDK that the model has untrained weights
is_training_from_scratch: false

method: qat

# How many bits to use for the Weights for Quantization
weights_num_bits: 8

# How many bits to use for the Activation for Quantization
activations_num_bits: 8

# Whether to skip quantization for the tail of the model (keep it null if unsure)
skip_tail_quantization: null

# Whether to automatically skip quantization for sensitive layers (keep it true if unsure)
# The threshold to decide automatically whether to skip quantization for layers that are too sensitive.
# This will only be applied if 'automatic_skip_quantization' is True.
# Some tips:
# * For small models like MobileNet - 5000 is a good value
# * For big models 10000 is a good value
# The quantization sensitivity is measured using L2(QuantizedTensor-FloatTensor), the higher it is the more "destructive" the quantization is.
# This also implies that it can take longer for a Model to recover its performance if it is overly sensitive.
automatic_skip_quantization: true
quantization_sensitivity_threshold: null

# (OPTIONAL) Uncomment if you would like to enable LoRA
# global_lora_settings:
#   rank: 2
#   alpha: 1
#   dropout_rate: 0.05

# Enable multi-gpu training
multi_gpu: false

# Enable FSDP (use_sharding=True) if true else use DDP (use_sharding=False)
use_sharding: false

# (OPTIONAL) Layer compression setting
# See http://localhost:3000/docs/quantization_guide
# layer_settings:
#   conv:
#     quantization_settings:
#       weights_num_bits: 8
#       activations_num_bits: 8
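
The comments above describe quantization sensitivity as L2(QuantizedTensor-FloatTensor). The following is a minimal sketch of that measurement in plain Python. It assumes a symmetric, per-tensor quantize-dequantize step, which may differ from what the CLIKA SDK does internally; `fake_quantize` and `l2_sensitivity` are hypothetical helper names, not SDK functions, and the resulting numbers are not on the same scale as `quantization_sensitivity_threshold`.

```python
import math


def fake_quantize(values, num_bits=8):
    """Quantize then dequantize a list of floats.

    Assumption: symmetric per-tensor quantization with a single scale.
    This is an illustrative scheme, not the CLIKA SDK's quantizer.
    """
    qmax = 2 ** (num_bits - 1) - 1  # e.g. 127 for 8 bits
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)  # nothing to quantize
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp, then map back to floats.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def l2_sensitivity(values, num_bits=8):
    """L2 norm of (quantized tensor - float tensor)."""
    quantized = fake_quantize(values, num_bits)
    return math.sqrt(sum((q - v) ** 2 for q, v in zip(quantized, values)))


# Hypothetical weights, used only for illustration.
weights = [0.013, -0.42, 0.77, -1.05, 0.3, 2.4]
err8 = l2_sensitivity(weights, num_bits=8)
err4 = l2_sensitivity(weights, num_bits=4)
print(f"8-bit error: {err8:.4f}, 4-bit error: {err4:.4f}")
```

With fewer bits the error grows, which is the sense in which a higher value means a more "destructive" quantization for that layer.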

Configuration schema

  • training_settings:
    • num_epochs: int
    • stats_steps: int
    • steps_per_epoch: OneOf [int, "null"]
    • evaluation_steps: OneOf [int, "null"]
    • print_interval: int
    • print_ma_window_size: int
    • save_interval: OneOf [int, "null"]
    • reset_train_data: bool
    • reset_eval_data: bool
    • grads_acc_steps: OneOf [int, "null"]
    • amp_dtype: OneOf ["bfloat16", "float16", bool]
    • weights_dtype: OneOf ["bfloat16", "float16", bool, "null"]
    • activations_offloading: bool
    • params_offloading: bool
    • lr_warmup_epochs: OneOf [int, "null"]
    • lr_warmup_steps_per_epoch: int
    • random_seed: OneOf [int, "null"]
    • skip_initial_eval: bool
    • clip_grad_norm_val: OneOf [int, float, "null"]
    • clip_grad_norm_type: OneOf [int, float, "null"]
    • is_training_from_scratch: bool
  • global_quantization_settings:
    • method: OneOf ["qat"]
    • weights_num_bits: int
    • activations_num_bits: int
    • skip_tail_quantization: OneOf [bool, "null"]
    • automatic_skip_quantization: bool
    • quantization_sensitivity_threshold: OneOf [bool, int, "null"]
  • global_lora_settings:
  • deployment_settings:
    • target_framework: OneOf ["tflite", "ov", "ort", "trt", "qnn"]
    • weights_utilize_full_int_range: OneOf [bool, "null"]
  • distributed_training_settings:
  • layer_settings:
    • {LAYER_NAME}
      • [quantization_settings]
        • weights_num_bits: int
        • activations_num_bits: int
        • quantization_sensitivity_threshold: OneOf [bool, int, "null"]
        • skip_quantization: bool
        • skip_quantization_downstream: bool
        • skip_quantization_until: OneOf [str, List[str], "null"]
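
The OneOf notation in the schema above can be read as "the value must match one of the listed types or literals". The following is a minimal sketch of that reading in plain Python. It is not the CLIKA SDK's own validation: `TRAINING_SCHEMA`, `matches`, and `validate` are hypothetical names, and only a handful of `training_settings` keys are covered.

```python
NULL = type(None)  # YAML null is loaded as Python None

# A few entries copied from the training_settings schema.
# Types mean "any value of this type"; strings mean "this exact literal".
TRAINING_SCHEMA = {
    "num_epochs": (int,),
    "steps_per_epoch": (int, NULL),
    "weights_dtype": ("bfloat16", "float16", bool, NULL),
    "clip_grad_norm_val": (int, float, NULL),
    "reset_train_data": (bool,),
}


def matches(value, allowed):
    """Return True if value matches any option in a OneOf tuple."""
    for option in allowed:
        if isinstance(option, type):
            # bool is a subclass of int in Python, so keep them distinct.
            if option is int and isinstance(value, bool):
                continue
            if isinstance(value, option):
                return True
        elif value == option:
            return True
    return False


def validate(section, schema):
    """Return a list of human-readable problems (empty if none)."""
    errors = []
    for key, value in section.items():
        if key not in schema:
            errors.append(f"{key}: unknown key")
        elif not matches(value, schema[key]):
            errors.append(f"{key}: invalid value {value!r}")
    return errors


# Hypothetical input, shaped like a parsed training_settings section.
problems = validate(
    {"num_epochs": 100, "steps_per_epoch": None, "weights_dtype": "float32"},
    TRAINING_SCHEMA,
)
print(problems)
```

A check like this can catch typos in key names or out-of-schema values before the file is handed to the SDK.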

Saving and loading configuration files

To save a YAML file from an existing settings object:

from clika_compression import Settings

path = 'config.yml'
settings = Settings() # Default settings

# Do some modification to the settings object
settings.training_settings.num_epochs = 10
... # Save as a yaml file

To load a YAML file, use:

from clika_compression.settings import Settings
path = '/path/to/config.yml'
settings = Settings.load_from_path(path)