Version: 0.3

TrainingSettings

CLASS - TrainingSettings()

This Dataclass represents the Training Settings for the Compression. A usage sketch is shown after the list of class variables below.

Class Variables

  • num_epochs (int) - Number of epochs to run for
  • stats_steps (int) - Number of steps to run for collecting statistics about the Model. Default is 20, which is typically more than enough; values above 50 are not recommended.
  • steps_per_epoch (Optional[int]) - Number of steps per epoch. If None, each epoch runs until the Training Dataloader is exhausted. Default is 1000. A small 'steps_per_epoch' is usually preferable, since most models do not need to iterate through the entire dataset.
  • evaluation_steps (Optional[int]) - Number of evaluation steps after each Epoch. If None, evaluation runs until the Evaluation Dataloader is exhausted. Default is None (iterates through the entire dataset).
  • print_interval (int) - Interval at which results are printed to the Console
  • print_ma_window_size (int) - Number of steps over which the Loss/Metrics printed to the Console are averaged
  • save_interval (Optional[int]) - Interval at which to save the model in the CLIKA Model Format. Default is 1.
  • reset_train_data (bool) - Whether to reset the Training Dataloader after every epoch
  • reset_eval_data (bool) - Whether to reset the Evaluation Dataloader after every epoch
  • grads_acc_steps (Optional[int]) - Number of steps over which to accumulate Gradients. This simulates a larger batch when the Model is too big to fit into a GPU with the desired batch size. Default is None.
  • amp_dtype (Union[Literal['bfloat16', 'float16'], bool, None]) - Whether to use Mixed Precision Training in FP16 or BF16. This helps reduce the memory requirement of the model but may cause NaN loss or NaN/Inf gradients for some models. Use this only if that is how you trained your model prior to compression. If set to True, it defaults to FP16.
  • weights_dtype (Union[Literal['bfloat16', 'float16'], bool, None]) - Whether to use FP16/BF16 weights during training. This helps reduce the memory requirement of the model but may cause NaN loss or NaN/Inf gradients for some models. Use this only if that is how you trained your model prior to compression. If set to True, it defaults to FP16.
  • activations_offloading (bool) - Whether to store activation values on the CPU. This helps reduce the memory requirement of the Model but may increase the iteration time of the Compression
  • params_offloading (bool) - Whether to store parameter values on the CPU. This helps reduce the memory requirement of the Model but may increase the iteration time of the Compression
  • lr_warmup_epochs (Optional[int]) - Number of epochs to run Learning Rate warmup. Can be None or 0 to disable warmup.
  • lr_warmup_steps_per_epoch (int) - Number of Warmup iterations per Epoch. This argument is ignored if 'lr_warmup_epochs' is None or 0.
  • random_seed (Optional[int]) - Random seed to set for reproducibility.
  • skip_initial_eval (bool) - Whether to skip the initial Evaluation of the original model, which is otherwise used for comparison later in the Compression
  • clip_grad_norm_val (Union[int, float, None]) - Value to clip the gradient norm to. Gradient norm clipping is applied only if this value is not None.
  • clip_grad_norm_type (Union[int, float, None]) - Type of norm used for gradient clipping. Can be float('inf') for the infinity norm, 2 for the L2 norm, or 1 for the L1 norm.
  • is_training_from_scratch (bool) - Whether the model is being trained from scratch. Using the SDK to train large models from scratch is currently not recommended.
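
Example

The sketch below shows one way these settings might be combined for a short compression run. The import path 'clika_compression' and the exact set of constructor keywords are assumptions based on the class variables listed above; adjust them to match your installed SDK version.

```python
# A minimal sketch, not a verified example: the import path below is an
# assumption and may differ in your SDK installation.
from clika_compression import TrainingSettings

training_settings = TrainingSettings(
    num_epochs=10,
    steps_per_epoch=1000,    # keep small; most models need not see the whole dataset each epoch
    evaluation_steps=None,   # None -> iterate through the entire Evaluation Dataloader
    stats_steps=20,          # default; values above 50 are not recommended
    print_interval=100,
    save_interval=1,
    grads_acc_steps=4,       # accumulate gradients to simulate a 4x larger batch
    amp_dtype="bfloat16",    # only if the original model was trained with BF16 mixed precision
    clip_grad_norm_val=1.0,
    clip_grad_norm_type=2,   # L2 norm
    random_seed=42,
)
```

With 'grads_acc_steps=4', the effective batch size is four times the dataloader batch size, at the cost of fewer optimizer updates per epoch.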