Version: 0.3

TrainingSettings

CLASS - TrainingSettings()

This Dataclass represents the Training Settings for the Compression. A usage sketch is shown after the list of class variables below.

Class Variables

  • num_epochs (int) - Number of epochs to run for
  • stats_steps (int) - Number of steps to run for collecting statistics about the Model. Default is 20, which is typically more than enough; values above 50 are not recommended.
  • steps_per_epoch (Optional[int]) - Number of steps per epoch. If None, each epoch runs until the Training Dataloader is exhausted. Default is 1000. A small 'steps_per_epoch' is usually preferable, since most models do not need to iterate through the entire dataset.
  • evaluation_steps (Optional[int]) - Number of evaluation steps after each Epoch. If None, evaluation runs until the Evaluation Dataloader is exhausted. Default is None (iterates through the entire dataset).
  • print_interval (int) - Interval at which results are printed to the Console
  • print_ma_window_size (int) - Number of steps over which the Loss/Metrics printed to the Console are averaged
  • save_interval (Optional[int]) - Interval at which to save the model in the CLIKA Model Format. Default is 1.
  • reset_train_data (bool) - Whether to reset the Training Dataloader after every epoch
  • reset_eval_data (bool) - Whether to reset the Evaluation Dataloader after every epoch
  • grads_acc_steps (Optional[int]) - Number of steps over which to accumulate Gradients. This simulates a larger batch when the Model is too big to fit into a GPU with the desired batch size. Default is None.
  • amp_dtype (Union[Literal['bfloat16', 'float16'], bool, None]) - Whether to use Mixed Precision Training in FP16 or BF16. This helps reduce the memory requirement of the model but may cause NaN loss or NaN/Inf gradients for some models. Use this only if that is how you trained your model prior to compression. If set to True, it defaults to FP16.
  • weights_dtype (Union[Literal['bfloat16', 'float16'], bool, None]) - Whether to use FP16/BF16 weights during training. This helps reduce the memory requirement of the model but may cause NaN loss or NaN/Inf gradients for some models. Use this only if that is how you trained your model prior to compression. If set to True, it defaults to FP16.
  • activations_offloading (bool) - Whether to store activation values on the CPU. This helps reduce the memory requirement of the Model but may increase the iteration time of the Compression
  • params_offloading (bool) - Whether to store parameter values on the CPU. This helps reduce the memory requirement of the Model but may increase the iteration time of the Compression
  • lr_warmup_epochs (Optional[int]) - Number of epochs to run Learning Rate warmup. Can be None or 0 to disable warmup.
  • lr_warmup_steps_per_epoch (int) - Number of Warmup iterations per Epoch. This argument is ignored if 'lr_warmup_epochs' is None or 0.
  • random_seed (Optional[int]) - Random seed to set for reproducibility.
  • skip_initial_eval (bool) - Whether to skip the initial Evaluation of the original model, which is otherwise used for comparison later in the Compression
  • clip_grad_norm_val (Union[int, float, None]) - Value to clip the gradient norm to. Gradient norm clipping is applied only if this value is not None.
  • clip_grad_norm_type (Union[int, float, None]) - Type of norm used for gradient clipping. Can be float('inf') for the infinity norm, 2 for the L2 norm, or 1 for the L1 norm.
  • is_training_from_scratch (bool) - Whether the model is being trained from scratch. Using the SDK to train large models from scratch is currently not recommended.
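
Example

The sketch below shows one way these settings might be combined for a short compression run. The import path 'clika_compression' and the exact set of constructor keywords are assumptions based on the class variables listed above; adjust them to match your installed SDK version.

```python
# A minimal sketch, not a verified example: the import path below is an
# assumption and may differ in your SDK installation.
from clika_compression import TrainingSettings

training_settings = TrainingSettings(
    num_epochs=10,
    steps_per_epoch=1000,    # keep small; most models need not see the whole dataset each epoch
    evaluation_steps=None,   # None -> iterate through the entire Evaluation Dataloader
    stats_steps=20,          # default; values above 50 are not recommended
    print_interval=100,
    save_interval=1,
    grads_acc_steps=4,       # accumulate gradients to simulate a 4x larger batch
    amp_dtype="bfloat16",    # only if the original model was trained with BF16 mixed precision
    clip_grad_norm_val=1.0,
    clip_grad_norm_type=2,   # L2 norm
    random_seed=42,
)
```

With 'grads_acc_steps=4', the effective batch size is four times the dataloader batch size, at the cost of fewer optimizer updates per epoch.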