DistributedTrainingSettings
CLASS - DistributedTrainingSettings()
This dataclass holds all the information necessary to configure distributed training.
Class Variables
- multi_gpu (bool) - Whether or not to use multi-GPU training for the compression.
- use_sharding (bool) - Whether or not to use sharding to split the model and optimizer states across the GPUs. Sharding can help fit bigger models, but may come at the cost of increased latency. This is similar to PyTorch FSDP / DeepSpeed.