DistributedTrainingSettings
CLASS - DistributedTrainingSettings()
This dataclass holds all the information necessary to configure distributed training.
Class Variables
- multi_gpu (bool) - Whether or not to use multi-GPU training for the compression.
- use_sharding (bool) - Whether or not to use sharding to split the model and optimizer states across the GPUs. Sharding can help fit bigger models, but may come at the cost of increased latency. This is similar to PyTorch FSDP / DeepSpeed.