modalities.running_env package
Subpackages
- modalities.running_env.fsdp package
- Submodules
- modalities.running_env.fsdp.device_mesh module
DeviceMeshConfig
DeviceMeshConfig.context_parallel_degree
DeviceMeshConfig.data_parallel_replicate_degree
DeviceMeshConfig.data_parallel_shard_degree
DeviceMeshConfig.device_type
DeviceMeshConfig.enable_loss_parallel
DeviceMeshConfig.model_config
DeviceMeshConfig.pipeline_parallel_degree
DeviceMeshConfig.tensor_parallel_degree
DeviceMeshConfig.world_size
ParallelismDegrees
get_device_mesh()
- modalities.running_env.fsdp.fsdp_auto_wrapper module
- modalities.running_env.fsdp.reducer module
- Module contents
Submodules
modalities.running_env.cuda_env module
- class modalities.running_env.cuda_env.CudaEnv(process_group_backend)[source]
Bases: object
Context manager to set the CUDA environment for distributed training.
Initializes the CudaEnv context manager with the process group backend.
- Parameters:
process_group_backend (ProcessGroupBackendType) – Process group backend to be used for distributed training.
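A minimal usage sketch of the context manager. The import path of ProcessGroupBackendType and its member name (nccl) are assumptions here; it is also assumed the script runs under torchrun so RANK, LOCAL_RANK, WORLD_SIZE, MASTER_ADDR and MASTER_PORT are set.

```python
import torch

from modalities.running_env.cuda_env import CudaEnv
from modalities.running_env.env_utils import ProcessGroupBackendType  # import path assumed

# Entering the context manager sets up the CUDA environment for distributed
# training (process group initialization, device selection); exiting cleans it up.
with CudaEnv(process_group_backend=ProcessGroupBackendType.nccl):  # member name assumed
    rank = torch.distributed.get_rank()
    print(f"rank {rank} is ready for distributed training")
```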
modalities.running_env.env_utils module
- class modalities.running_env.env_utils.FSDP2MixedPrecisionSettings(**data)[source]
Bases: BaseModel
Create a new model by parsing and validating input data from keyword arguments.
Raises ValidationError (pydantic_core.ValidationError) if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
param_dtype (PyTorchDtypes)
reduce_dtype (PyTorchDtypes)
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to pydantic.ConfigDict.
- param_dtype: PyTorchDtypes
- reduce_dtype: PyTorchDtypes
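A minimal construction sketch for FSDP2MixedPrecisionSettings. The import location and member names of PyTorchDtypes (BF_16, FP_32) are assumptions; the keyword arguments themselves follow the field list above.

```python
from modalities.running_env.env_utils import (  # PyTorchDtypes location assumed
    FSDP2MixedPrecisionSettings,
    PyTorchDtypes,
)

# Keep parameters in bf16 while reducing gradients in fp32; since the class is a
# pydantic BaseModel, invalid dtypes raise a ValidationError at construction time.
settings = FSDP2MixedPrecisionSettings(
    param_dtype=PyTorchDtypes.BF_16,   # member name assumed
    reduce_dtype=PyTorchDtypes.FP_32,  # member name assumed
)
print(settings.param_dtype, settings.reduce_dtype)
```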
- class modalities.running_env.env_utils.MixedPrecisionSettings(value)[source]
Bases: LookupEnum
- BF_16 = MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16, buffer_dtype=torch.bfloat16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- BF_16_WORKING = MixedPrecision(param_dtype=torch.float32, reduce_dtype=torch.bfloat16, buffer_dtype=torch.bfloat16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- FP_16 = MixedPrecision(param_dtype=torch.float16, reduce_dtype=torch.float16, buffer_dtype=torch.float16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- FP_32 = MixedPrecision(param_dtype=torch.float32, reduce_dtype=torch.float32, buffer_dtype=torch.float32, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- MIXED_PRECISION_MEGATRON = MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.float32, buffer_dtype=None, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- NO_MIXED_PRECISION = None
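A short sketch of selecting one of these presets. Each enum member's value is the torch.distributed.fsdp.MixedPrecision policy shown above (or None for NO_MIXED_PRECISION); passing it to FSDP is shown only as a comment, since the surrounding wrapping code is not part of this module.

```python
from modalities.running_env.env_utils import MixedPrecisionSettings

# Megatron-style policy: bf16 parameters with fp32 gradient reduction.
mp_setting = MixedPrecisionSettings.MIXED_PRECISION_MEGATRON
mixed_precision = mp_setting.value  # torch.distributed.fsdp.MixedPrecision instance

# The policy would typically be handed to the FSDP wrapper, e.g.:
# FSDP(model, mixed_precision=mixed_precision, ...)
print(mixed_precision.param_dtype, mixed_precision.reduce_dtype)
```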