modalities.running_env package
Subpackages
- modalities.running_env.fsdp package
- Submodules
- modalities.running_env.fsdp.device_mesh module
- DeviceMeshConfig
  - DeviceMeshConfig.context_parallel_degree
  - DeviceMeshConfig.data_parallel_replicate_degree
  - DeviceMeshConfig.data_parallel_shard_degree
  - DeviceMeshConfig.device_type
  - DeviceMeshConfig.enable_loss_parallel
  - DeviceMeshConfig.model_config
  - DeviceMeshConfig.pipeline_parallel_degree
  - DeviceMeshConfig.tensor_parallel_degree
  - DeviceMeshConfig.world_size
- ParallelismDegrees
- get_device_mesh()
- get_mesh_for_parallelism_method()
- get_parallel_degree()
- get_parallel_rank()
- has_parallelism_method()
- modalities.running_env.fsdp.fsdp_auto_wrapper module
- modalities.running_env.fsdp.reducer module
- Module contents
Submodules
modalities.running_env.cuda_env module
- class modalities.running_env.cuda_env.CudaEnv(process_group_backend, timeout_s=600)[source]
Bases: object
Context manager to set the CUDA environment for distributed training.
Initializes the CudaEnv context manager with the process group backend.
- Args:
process_group_backend (ProcessGroupBackendType): Process group backend to be used for distributed training.
- Parameters:
process_group_backend (ProcessGroupBackendType)
timeout_s (int)
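A minimal usage sketch of the context manager. The import path and the member name of ProcessGroupBackendType used below are assumptions for illustration; only the CudaEnv constructor signature is taken from this page.

```python
import torch.distributed as dist

from modalities.running_env.cuda_env import CudaEnv
# Assumed import path and member name; adjust to your installed version of modalities.
from modalities.config.config import ProcessGroupBackendType

# Entering the context configures the CUDA environment for distributed training
# using the given process group backend and init timeout (in seconds).
with CudaEnv(process_group_backend=ProcessGroupBackendType.nccl, timeout_s=600):
    print(f"rank {dist.get_rank()} of {dist.get_world_size()} is ready")
```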
modalities.running_env.env_utils module
- class modalities.running_env.env_utils.FSDP2MixedPrecisionSettings(**data)[source]
Bases: BaseModel
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
param_dtype (PyTorchDtypes)
reduce_dtype (PyTorchDtypes)
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- param_dtype: PyTorchDtypes
- reduce_dtype: PyTorchDtypes
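A construction sketch: since the class is a pydantic BaseModel, both fields are validated from keyword arguments. The import path of PyTorchDtypes and the member names BF_16 and FP_32 are assumptions made for this example.

```python
from modalities.running_env.env_utils import FSDP2MixedPrecisionSettings
# Assumed import path and member names; adjust to your installed version of modalities.
from modalities.config.config import PyTorchDtypes

settings = FSDP2MixedPrecisionSettings(
    param_dtype=PyTorchDtypes.BF_16,   # dtype used for parameters
    reduce_dtype=PyTorchDtypes.FP_32,  # dtype used for gradient reduction
)
print(settings.param_dtype, settings.reduce_dtype)
```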
- class modalities.running_env.env_utils.MixedPrecisionSettings(value)[source]
Bases: LookupEnum
- BF_16 = MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.bfloat16, buffer_dtype=torch.bfloat16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- BF_16_WORKING = MixedPrecision(param_dtype=torch.float32, reduce_dtype=torch.bfloat16, buffer_dtype=torch.bfloat16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- FP_16 = MixedPrecision(param_dtype=torch.float16, reduce_dtype=torch.float16, buffer_dtype=torch.float16, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- FP_32 = MixedPrecision(param_dtype=torch.float32, reduce_dtype=torch.float32, buffer_dtype=torch.float32, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- MIXED_PRECISION_MEGATRON = MixedPrecision(param_dtype=torch.bfloat16, reduce_dtype=torch.float32, buffer_dtype=None, keep_low_precision_grads=False, cast_forward_inputs=False, cast_root_forward_inputs=True, _module_classes_to_ignore=(<class 'torch.nn.modules.batchnorm._BatchNorm'>,))
- NO_MIXED_PRECISION = None
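A short usage sketch: each enum member wraps a torch.distributed.fsdp.MixedPrecision policy (NO_MIXED_PRECISION maps to None), so a policy can be selected by name and inspected directly. Handing the value to FSDP's mixed_precision argument is the presumable use, not a call site verified from this page.

```python
from modalities.running_env.env_utils import MixedPrecisionSettings

# The member's value is a torch.distributed.fsdp.MixedPrecision instance.
policy = MixedPrecisionSettings.MIXED_PRECISION_MEGATRON.value
print(policy.param_dtype)   # torch.bfloat16
print(policy.reduce_dtype)  # torch.float32

# Standard Enum lookup by name, e.g. when the setting comes from a config file.
selected = MixedPrecisionSettings["BF_16_WORKING"]
```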