modalities.training.gradient_clipping package
Submodules
modalities.training.gradient_clipping.fsdp_gradient_clipper module
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP1GradientClipper(wrapped_model, max_norm, norm_type)[source]
Bases: GradientClipperIF
The FSDP1GradientClipper class is responsible for clipping the gradients of a model wrapped with FSDP. Follows the documentation from https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel.clip_grad_norm_
Initialize the FSDP1GradientClipper object.
- Args:
wrapped_model (FSDP1): The wrapped model.
max_norm (float): The maximum norm value for gradient clipping.
norm_type (GradientClippingMode, optional): The type of gradient clipping. Defaults to GradientClippingMode.
- Returns:
None
- Parameters:
wrapped_model (FullyShardedDataParallel)
max_norm (float)
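FSDP1GradientClipper delegates the actual work to FullyShardedDataParallel.clip_grad_norm_. As an illustration only, the contract of that call can be sketched in plain Python (the function below is a hypothetical stand-in operating on a flat list of gradient values, not the modalities or PyTorch implementation):

```python
import math

def clip_grad_norm_(grads: list[float], max_norm: float, norm_type: float = 2.0) -> float:
    """Scale grads in place so their total norm is at most max_norm.

    Sketch of the clip_grad_norm_ contract: the *pre-clipping* total
    norm is returned, which is what a trainer would log.
    """
    if norm_type == math.inf:  # corresponds to the MAX_NORM mode
        total_norm = max(abs(g) for g in grads)
    else:                      # P1_NORM (1.0) or P2_NORM (2.0)
        total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    clip_coef = max_norm / (total_norm + 1e-6)  # small epsilon guards against division by zero
    if clip_coef < 1.0:  # only ever scale down, never up
        for i, g in enumerate(grads):
            grads[i] = g * clip_coef
    return total_norm

grads = [3.0, 4.0]                               # L2 norm is 5.0
pre_clip = clip_grad_norm_(grads, max_norm=1.0)  # grads now have norm ~1.0
```

The wrapped FSDP variant additionally handles gathering the norm across shards, which this single-process sketch omits.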
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP1LoggingOnlyGradientClipper(wrapped_model, norm_type)[source]
Bases: GradientClipperIF
The FSDP1LoggingOnlyGradientClipper class is responsible for logging the gradient norms without actually clipping the gradients.
Initialize the FSDP1LoggingOnlyGradientClipper.
- Args:
wrapped_model (FSDP1): The wrapped FSDP1 model.
norm_type (GradientClippingMode, optional): The type of gradient clipping. Defaults to GradientClippingMode.
- Returns:
None
- Parameters:
wrapped_model (FullyShardedDataParallel)
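By contrast, a logging-only clipper computes and returns the gradient norm but leaves the gradients untouched. Roughly (hypothetical helper on a flat list of gradient values, not the modalities API):

```python
import math

def observed_grad_norm(grads: list[float], norm_type: float = 2.0) -> float:
    """Return the total gradient norm without modifying the gradients,
    mirroring a logging-only clipper's behaviour."""
    if norm_type == math.inf:
        return max(abs(g) for g in grads)
    return sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)

grads = [3.0, 4.0]
norm = observed_grad_norm(grads)  # norm is reported; grads stay [3.0, 4.0]
```

This is useful for monitoring training stability before committing to a clipping threshold.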
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP2GradientClipper(wrapped_model, max_norm, norm_type, device_mesh=None, error_if_nonfinite=False, foreach=None)[source]
Bases: FSDP2LoggingOnlyGradientClipper
The FSDP2GradientClipper class is responsible for clipping the gradients of a model wrapped with FSDP2.
Initialize the FSDP2GradientClipper object.
- Args:
wrapped_model (FSDP2): The wrapped FSDP2 model.
max_norm (float): The maximum norm value for gradient clipping.
norm_type (GradientClippingMode): The type of gradient clipping.
device_mesh (DeviceMesh, optional): The device mesh used for distributed training. Defaults to None.
error_if_nonfinite (bool): If True, an error is thrown if the total norm of the gradients from parameters is nan, inf, or -inf. Default: False (will switch to True in the future).
foreach (bool): Use the faster foreach-based implementation. If None, use the foreach implementation for CUDA and CPU native tensors and silently fall back to the slow implementation for other device types. Default: None.
- Returns:
None
- Parameters:
wrapped_model (FSDPModule)
max_norm (float)
norm_type (GradientClippingMode)
device_mesh (DeviceMesh | None)
error_if_nonfinite (bool)
foreach (bool | None)
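The error_if_nonfinite flag described above can be illustrated in isolation. A minimal sketch of its documented behaviour (hypothetical helper, not the library's code):

```python
import math

def total_norm(grads: list[float], norm_type: float = 2.0,
               error_if_nonfinite: bool = False) -> float:
    """Compute the total gradient norm, optionally raising on nan/inf
    as the error_if_nonfinite flag is documented to do."""
    if norm_type == math.inf:
        norm = max(abs(g) for g in grads)
    else:
        norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    if error_if_nonfinite and not math.isfinite(norm):
        raise RuntimeError("The total norm of the gradients is non-finite")
    return norm
```

With the flag off (the current default), a nan gradient silently produces a nan norm; with it on, the same input raises immediately, which surfaces diverging runs earlier.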
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP2LoggingOnlyGradientClipper(wrapped_model, norm_type, device_mesh=None, error_if_nonfinite=False, foreach=None)[source]
Bases: GradientClipperIF
The FSDP2LoggingOnlyGradientClipper class is responsible for logging the gradient norms without actually clipping the gradients.
Initialize the FSDP2LoggingOnlyGradientClipper.
- Args:
wrapped_model (FSDP2): The wrapped FSDP2 model.
norm_type (GradientClippingMode): The type of gradient clipping.
device_mesh (DeviceMesh, optional): The device mesh used for distributed training. Defaults to None.
error_if_nonfinite (bool): If True, an error is thrown if the total norm of the gradients from parameters is nan, inf, or -inf. Default: False (will switch to True in the future).
foreach (bool): Use the faster foreach-based implementation. If None, use the foreach implementation for CUDA and CPU native tensors and silently fall back to the slow implementation for other device types. Default: None.
- Returns:
None
- Parameters:
wrapped_model (FSDPModule)
norm_type (GradientClippingMode)
device_mesh (DeviceMesh | None)
error_if_nonfinite (bool)
foreach (bool | None)
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.GradientClippingMode(value)[source]
Bases: LookupEnum
Enum class representing different modes of gradient clipping.
- Attributes:
P1_NORM (int): Mode for Manhattan norm based clipping.
P2_NORM (int): Mode for Euclidean norm based clipping.
MAX_NORM (str): Mode for maximum norm based clipping.
- MAX_NORM = 'inf'
- P1_NORM = 1
- P2_NORM = 2
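These enum values map naturally onto the numeric norm orders that clip_grad_norm_ expects. A sketch of that mapping (the enum mirrors the values documented above; the helper function is illustrative, not part of modalities):

```python
import math
from enum import Enum

class GradientClippingMode(Enum):
    """Mirrors the documented enum values."""
    P1_NORM = 1        # Manhattan norm
    P2_NORM = 2        # Euclidean norm
    MAX_NORM = "inf"   # maximum (infinity) norm

def to_norm_order(mode: GradientClippingMode) -> float:
    """Translate a clipping mode into the numeric norm order
    that clip_grad_norm_-style functions accept."""
    if mode is GradientClippingMode.MAX_NORM:
        return math.inf
    return float(mode.value)
```

Note the asymmetry the mapping has to absorb: P1_NORM and P2_NORM carry int values, while MAX_NORM is the string 'inf' and must be converted to a float infinity.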
modalities.training.gradient_clipping.fsdp_gradient_clipper_config module
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.FSDP1DummyGradientClipperConfig(**data)[source]
Bases: BaseModel
Configuration class for the FSDP dummy gradient clipper.
- Args:
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
- Attributes:
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
wrapped_model (Annotated[Module, PydanticThirdPartyTypeIF])
norm_type (GradientClippingMode)
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- norm_type: GradientClippingMode
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.FSDP1GradientClipperConfig(**data)[source]
Bases: BaseModel
Configuration class for the FSDP gradient clipper.
- Args:
max_norm (float): The maximum norm value for gradient clipping.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
- Attributes:
max_norm (float): The maximum norm value for gradient clipping.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
max_norm (Annotated[float, Strict(strict=True), Gt(gt=0)])
norm_type (GradientClippingMode)
wrapped_model (Annotated[Module, PydanticThirdPartyTypeIF])
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- norm_type: GradientClippingMode
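In a modalities YAML configuration, such a clipper config is typically instantiated through the component registry. A hypothetical fragment is sketched below; the registry keys, the lowercase enum spelling, and the referenced model key are assumptions for illustration, not taken from this page:

```yaml
gradient_clipper:
  component_key: gradient_clipper   # assumed registry key
  variant_key: fsdp1                # assumed variant name
  config:
    wrapped_model:
      instance_key: wrapped_model   # reference to the already-built model
      pass_type: BY_REFERENCE
    max_norm: 1.0                   # strict float, must be > 0
    norm_type: p2_norm
```

Because max_norm is declared with Strict(strict=True) and Gt(gt=0), a value such as "1.0" (a string) or 0.0 would be rejected at validation time with a pydantic ValidationError.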
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.FSDP2DummyGradientClipperConfig(**data)[source]
Bases: BaseModel
Configuration class for the FSDP dummy gradient clipper.
- Args:
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
device_mesh (PydanticDeviceMeshIFType | None): The device mesh configuration.
- Attributes:
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
device_mesh (PydanticDeviceMeshIFType | None): The device mesh configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
wrapped_model (Annotated[Module, PydanticThirdPartyTypeIF])
norm_type (GradientClippingMode)
device_mesh (Annotated[DeviceMesh, PydanticThirdPartyTypeIF])
- device_mesh: Annotated[DeviceMesh]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- norm_type: GradientClippingMode
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.FSDP2GradientClipperConfig(**data)[source]
Bases: BaseModel
Configuration class for the FSDP gradient clipper.
- Args:
max_norm (float): The maximum norm value for gradient clipping.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
device_mesh (PydanticDeviceMeshIFType | None): The device mesh configuration.
- Attributes:
max_norm (float): The maximum norm value for gradient clipping.
norm_type (GradientClippingMode): The type of gradient clipping to be applied.
wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
device_mesh (PydanticDeviceMeshIFType | None): The device mesh configuration.
Create a new model by parsing and validating input data from keyword arguments.
Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.
- Parameters:
max_norm (Annotated[float, Strict(strict=True), Gt(gt=0)])
norm_type (GradientClippingMode)
wrapped_model (Annotated[Module, PydanticThirdPartyTypeIF])
device_mesh (Annotated[DeviceMesh, PydanticThirdPartyTypeIF])
- device_mesh: Annotated[DeviceMesh]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- norm_type: GradientClippingMode