modalities.training.gradient_clipping package
Submodules
modalities.training.gradient_clipping.fsdp_gradient_clipper module
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.DummyGradientClipper[source]
- Bases: GradientClipperIF
- The DummyGradientClipper class does not apply gradient clipping.
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP1GradientClipper(wrapped_model, max_norm, norm_type=<enum 'GradientClippingMode'>)[source]
- Bases: GradientClipperIF
- The FSDP1GradientClipper class is responsible for clipping the gradients of a model wrapped with FSDP. It follows the documentation from https://pytorch.org/docs/stable/fsdp.html#torch.distributed.fsdp.FullyShardedDataParallel.clip_grad_norm_
- Initialize the FSDP1GradientClipper object.
- Args:
- wrapped_model (FSDP1): The wrapped model.
- max_norm (float): The maximum norm value for gradient clipping.
- norm_type (GradientClippingMode, optional): The type of gradient clipping. Defaults to GradientClippingMode.
- Returns:
- None 
 - Parameters:
- wrapped_model (FullyShardedDataParallel) 
- max_norm (float) 
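A minimal usage sketch for FSDP1GradientClipper, assuming the model has already been wrapped with torch's FullyShardedDataParallel and that the GradientClipperIF interface exposes a clip_gradients() method called once per training step (that interface is not documented in this section); my_module is a hypothetical nn.Module.

```python
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

from modalities.training.gradient_clipping.fsdp_gradient_clipper import (
    FSDP1GradientClipper,
    GradientClippingMode,
)

# Assumption: `my_module` is a hypothetical nn.Module and the distributed
# process group is already initialized, so the module can be wrapped with FSDP1.
wrapped_model = FSDP(my_module)

clipper = FSDP1GradientClipper(
    wrapped_model=wrapped_model,
    max_norm=1.0,
    norm_type=GradientClippingMode.P2_NORM,
)

# Assumption: GradientClipperIF exposes clip_gradients(), invoked after
# backward() and before optimizer.step(); it returns the total gradient norm.
grad_norm = clipper.clip_gradients()
```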
 
 
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP1LoggingOnlyGradientClipper(wrapped_model, norm_type=<enum 'GradientClippingMode'>)[source]
- Bases: GradientClipperIF
- The FSDP1LoggingOnlyGradientClipper class is responsible for logging the gradient norms without actually clipping the gradients.
- Initialize the FSDP1LoggingOnlyGradientClipper.
- Args:
- wrapped_model (FSDP1): The wrapped FSDP1 model.
- norm_type (GradientClippingMode, optional): The type of gradient clipping. Defaults to GradientClippingMode.
- Returns:
- None 
 - Parameters:
- wrapped_model (FullyShardedDataParallel) 
 
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP2GradientClipper(wrapped_model, max_norm, norm_type=<enum 'GradientClippingMode'>)[source]
- Bases: GradientClipperIF
- The FSDP2GradientClipper class is responsible for clipping the gradients of a model wrapped with FSDP.
- Initialize the FSDP2GradientClipper object.
- Args:
- wrapped_model (FSDP2): The wrapped model.
- max_norm (float): The maximum norm value for gradient clipping.
- norm_type (GradientClippingMode, optional): The type of gradient clipping. Defaults to GradientClippingMode.
- Returns:
- None 
 - Parameters:
- wrapped_model (FSDPModule) 
- max_norm (float) 
 
 - static clip_grad_norm_(parameters, max_norm, norm_type=2.0, error_if_nonfinite=False, foreach=None)[source]
- Clip the gradient norm of an iterable of parameters.
- Gradient norm clipping requires computing the gradient norm over the entire model. torch.nn.utils.clip_grad_norm_ only computes the gradient norm along DP/FSDP/TP dimensions.
- TODO: for pipeline parallelism, we need to implement it like here: https://github.com/pytorch/torchtitan/blob/b291ad662493b63d25b038a30a915082d3617baf/torchtitan/distributed/utils.py#L245 I removed all the code w.r.t. pipeline parallelism for now.
- Args:
- parameters: an iterable of Tensors or a single Tensor that will have gradients normalized
- max_norm (float): max norm of the gradients
- norm_type (float): type of the used p-norm. Can be 'inf' for infinity norm.
- error_if_nonfinite (bool): if True, an error is thrown if the total norm of the gradients from parameters is nan, inf, or -inf. Default: False (will switch to True in the future)
- foreach (bool): use the faster foreach-based implementation. If None, use the foreach implementation for CUDA and CPU native tensors and silently fall back to the slow implementation for other device types. Default: None
 
- Returns:
- Total norm of the parameter gradients (viewed as a single vector). 
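A minimal sketch of calling the static clip_grad_norm_ helper directly. A plain nn.Linear stands in for an FSDP2-sharded module so the snippet runs without a distributed setup; in real training the parameters would be DTensor shards produced by fully_shard.

```python
import torch
import torch.nn as nn

from modalities.training.gradient_clipping.fsdp_gradient_clipper import FSDP2GradientClipper

# Stand-in for an FSDP2-sharded model; run a forward/backward pass so that
# every parameter carries a gradient.
model = nn.Linear(8, 4)
model(torch.randn(2, 8)).sum().backward()

total_norm = FSDP2GradientClipper.clip_grad_norm_(
    parameters=model.parameters(),
    max_norm=1.0,
    norm_type=2.0,              # p-norm order; 'inf' selects the infinity norm
    error_if_nonfinite=False,   # do not raise if the total norm is nan/inf
    foreach=None,               # let PyTorch pick the foreach fast path where supported
)
print(f"Total norm of the parameter gradients: {total_norm}")
```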
 
 
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.FSDP2LoggingOnlyGradientClipper(wrapped_model, norm_type=<enum 'GradientClippingMode'>)[source]
- Bases: GradientClipperIF
- The FSDP2LoggingOnlyGradientClipper class is responsible for logging the gradient norms without actually clipping the gradients.
- Initialize the FSDP2LoggingOnlyGradientClipper.
- Args:
- wrapped_model (FSDP2): The wrapped FSDP2 model.
- norm_type (GradientClippingMode, optional): The type of gradient clipping. Defaults to GradientClippingMode.
- Returns:
- None 
 - Parameters:
- wrapped_model (FSDPModule) 
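A sketch contrasting the clipping and logging-only FSDP2 variants, again assuming a clip_gradients() method on the GradientClipperIF interface; wrapped_model is a placeholder for a module that has already been sharded with FSDP2 (fully_shard).

```python
from modalities.training.gradient_clipping.fsdp_gradient_clipper import (
    FSDP2GradientClipper,
    FSDP2LoggingOnlyGradientClipper,
    GradientClippingMode,
)

# Assumption: `wrapped_model` is sharded with FSDP2 and carries gradients
# from the current training step.
clipping = FSDP2GradientClipper(
    wrapped_model=wrapped_model, max_norm=1.0, norm_type=GradientClippingMode.P2_NORM
)
logging_only = FSDP2LoggingOnlyGradientClipper(
    wrapped_model=wrapped_model, norm_type=GradientClippingMode.P2_NORM
)

# Assumption: both return the total gradient norm; the logging-only variant
# leaves the gradients themselves unchanged.
norm_after_clipping = clipping.clip_gradients()
norm_logged = logging_only.clip_gradients()
```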
 
- class modalities.training.gradient_clipping.fsdp_gradient_clipper.GradientClippingMode(value)[source]
- Bases: LookupEnum
- Enum class representing different modes of gradient clipping.
- Attributes:
- P1_NORM (int): Mode for Manhattan norm based clipping.
- P2_NORM (int): Mode for Euclidean norm based clipping.
- MAX_NORM (str): Mode for maximum norm based clipping.
 - MAX_NORM = 'inf'
 - P1_NORM = 1
 - P2_NORM = 2
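The enum values correspond directly to the norm_type conventions used by PyTorch's gradient clipping utilities, as the short illustration below shows.

```python
from modalities.training.gradient_clipping.fsdp_gradient_clipper import GradientClippingMode

# Each member's value is the p-norm order handed to the clipping utilities:
# 1 -> Manhattan norm, 2 -> Euclidean norm, 'inf' -> maximum (infinity) norm.
print(GradientClippingMode.P1_NORM.value)   # 1
print(GradientClippingMode.P2_NORM.value)   # 2
print(GradientClippingMode.MAX_NORM.value)  # 'inf'
```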
 
modalities.training.gradient_clipping.fsdp_gradient_clipper_config module
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.DummyGradientClipperConfig(**data)[source]
- Bases: BaseModel
- Configuration class for dummy gradient clipper.
- This class is a placeholder and does not have any specific functionality.
- Attributes:
- None 
- Methods:
- None 
- Create a new model by parsing and validating input data from keyword arguments.
- Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
- self is explicitly positional-only to allow self as a field name.
- model_config: ClassVar[ConfigDict] = {}
- Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
 
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.FSDPDummyGradientClipperConfig(**data)[source]
- Bases: BaseModel
- Configuration class for FSDP dummy gradient clipper.
- Args:
- wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
- norm_type (GradientClippingMode): The type of gradient clipping to be applied.
- Attributes:
- wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
- norm_type (GradientClippingMode): The type of gradient clipping to be applied.
- Create a new model by parsing and validating input data from keyword arguments.
- Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
- self is explicitly positional-only to allow self as a field name.
- Parameters:
- wrapped_model (Annotated[Module, PydanticThirdPartyTypeIF])
- norm_type (GradientClippingMode) 
 
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
- norm_type: GradientClippingMode
 
- class modalities.training.gradient_clipping.fsdp_gradient_clipper_config.FSDPGradientClipperConfig(**data)[source]
- Bases: BaseModel
- Configuration class for FSDP gradient clipper.
- Args:
- max_norm (float): The maximum norm value for gradient clipping.
- norm_type (GradientClippingMode): The type of gradient clipping to be applied.
- wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
- Attributes:
- max_norm (float): The maximum norm value for gradient clipping.
- norm_type (GradientClippingMode): The type of gradient clipping to be applied.
- wrapped_model (PydanticPytorchModuleType): The wrapped PyTorch model.
- Create a new model by parsing and validating input data from keyword arguments.
- Raises [ValidationError][pydantic_core.ValidationError] if the input data cannot be validated to form a valid model.
- self is explicitly positional-only to allow self as a field name.
- Parameters:
- max_norm (Annotated[float, FieldInfo(annotation=NoneType, required=True, metadata=[Strict(strict=True), Gt(gt=0)])]) 
- norm_type (GradientClippingMode) 
- wrapped_model (Annotated[Module, PydanticThirdPartyTypeIF])
 
 - model_config: ClassVar[ConfigDict] = {}
- Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict]. 
- norm_type: GradientClippingMode
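A hedged sketch of constructing FSDPGradientClipperConfig programmatically (in modalities such configs are typically built from a YAML configuration; direct instantiation is shown here only for illustration). A plain nn.Linear stands in for the FSDP-wrapped model, since the wrapped_model field accepts a torch.nn.Module (PydanticPytorchModuleType).

```python
import torch.nn as nn

from modalities.training.gradient_clipping.fsdp_gradient_clipper import GradientClippingMode
from modalities.training.gradient_clipping.fsdp_gradient_clipper_config import (
    FSDPGradientClipperConfig,
)

# Stand-in for a model that would already be wrapped/sharded with FSDP.
my_model = nn.Linear(8, 4)

config = FSDPGradientClipperConfig(
    max_norm=1.0,                            # strict float, must be > 0
    norm_type=GradientClippingMode.P2_NORM,  # Euclidean norm based clipping
    wrapped_model=my_model,
)
print(config.max_norm, config.norm_type)
```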