modalities.checkpointing package

Subpackages

Submodules

modalities.checkpointing.checkpoint_conversion module

class modalities.checkpointing.checkpoint_conversion.CheckpointConversion(config_file_path, output_hf_checkpoint_dir)[source]

Bases: object

Class to convert a PyTorch checkpoint to a Hugging Face checkpoint.

Initializes the CheckpointConversion object.

Args:

config_file_path (Path): The path to the configuration file containing the PyTorch model configuration.

output_hf_checkpoint_dir (Path): The path to the output Hugging Face checkpoint directory.

Raises:

ValueError: If the config_file_path does not exist.

Parameters:
  • config_file_path (Path)

  • output_hf_checkpoint_dir (Path)

convert_pytorch_to_hf_checkpoint(prediction_key)[source]

Converts a PyTorch checkpoint to a Hugging Face checkpoint.

Return type:

HFModelAdapter

Parameters:

prediction_key (str)

Args:

prediction_key (str): The prediction key to be used in the HFModelAdapter.

Returns:

HFModelAdapter: The converted Hugging Face model adapter.
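The documented constructor contract (raise ValueError when the config file is missing) can be illustrated with a minimal, self-contained sketch. The class name `CheckpointConversionSketch` and the error message are illustrative, not the real implementation; the actual conversion logic is omitted:

```python
from pathlib import Path


class CheckpointConversionSketch:
    """Minimal sketch of the documented constructor contract (hypothetical)."""

    def __init__(self, config_file_path: Path, output_hf_checkpoint_dir: Path):
        # Documented behavior: raise ValueError if config_file_path does not exist.
        if not config_file_path.exists():
            raise ValueError(f"config_file_path does not exist: {config_file_path}")
        self.config_file_path = config_file_path
        self.output_hf_checkpoint_dir = output_hf_checkpoint_dir
```

A real `CheckpointConversion` would additionally parse the configuration and build the `HFModelAdapter` in `convert_pytorch_to_hf_checkpoint`.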

modalities.checkpointing.checkpoint_loading module

class modalities.checkpointing.checkpoint_loading.DistributedCheckpointLoadingIF[source]

Bases: ABC

Distributed checkpoint loading interface for loading PyTorch models and optimizer checkpoints.

abstractmethod load_checkpoint_(app_state, checkpoint_dir_path)[source]

Loads the distributed checkpoint from the specified directory path into the AppState.

Return type:

AppState

Parameters:
  • app_state (AppState)

  • checkpoint_dir_path (Path)

Args:

app_state (AppState): The application state with the model, optimizer and lr scheduler.

checkpoint_dir_path (Path): The directory path to the distributed checkpoint.

Raises:

NotImplementedError: This abstract method is not implemented and should be overridden in a subclass.

Returns:

AppState: The application state with the loaded checkpoint.
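As an abstract interface, `DistributedCheckpointLoadingIF` is meant to be subclassed. The sketch below mirrors the documented shape of the interface; `NoOpCheckpointLoading` and the use of `Any` in place of `AppState` are stand-ins for self-containment, not part of the library:

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any


class DistributedCheckpointLoadingIF(ABC):
    """Sketch of the documented interface (AppState replaced by Any)."""

    @abstractmethod
    def load_checkpoint_(self, app_state: Any, checkpoint_dir_path: Path) -> Any:
        raise NotImplementedError


class NoOpCheckpointLoading(DistributedCheckpointLoadingIF):
    """Hypothetical trivial implementation: returns the app state unchanged."""

    def load_checkpoint_(self, app_state: Any, checkpoint_dir_path: Path) -> Any:
        # A real implementation would populate the model, optimizer and lr
        # scheduler in app_state from the checkpoint under checkpoint_dir_path.
        return app_state
```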

class modalities.checkpointing.checkpoint_loading.FSDP1CheckpointLoadingIF[source]

Bases: ABC

Checkpoint loading interface for loading PyTorch models and optimizer checkpoints.

abstractmethod load_model_checkpoint(model, file_path)[source]

Loads a model checkpoint from the specified file path.

Return type:

Module

Parameters:
  • model (Module)

  • file_path (Path)

Args:

model (nn.Module): The model to load the checkpoint into.

file_path (Path): The path to the checkpoint file.

Returns:

nn.Module: The loaded model with the checkpoint parameters.

Raises:

NotImplementedError: This abstract method is not implemented and should be overridden in a subclass.

abstractmethod load_optimizer_checkpoint_(optimizer, model, file_path)[source]

Loads an optimizer checkpoint from the specified file path (in-place).

Args:

optimizer (Optimizer): The optimizer to load the checkpoint into (in-place).

model (nn.Module): The model associated with the optimizer.

file_path (Path): The path to the checkpoint file.

Raises:

NotImplementedError: This abstract method is not implemented and should be overridden in a subclass.

Parameters:
  • optimizer (Optimizer)

  • model (Module)

  • file_path (Path)

modalities.checkpointing.checkpoint_saving module

class modalities.checkpointing.checkpoint_saving.CheckpointSaving(checkpoint_saving_strategy, checkpoint_saving_execution)[source]

Bases: object

Class for saving checkpoints based on a saving strategy and an execution strategy.

Initializes the CheckpointSaving object.

Args:

checkpoint_saving_strategy (CheckpointSavingStrategyIF): The strategy for saving checkpoints.

checkpoint_saving_execution (CheckpointSavingExecutionABC): The execution for saving checkpoints.

Parameters:
  • checkpoint_saving_strategy (CheckpointSavingStrategyIF)

  • checkpoint_saving_execution (CheckpointSavingExecutionABC)

save_checkpoint(training_progress, evaluation_result, app_state, early_stoppping_criterion_fulfilled=False)[source]

Saves a checkpoint of the model and optimizer.

Args:

training_progress (TrainingProgress): The training progress.

evaluation_result (dict[str, EvaluationResultBatch]): The evaluation result.

app_state (AppState): The application state to be checkpointed.

early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Parameters:
  • training_progress (TrainingProgress)

  • evaluation_result (dict[str, EvaluationResultBatch])

  • app_state (AppState)

  • early_stoppping_criterion_fulfilled (bool)

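The division of labor documented here (the strategy decides what to do, the execution performs it) can be sketched with self-contained stand-ins. `InstructionStub`, `AlwaysSaveStrategy` and `RecordingExecution` below are hypothetical placeholders for the real library types; only the delegation in `save_checkpoint` mirrors the documented design:

```python
from dataclasses import dataclass, field


@dataclass
class InstructionStub:
    """Stand-in for CheckpointingInstruction."""
    save_current: bool = False
    checkpoints_to_delete: list = field(default_factory=list)


class AlwaysSaveStrategy:
    """Hypothetical strategy: always request a save."""
    def get_checkpoint_instruction(self, training_progress, evaluation_result=None,
                                   early_stoppping_criterion_fulfilled=False):
        return InstructionStub(save_current=True)


class RecordingExecution:
    """Hypothetical execution: records the instructions it runs."""
    def __init__(self):
        self.executed = []

    def run_checkpoint_instruction(self, instruction, training_progress, app_state):
        self.executed.append((instruction, training_progress))


class CheckpointSavingSketch:
    """Sketch of the documented delegation: strategy decides, execution performs."""
    def __init__(self, checkpoint_saving_strategy, checkpoint_saving_execution):
        self.checkpoint_saving_strategy = checkpoint_saving_strategy
        self.checkpoint_saving_execution = checkpoint_saving_execution

    def save_checkpoint(self, training_progress, evaluation_result, app_state,
                        early_stoppping_criterion_fulfilled=False):
        instruction = self.checkpoint_saving_strategy.get_checkpoint_instruction(
            training_progress, evaluation_result, early_stoppping_criterion_fulfilled)
        self.checkpoint_saving_execution.run_checkpoint_instruction(
            instruction, training_progress, app_state)
```

This composition lets saving policies (when, how many) vary independently of the saving mechanism (FSDP, distributed, etc.).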
modalities.checkpointing.checkpoint_saving_execution module

class modalities.checkpointing.checkpoint_saving_execution.CheckpointSavingExecutionABC[source]

Bases: ABC

Abstract class for saving PyTorch model and optimizer checkpoints.

run_checkpoint_instruction(checkpointing_instruction, training_progress, app_state)[source]

Runs the checkpoint instruction.

Args:

checkpointing_instruction (CheckpointingInstruction): The checkpointing instruction.

training_progress (TrainingProgress): The training progress.

app_state (AppState): The application state to be checkpointed.

Parameters:
  • checkpointing_instruction (CheckpointingInstruction)

  • training_progress (TrainingProgress)

  • app_state (AppState)

modalities.checkpointing.checkpoint_saving_instruction module

class modalities.checkpointing.checkpoint_saving_instruction.CheckpointingInstruction(save_current=False, checkpoints_to_delete=<factory>)[source]

Bases: object

Represents a checkpointing instruction (i.e., saving and deleting).

Attributes:

save_current (bool): Indicates whether to save the current checkpoint.

checkpoints_to_delete (list[TrainingProgress]): List of checkpoint IDs to delete.

Parameters:
  • save_current (bool)

  • checkpoints_to_delete (list[TrainingProgress])

checkpoints_to_delete: list[TrainingProgress] = <factory>
save_current: bool = False
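The documented fields and defaults can be mirrored in a self-contained dataclass sketch. The `Sketch` suffix marks it as illustrative; the real class types `checkpoints_to_delete` as `list[TrainingProgress]`, and the `<factory>` default means each instance gets its own fresh list:

```python
from dataclasses import dataclass, field


@dataclass
class CheckpointingInstructionSketch:
    """Mirror of the documented fields (list[TrainingProgress] simplified to list)."""
    save_current: bool = False
    # default_factory is what Sphinx renders as "<factory>" in the signature
    checkpoints_to_delete: list = field(default_factory=list)
```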

modalities.checkpointing.checkpoint_saving_strategies module

class modalities.checkpointing.checkpoint_saving_strategies.CheckpointSavingStrategyIF[source]

Bases: ABC

Checkpoint saving strategy interface.

abstractmethod get_checkpoint_instruction(training_progress, evaluation_result=None, early_stoppping_criterion_fulfilled=False)[source]

Returns the checkpointing instruction.

Return type:

CheckpointingInstruction

Parameters:

training_progress (TrainingProgress): The training progress.

evaluation_result (dict[str, EvaluationResultBatch] | None, optional): The evaluation result. Defaults to None.

early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Returns:

CheckpointingInstruction: The checkpointing instruction.

class modalities.checkpointing.checkpoint_saving_strategies.SaveEveryKStepsCheckpointingStrategy(k)[source]

Bases: CheckpointSavingStrategyIF

Strategy for saving a checkpoint every k steps.

Initializes the SaveEveryKStepsCheckpointingStrategy object.

Args:

k (int): The interval, in training steps, between saved checkpoints.

Returns:

None

Parameters:

k (int)

get_checkpoint_instruction(training_progress, evaluation_result=None, early_stoppping_criterion_fulfilled=False)[source]

Returns a CheckpointingInstruction object.

Return type:

CheckpointingInstruction

Parameters:
  • training_progress (TrainingProgress)

  • evaluation_result (dict[str, EvaluationResultBatch] | None)

  • early_stoppping_criterion_fulfilled (bool)

Args:

training_progress (TrainingProgress): The training progress.

evaluation_result (dict[str, EvaluationResultBatch] | None, optional): The evaluation result. Defaults to None.

early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Returns:

CheckpointingInstruction: The checkpointing instruction object.
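The save-every-k decision reduces to a modulus check. The helper below is a hypothetical illustration of that rule, not a function from the library:

```python
def is_kth_step(num_seen_steps: int, k: int) -> bool:
    """Hypothetical helper: a save-every-k-steps strategy saves whenever the
    number of seen training steps is a positive multiple of k."""
    return num_seen_steps > 0 and num_seen_steps % k == 0
```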

class modalities.checkpointing.checkpoint_saving_strategies.SaveKMostRecentCheckpointsStrategy(k=-1)[source]

Bases: CheckpointSavingStrategyIF

Strategy for saving the k most recent checkpoints only.

Initializes the checkpoint saving strategy.

Args:

k (int, optional): The number of most recent checkpoints to save. Defaults to -1, which means all checkpoints are saved. Set to 0 to not save any checkpoints. Set to a positive integer to save the specified number of checkpoints.

Parameters:

k (int)

get_checkpoint_instruction(training_progress, evaluation_result=None, early_stoppping_criterion_fulfilled=False)[source]

Generates a checkpointing instruction based on the given parameters.

Return type:

CheckpointingInstruction

Parameters:
  • training_progress (TrainingProgress)

  • evaluation_result (dict[str, EvaluationResultBatch] | None)

  • early_stoppping_criterion_fulfilled (bool)

Args:

training_progress (TrainingProgress): The training progress.

evaluation_result (dict[str, EvaluationResultBatch] | None, optional): The evaluation result. Defaults to None.

early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Returns:

CheckpointingInstruction: The generated checkpointing instruction.
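The documented k-semantics (k=-1 keeps everything, k=0 saves nothing, k>0 keeps only the k most recent checkpoints and marks older ones for deletion) can be sketched in a self-contained form. Class and field names with the `Sketch` suffix are illustrative stand-ins for the real library types:

```python
from dataclasses import dataclass, field


@dataclass
class InstructionSketch:
    """Stand-in for CheckpointingInstruction."""
    save_current: bool = False
    checkpoints_to_delete: list = field(default_factory=list)


class KMostRecentSketch:
    """Sketch of the documented semantics: k=-1 keeps all checkpoints,
    k=0 saves nothing, k>0 keeps only the k most recent ones."""

    def __init__(self, k: int = -1):
        self.k = k
        self.saved_checkpoints = []  # most recent first

    def get_checkpoint_instruction(self, training_progress) -> InstructionSketch:
        if self.k == 0:
            return InstructionSketch(save_current=False)
        self.saved_checkpoints.insert(0, training_progress)
        to_delete = []
        if self.k > 0 and len(self.saved_checkpoints) > self.k:
            # everything beyond the k most recent entries gets deleted
            to_delete = self.saved_checkpoints[self.k:]
            self.saved_checkpoints = self.saved_checkpoints[:self.k]
        return InstructionSketch(save_current=True, checkpoints_to_delete=to_delete)
```

With k=2, saving checkpoints for steps 1, 2, 3 in turn keeps 3 and 2 and marks 1 for deletion.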

Module contents