modalities.checkpointing package

Subpackages

Submodules

modalities.checkpointing.checkpoint_conversion module

class modalities.checkpointing.checkpoint_conversion.CheckpointConversion(config_file_path, output_hf_checkpoint_dir)[source]

Bases: object

Class to convert a PyTorch checkpoint to a Hugging Face checkpoint.

Initializes the CheckpointConversion object.

Args:

config_file_path (Path): The path to the configuration file containing the PyTorch model configuration.
output_hf_checkpoint_dir (Path): The path to the output Hugging Face checkpoint directory.

Raises:

ValueError: If the config_file_path does not exist.

Parameters:

config_file_path (Path)
output_hf_checkpoint_dir (Path)

convert_pytorch_to_hf_checkpoint(prediction_key)[source]

Converts a PyTorch checkpoint to a Hugging Face checkpoint.

Return type:: HFModelAdapter
Parameters:: prediction_key (str)

Args:: prediction_key (str): The prediction key to be used in the HFModelAdapter.
Returns:: HFModelAdapter: The converted Hugging Face model adapter.

modalities.checkpointing.checkpoint_loading module

class modalities.checkpointing.checkpoint_loading.DistributedCheckpointLoadingIF[source]

Bases: ABC

Distributed checkpoint loading interface for loading PyTorch models and optimizer checkpoints.

abstractmethod load_checkpoint_(app_state, checkpoint_dir_path)[source]

Loads the distributed checkpoint from the specified directory path into the AppState.

Return type:

AppState

Parameters:

app_state (AppState)
checkpoint_dir_path (Path)

Args:

app_state (AppState): The application state with the model, optimizer and lr scheduler.
checkpoint_dir_path (Path): The directory path to the distributed checkpoint.

Raises:

NotImplementedError: This abstract method is not implemented and should be overridden in a subclass.

Returns:

AppState: The application state with the loaded checkpoint.

class modalities.checkpointing.checkpoint_loading.FSDP1CheckpointLoadingIF[source]

Bases: ABC

Checkpoint loading interface for loading PyTorch models and optimizer checkpoints.

abstractmethod load_model_checkpoint(model, file_path)[source]

Loads a model checkpoint from the specified file path.

Return type:

Module

Parameters:

model (Module)
file_path (Path)

Args:

model (nn.Module): The model to load the checkpoint into.
file_path (Path): The path to the checkpoint file.

Returns:

nn.Module: The loaded model with the checkpoint parameters.

Raises:

NotImplementedError: This abstract method is not implemented and should be overridden in a subclass.

abstractmethod load_optimizer_checkpoint_(optimizer, model, file_path)[source]

Loads an optimizer checkpoint from the specified file path (in-place).

Args:

optimizer (Optimizer): The optimizer to load the checkpoint into (in-place).
model (nn.Module): The model associated with the optimizer.
file_path (Path): The path to the checkpoint file.

Raises:

NotImplementedError: This abstract method is not implemented and should be overridden in a subclass.

Parameters:

optimizer (Optimizer)
model (Module)
file_path (Path)
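
The interface pattern described above can be illustrated with a minimal, library-free sketch. Everything in it is hypothetical: `ToyCheckpointLoadingIF` and `JsonCheckpointLoading` are stand-ins invented for illustration, a plain dict plays the role of the model, and a JSON file plays the role of the checkpoint. It only shows how an abstract loading method is overridden in a subclass, not how modalities loads FSDP checkpoints.

```python
import json
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ToyCheckpointLoadingIF(ABC):
    """Hypothetical stand-in for a checkpoint loading interface."""

    @abstractmethod
    def load_model_checkpoint(self, model: dict, file_path: Path) -> dict:
        # Subclasses are expected to override this method.
        raise NotImplementedError


class JsonCheckpointLoading(ToyCheckpointLoadingIF):
    """Hypothetical loader that reads parameters from a JSON file."""

    def load_model_checkpoint(self, model: dict, file_path: Path) -> dict:
        with open(file_path) as f:
            state = json.load(f)
        # Overwrite only the parameters the toy model already defines.
        model.update({k: v for k, v in state.items() if k in model})
        return model


def demo_load() -> dict:
    """Writes a toy checkpoint to a temporary file and loads it back."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "model.json"
        path.write_text(json.dumps({"weight": 1.5, "bias": 0.25, "extra": 9}))
        toy_model = {"weight": 0.0, "bias": 0.0}
        return JsonCheckpointLoading().load_model_checkpoint(toy_model, path)
```

Because the base class declares an abstract method, Python refuses to instantiate it directly; only subclasses that override every abstract method can be created.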

modalities.checkpointing.checkpoint_saving module

class modalities.checkpointing.checkpoint_saving.CheckpointSaving(checkpoint_saving_strategy, checkpoint_saving_execution)[source]

Bases: object

Class for saving checkpoints based on a saving strategy and an execution strategy.

Initializes the CheckpointSaving object.

Args:

checkpoint_saving_strategy (CheckpointSavingStrategyIF): The strategy for saving checkpoints.
checkpoint_saving_execution (CheckpointSavingExecutionABC): The execution for saving checkpoints.

Parameters:

checkpoint_saving_strategy (CheckpointSavingStrategyIF)
checkpoint_saving_execution (CheckpointSavingExecutionABC)

save_checkpoint(training_progress, evaluation_result, app_state, early_stoppping_criterion_fulfilled=False)[source]

Saves a checkpoint of the model and optimizer.

Args:

training_progress (TrainingProgress): The training progress.
evaluation_result (dict[str, EvaluationResultBatch]): The evaluation result.
app_state (AppState): The application state to be checkpointed.
early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Parameters:

training_progress (TrainingProgress)
evaluation_result (dict[str, EvaluationResultBatch])
app_state (AppState)
early_stoppping_criterion_fulfilled (bool)
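
The split between a saving strategy and a saving execution can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: all `Toy*` classes are hypothetical, an integer step replaces `TrainingProgress`, a dict replaces `AppState`, and the assumption is that `save_checkpoint` asks the strategy for an instruction and passes it on to the execution.

```python
from dataclasses import dataclass, field


@dataclass
class ToyInstruction:
    """Hypothetical stand-in for CheckpointingInstruction."""
    save_current: bool = False
    checkpoints_to_delete: list = field(default_factory=list)


class ToyStrategy:
    """Hypothetical strategy: save on even steps, delete nothing."""

    def get_checkpoint_instruction(self, step, evaluation_result=None,
                                   early_stopping_fulfilled=False):
        return ToyInstruction(save_current=(step % 2 == 0))


class ToyExecution:
    """Hypothetical execution that records checkpoints in memory."""

    def __init__(self):
        self.saved = {}

    def run_checkpoint_instruction(self, instruction, step, app_state):
        if instruction.save_current:
            self.saved[step] = dict(app_state)  # copy the state at this step
        for old_step in instruction.checkpoints_to_delete:
            self.saved.pop(old_step, None)


class ToyCheckpointSaving:
    """Delegates the decision to the strategy and the I/O to the execution."""

    def __init__(self, strategy, execution):
        self.strategy = strategy
        self.execution = execution

    def save_checkpoint(self, step, evaluation_result, app_state,
                        early_stopping_fulfilled=False):
        instruction = self.strategy.get_checkpoint_instruction(
            step, evaluation_result, early_stopping_fulfilled
        )
        self.execution.run_checkpoint_instruction(instruction, step, app_state)


execution = ToyExecution()
saver = ToyCheckpointSaving(ToyStrategy(), execution)
for step in range(1, 5):
    saver.save_checkpoint(step, evaluation_result={}, app_state={"step": step})
```

Keeping the decision (when to save, what to delete) apart from the mechanics (how to write and remove files) lets either side be swapped independently.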

modalities.checkpointing.checkpoint_saving_execution module

class modalities.checkpointing.checkpoint_saving_execution.CheckpointSavingExecutionABC[source]

Bases: ABC

Abstract class for saving PyTorch model and optimizer checkpoints.

run_checkpoint_instruction(checkpointing_instruction, training_progress, app_state)[source]

Runs the checkpoint instruction.

Args:

checkpointing_instruction (CheckpointingInstruction): The checkpointing instruction.
training_progress (TrainingProgress): The training progress.
app_state (AppState): The application state to be checkpointed.

Parameters:

checkpointing_instruction (CheckpointingInstruction)
training_progress (TrainingProgress)
app_state (AppState)

modalities.checkpointing.checkpoint_saving_instruction module

class modalities.checkpointing.checkpoint_saving_instruction.CheckpointingInstruction(save_current=False, checkpoints_to_delete=<factory>)[source]

Bases: object

Represents a checkpointing instruction (i.e., saving and deleting).

Attributes:

save_current (bool): Indicates whether to save the current checkpoint.
checkpoints_to_delete (list[TrainingProgress]): List of checkpoint IDs to delete.

Parameters:

save_current (bool)
checkpoints_to_delete (list[TrainingProgress])

checkpoints_to_delete: list[TrainingProgress] = <factory>

save_current: bool = False
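
The `<factory>` default in the signature above indicates a dataclass field whose default value is produced by a factory. A minimal sketch of that pattern follows; `ToyCheckpointingInstruction` is a hypothetical stand-in, integers replace `TrainingProgress`, and the use of `list` as the factory is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCheckpointingInstruction:
    """Hypothetical stand-in mirroring the two documented attributes."""
    save_current: bool = False
    # default_factory builds a fresh list per instance, so instances
    # never share one mutable default.
    checkpoints_to_delete: list = field(default_factory=list)


first = ToyCheckpointingInstruction()
second = ToyCheckpointingInstruction(save_current=True)
first.checkpoints_to_delete.append(100)  # does not affect `second`
```

A plain `= []` default is rejected by `dataclasses` for exactly this reason: a single list object would otherwise be shared across all instances.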

modalities.checkpointing.checkpoint_saving_strategies module

class modalities.checkpointing.checkpoint_saving_strategies.CheckpointSavingStrategyIF[source]

Bases: ABC

Checkpoint saving strategy interface.

abstractmethod get_checkpoint_instruction(training_progress, evaluation_result=None, early_stoppping_criterion_fulfilled=False)[source]

Returns the checkpointing instruction.

Return type:

CheckpointingInstruction

Parameters:

training_progress (TrainingProgress)
evaluation_result (dict[str, EvaluationResultBatch] | None)
early_stoppping_criterion_fulfilled (bool)

Args:

training_progress (TrainingProgress): The training progress.
evaluation_result (dict[str, EvaluationResultBatch] | None, optional): The evaluation result. Defaults to None.
early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Returns:

CheckpointingInstruction: The checkpointing instruction.

class modalities.checkpointing.checkpoint_saving_strategies.SaveEveryKStepsCheckpointingStrategy(k)[source]

Bases: CheckpointSavingStrategyIF

Initializes the SaveEveryKStepsCheckpointingStrategy object.

Args:: k (int): The interval, in steps, between saved checkpoints.
Returns:: None

Parameters:: k (int)

get_checkpoint_instruction(training_progress, evaluation_result=None, early_stoppping_criterion_fulfilled=False)[source]

Returns a CheckpointingInstruction object.

Return type:

CheckpointingInstruction

Parameters:

training_progress (TrainingProgress)
evaluation_result (dict[str, EvaluationResultBatch] | None)
early_stoppping_criterion_fulfilled (bool)

Args:

training_progress (TrainingProgress): The training progress.
evaluation_result (dict[str, EvaluationResultBatch] | None, optional): The evaluation result. Defaults to None.
early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Returns:

CheckpointingInstruction: The checkpointing instruction object.
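
The class name suggests that a checkpoint is saved once every k steps. The sketch below shows one plausible reading of that rule. It is an assumption, not the library's code: `should_save_every_k` is a hypothetical helper, and both the use of a plain step counter and the choice not to save at step 0 are illustrative decisions.

```python
def should_save_every_k(num_seen_steps: int, k: int) -> bool:
    """Returns True when the step count is a positive multiple of k."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    return num_seen_steps > 0 and num_seen_steps % k == 0


# Steps in 1..10 at which a checkpoint would be saved for k = 3.
saving_steps = [step for step in range(1, 11) if should_save_every_k(step, 3)]
```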

class modalities.checkpointing.checkpoint_saving_strategies.SaveKMostRecentCheckpointsStrategy(k=-1)[source]

Bases: CheckpointSavingStrategyIF

Strategy for saving the k most recent checkpoints only.

Initializes the checkpoint saving strategy.

Args:

k (int, optional): The number of most recent checkpoints to save. Defaults to -1, which means all checkpoints are saved. Set to 0 to not save any checkpoints. Set to a positive integer to save the specified number of checkpoints.

Parameters:: k (int)

get_checkpoint_instruction(training_progress, evaluation_result=None, early_stoppping_criterion_fulfilled=False)[source]

Generates a checkpointing instruction based on the given parameters.

Return type:

CheckpointingInstruction

Parameters:

training_progress (TrainingProgress)
evaluation_result (dict[str, EvaluationResultBatch] | None)
early_stoppping_criterion_fulfilled (bool)

Args:

training_progress (TrainingProgress): The training progress.
evaluation_result (dict[str, EvaluationResultBatch] | None, optional): The evaluation result. Defaults to None.
early_stoppping_criterion_fulfilled (bool, optional): Whether the early stopping criterion is fulfilled. Defaults to False.

Returns:

CheckpointingInstruction: The generated checkpointing instruction.
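
The three documented cases for k (-1, 0, positive) can be sketched with a small bookkeeping class. This is a hedged illustration only: `ToyKMostRecent` is hypothetical, integer steps stand in for `TrainingProgress`, a `(save_current, checkpoints_to_delete)` tuple stands in for `CheckpointingInstruction`, and the internal list of saved steps is an assumed implementation detail.

```python
class ToyKMostRecent:
    """Hypothetical sketch of a keep-the-k-most-recent policy."""

    def __init__(self, k: int = -1):
        self.k = k
        self.saved_steps = []  # newest first

    def get_checkpoint_instruction(self, step: int):
        """Returns (save_current, checkpoints_to_delete) for this step."""
        if self.k == 0:
            # k == 0: never save anything.
            return False, []
        self.saved_steps.insert(0, step)
        to_delete = []
        if self.k > 0 and len(self.saved_steps) > self.k:
            # Everything beyond the k newest entries is scheduled for deletion.
            to_delete = self.saved_steps[self.k:]
            self.saved_steps = self.saved_steps[:self.k]
        # k == -1 falls through: save and never delete.
        return True, to_delete


keep_two = ToyKMostRecent(k=2)
history = [keep_two.get_checkpoint_instruction(step) for step in (10, 20, 30)]
```

With k = 2, the third call saves step 30 and schedules step 10 for deletion, so only steps 30 and 20 remain.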
