modalities.optimizers package

Submodules

modalities.optimizers.lr_schedulers module

class modalities.optimizers.lr_schedulers.DummyLRScheduler(optimizer, last_epoch=-1)[source]

Bases: LRScheduler

Parameters:

optimizer (Optimizer): The wrapped optimizer.
last_epoch (int): The index of the last epoch. Default: -1.
get_lr()[source]

Compute the learning rate using the chainable form of the scheduler.

Return type:

list[float]
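
Example usage, as a minimal sketch: judging by the name and the LRScheduler base class, the scheduler is assumed to be a no-op placeholder that keeps the optimizer's learning rate constant; the model, optimizer, and learning rate below are illustrative only.

    import torch
    from torch.optim import SGD

    from modalities.optimizers.lr_schedulers import DummyLRScheduler

    model = torch.nn.Linear(8, 2)
    optimizer = SGD(model.parameters(), lr=0.01)
    scheduler = DummyLRScheduler(optimizer)  # last_epoch defaults to -1

    for _ in range(3):
        optimizer.step()   # no gradients computed here; parameters stay as they are
        scheduler.step()

    # Assumption: a dummy scheduler leaves the learning rate untouched.
    print(scheduler.get_last_lr())  # expected: [0.01]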

modalities.optimizers.optimizer_factory module

class modalities.optimizers.optimizer_factory.OptimizerFactory[source]

Bases: object

get_adam(betas, eps, weight_decay, weight_decay_groups_excluded, wrapped_model)[source]
Return type:

Optimizer

get_adam_w(betas, eps, weight_decay, weight_decay_groups_excluded, wrapped_model)[source]
Return type:

Optimizer

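A hedged usage sketch for get_adam_w; get_adam takes the same arguments according to the signatures above. The hyperparameter values are illustrative, the plain nn.Sequential merely stands in for the wrapped training model, and the empty exclusion list assumes no weight-decay groups are excluded.

    import torch.nn as nn

    from modalities.optimizers.optimizer_factory import OptimizerFactory

    # Placeholder model; in a real run, wrapped_model is the (possibly FSDP-)wrapped
    # training model produced by the modalities model setup, not a plain nn.Sequential.
    model = nn.Sequential(nn.Linear(16, 32), nn.GELU(), nn.Linear(32, 4))

    optimizer = OptimizerFactory().get_adam_w(
        betas=(0.9, 0.95),                # illustrative AdamW betas
        eps=1e-8,                         # illustrative epsilon
        weight_decay=0.1,                 # illustrative weight decay
        weight_decay_groups_excluded=[],  # assumption: no parameter groups excluded
        wrapped_model=model,
    )
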
static get_fsdp1_checkpointed_optimizer_(checkpoint_loading, checkpoint_path, wrapped_model, optimizer)[source]

Loads an FSDP1-checkpointed optimizer from a checkpoint file.

Parameters:

checkpoint_loading (FSDP1CheckpointLoadingIF): The FSDP1 checkpoint loading strategy.
checkpoint_path (Path): The path to the checkpoint file.
wrapped_model (FSDP1): The FSDP1 model associated with the optimizer.
optimizer (Optimizer): The optimizer to load the checkpoint into.

Returns:

The optimizer loaded from the checkpoint.

Return type:

Optimizer
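
A minimal sketch of how the documented arguments fit together. It is wrapped in a hypothetical restore_fsdp1_optimizer helper because an actual call requires an initialized distributed environment, an FSDP1-wrapped model, and a concrete FSDP1CheckpointLoadingIF implementation, none of which are constructed here.

    from pathlib import Path

    from torch.optim import Optimizer

    from modalities.optimizers.optimizer_factory import OptimizerFactory


    def restore_fsdp1_optimizer(
        checkpoint_loading, checkpoint_path: Path, wrapped_model, optimizer: Optimizer
    ) -> Optimizer:
        # checkpoint_loading: an FSDP1CheckpointLoadingIF implementation from the training setup
        # wrapped_model: the FSDP1-wrapped model whose optimizer state was checkpointed
        # optimizer: a freshly built optimizer (e.g. from get_adam_w) that receives the restored state
        return OptimizerFactory.get_fsdp1_checkpointed_optimizer_(
            checkpoint_loading=checkpoint_loading,
            checkpoint_path=checkpoint_path,
            wrapped_model=wrapped_model,
            optimizer=optimizer,
        )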

modalities.optimizers.optimizer_factory.get_optimizer_groups(model, weight_decay, weight_decay_groups_excluded)[source]

Divides model parameters into optimizer groups (with or without weight decay).

Inspired by:

- https://github.com/pytorch/pytorch/issues/101343
- https://github.com/karpathy/nanoGPT

Return type:

list[dict[str, list[Parameter] | float]]

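For context, the snippet below illustrates the decay/no-decay grouping pattern from the referenced nanoGPT recipe and shows that the returned structure matches PyTorch's standard parameter-group format. The make_optimizer_groups helper is a simplified, hypothetical stand-in, not the modalities implementation.

    import torch.nn as nn
    from torch.optim import AdamW

    def make_optimizer_groups(model: nn.Module, weight_decay: float) -> list[dict]:
        # Weights of matmul-style layers (dim >= 2) receive weight decay; biases and
        # norm parameters (dim < 2) do not. Simplified stand-in, not the modalities code.
        decay = [p for p in model.parameters() if p.requires_grad and p.dim() >= 2]
        no_decay = [p for p in model.parameters() if p.requires_grad and p.dim() < 2]
        return [
            {"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0},
        ]

    model = nn.Sequential(nn.Linear(16, 32), nn.LayerNorm(32), nn.Linear(32, 4))
    optimizer = AdamW(make_optimizer_groups(model, weight_decay=0.1), lr=3e-4)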

Module contents