modalities package
Subpackages
- modalities.checkpointing package
- Subpackages
- Submodules
- modalities.checkpointing.checkpoint_conversion module
- modalities.checkpointing.checkpoint_loading module
- modalities.checkpointing.checkpoint_saving module
- modalities.checkpointing.checkpoint_saving_execution module
- modalities.checkpointing.checkpoint_saving_instruction module
- modalities.checkpointing.checkpoint_saving_strategies module
- Module contents
- modalities.config package
- Submodules
- modalities.config.component_factory module
- modalities.config.config module
ActivationCheckpointedModelConfig
AdamOptimizerConfig
AdamWOptimizerConfig
BatchSamplerConfig
CLMCrossEntropyLossConfig
CheckpointSavingConfig
CombinedDatasetConfig
CompiledModelConfig
ConstantLRSchedulerConfig
CosineAnnealingLRSchedulerConfig
DCPAppStateConfig
DCPCheckpointLoadingConfig
DCPCheckpointSavingConfig
DistributedSamplerConfig
DummyLRSchedulerConfig
DummyProgressSubscriberConfig
DummyResultSubscriberConfig
FSDP1CheckpointLoadingConfig
FSDP1CheckpointLoadingConfig.block_names
FSDP1CheckpointLoadingConfig.global_rank
FSDP1CheckpointLoadingConfig.mixed_precision_settings
FSDP1CheckpointLoadingConfig.model_config
FSDP1CheckpointLoadingConfig.parse_mixed_precision_setting_by_name()
FSDP1CheckpointLoadingConfig.parse_sharding_strategy_by_name()
FSDP1CheckpointLoadingConfig.sharding_strategy
FSDP1CheckpointSavingConfig
FSDP1CheckpointedModelConfig
FSDP1CheckpointedOptimizerConfig
FSDP2WrappedModelConfig
FSDP2WrappedModelConfig.block_names
FSDP2WrappedModelConfig.device_mesh
FSDP2WrappedModelConfig.mixed_precision_settings
FSDP2WrappedModelConfig.model
FSDP2WrappedModelConfig.model_config
FSDP2WrappedModelConfig.reshard_after_forward
FSDP2WrappedModelConfig.validate_dp_mesh_existence()
FSDP2WrappedModelConfig.validate_mixed_precision_settings()
FSDPWrappedModelConfig
FSDPWrappedModelConfig.block_names
FSDPWrappedModelConfig.mixed_precision_settings
FSDPWrappedModelConfig.model
FSDPWrappedModelConfig.model_config
FSDPWrappedModelConfig.parse_mixed_precision_setting_by_name()
FSDPWrappedModelConfig.parse_sharding_strategy_by_name()
FSDPWrappedModelConfig.sharding_strategy
FSDPWrappedModelConfig.sync_module_states
GPT2LLMCollateFnConfig
GPT2MFUCalculatorConfig
LLMDataLoaderConfig
LinearLRSchedulerConfig
MemMapDatasetConfig
OneCycleLRSchedulerConfig
OneCycleLRSchedulerConfig.anneal_strategy
OneCycleLRSchedulerConfig.base_momentum
OneCycleLRSchedulerConfig.check_totals_steps_and_epchs()
OneCycleLRSchedulerConfig.cycle_momentum
OneCycleLRSchedulerConfig.div_factor
OneCycleLRSchedulerConfig.epochs
OneCycleLRSchedulerConfig.final_div_factor
OneCycleLRSchedulerConfig.last_epoch
OneCycleLRSchedulerConfig.max_lr
OneCycleLRSchedulerConfig.max_momentum
OneCycleLRSchedulerConfig.model_config
OneCycleLRSchedulerConfig.optimizer
OneCycleLRSchedulerConfig.pct_start
OneCycleLRSchedulerConfig.steps_per_epoch
OneCycleLRSchedulerConfig.three_phase
OneCycleLRSchedulerConfig.total_steps
PackedMemMapDatasetContinuousConfig
PackedMemMapDatasetMegatronConfig
PassType
PreTrainedHFTokenizerConfig
PreTrainedSPTokenizerConfig
PrecisionEnum
ProcessGroupBackendType
RawAppStateConfig
ReferenceConfig
ResumableDistributedSamplerConfig
ResumableDistributedSamplerConfig.dataset
ResumableDistributedSamplerConfig.drop_last
ResumableDistributedSamplerConfig.epoch
ResumableDistributedSamplerConfig.model_config
ResumableDistributedSamplerConfig.num_replicas
ResumableDistributedSamplerConfig.rank
ResumableDistributedSamplerConfig.seed
ResumableDistributedSamplerConfig.shuffle
ResumableDistributedSamplerConfig.skip_num_global_samples
RichProgressSubscriberConfig
RichResultSubscriberConfig
SaveEveryKStepsCheckpointingStrategyConfig
SaveKMostRecentCheckpointsStrategyConfig
SequentialSamplerConfig
StepLRSchedulerConfig
TokenizerTypes
TorchCheckpointLoadingConfig
WandBEvaluationResultSubscriberConfig
WandBEvaluationResultSubscriberConfig.config_file_path
WandBEvaluationResultSubscriberConfig.directory
WandBEvaluationResultSubscriberConfig.experiment_id
WandBEvaluationResultSubscriberConfig.global_rank
WandBEvaluationResultSubscriberConfig.mode
WandBEvaluationResultSubscriberConfig.model_config
WandBEvaluationResultSubscriberConfig.project
WandbMode
WeightInitializedModelConfig
load_app_config_dict()
- modalities.config.instantiation_models module
ConsistencyEnforcement
CudaEnvSettings
Intervals
PackedDatasetComponentsInstantiationModel
StepProfile
TextGenerationInstantiationModel
TrainingComponentsInstantiationModel
TrainingComponentsInstantiationModel.Settings
TrainingComponentsInstantiationModel.app_state
TrainingComponentsInstantiationModel.checkpoint_saving
TrainingComponentsInstantiationModel.eval_dataloaders
TrainingComponentsInstantiationModel.evaluation_subscriber
TrainingComponentsInstantiationModel.gradient_clipper
TrainingComponentsInstantiationModel.loss_fn
TrainingComponentsInstantiationModel.mfu_calculator
TrainingComponentsInstantiationModel.model_config
TrainingComponentsInstantiationModel.model_raw
TrainingComponentsInstantiationModel.progress_subscriber
TrainingComponentsInstantiationModel.settings
TrainingComponentsInstantiationModel.train_dataloader
TrainingComponentsInstantiationModel.train_dataset
TrainingProgress
TrainingReportGenerator
TrainingTarget
- modalities.config.lookup_enum module
- modalities.config.pydantic_if_types module
- modalities.config.utils module
- Module contents
- modalities.conversion package
- Subpackages
- modalities.conversion.gpt2 package
- Submodules
- modalities.conversion.gpt2.configuration_gpt2 module
- modalities.conversion.gpt2.conversion_code module
- modalities.conversion.gpt2.conversion_model module
- modalities.conversion.gpt2.conversion_tokenizer module
- modalities.conversion.gpt2.convert_gpt2 module
- modalities.conversion.gpt2.modeling_gpt2 module
- Module contents
- Module contents
- modalities.dataloader package
- Subpackages
- Submodules
- modalities.dataloader.create_index module
- modalities.dataloader.create_packed_data module
- modalities.dataloader.dataloader module
- modalities.dataloader.dataloader_factory module
- modalities.dataloader.dataset module
CombinedDataset
Dataset
DummyDataset
DummyDatasetConfig
DummySampleConfig
DummySampleDataType
MemMapDataset
PackedMemMapDatasetBase
PackedMemMapDatasetBase.DATA_SECTION_LENGTH_IN_BYTES
PackedMemMapDatasetBase.HEADER_SIZE_IN_BYTES
PackedMemMapDatasetBase.TOKEN_SIZE_DESCRIPTOR_LENGTH_IN_BYTES
PackedMemMapDatasetBase.np_dtype_of_tokens_on_disk_from_bytes
PackedMemMapDatasetBase.token_size_in_bytes
PackedMemMapDatasetBase.type_converter_for_torch
PackedMemMapDatasetContinuous
PackedMemMapDatasetMegatron
- modalities.dataloader.dataset_factory module
- modalities.dataloader.large_file_lines_reader module
- modalities.dataloader.samplers module
- Module contents
- modalities.inference package
- modalities.logging_broker package
- modalities.models package
- Subpackages
- Submodules
- modalities.models.model module
- modalities.models.model_factory module
- modalities.models.utils module
- Module contents
- modalities.nn package
- modalities.optimizers package
- modalities.preprocessing package
- modalities.registry package
- modalities.running_env package
- modalities.tokenization package
- modalities.training package
- Subpackages
- Submodules
- modalities.training.activation_checkpointing module
- modalities.training.training_progress module
TrainingProgress
TrainingProgress.num_seen_steps_current_run
TrainingProgress.num_seen_steps_previous_run
TrainingProgress.num_seen_steps_total
TrainingProgress.num_seen_tokens_current_run
TrainingProgress.num_seen_tokens_previous_run
TrainingProgress.num_seen_tokens_total
TrainingProgress.num_target_steps
TrainingProgress.num_target_tokens
- Module contents
- modalities.utils package
- Submodules
- modalities.utils.logging module
- modalities.utils.mfu module
- modalities.utils.number_conversion module
LocalNumBatchesFromNumSamplesConfig
LocalNumBatchesFromNumTokensConfig
NumSamplesFromNumTokensConfig
NumStepsFromNumSamplesConfig
NumStepsFromNumTokensConfig
NumStepsFromRawDatasetIndexConfig
NumTokensFromNumStepsConfig
NumTokensFromPackedMemMapDatasetContinuousConfig
NumTokensFromPackedMemMapDatasetContinuousConfig.dataset_path
NumTokensFromPackedMemMapDatasetContinuousConfig.gradient_accumulation_steps
NumTokensFromPackedMemMapDatasetContinuousConfig.local_micro_batch_size
NumTokensFromPackedMemMapDatasetContinuousConfig.model_config
NumTokensFromPackedMemMapDatasetContinuousConfig.num_ranks
NumTokensFromPackedMemMapDatasetContinuousConfig.sequence_length
NumberConversion
NumberConversion.get_global_num_seen_tokens_from_checkpoint_path()
NumberConversion.get_global_num_target_tokens_from_checkpoint_path()
NumberConversion.get_last_step_from_checkpoint_path()
NumberConversion.get_local_num_batches_from_num_samples()
NumberConversion.get_local_num_batches_from_num_tokens()
NumberConversion.get_num_samples_from_num_tokens()
NumberConversion.get_num_seen_steps_from_checkpoint_path()
NumberConversion.get_num_steps_from_num_samples()
NumberConversion.get_num_steps_from_num_tokens()
NumberConversion.get_num_steps_from_raw_dataset_index()
NumberConversion.get_num_target_steps_from_checkpoint_path()
NumberConversion.get_num_tokens_from_num_steps()
NumberConversion.get_num_tokens_from_packed_mem_map_dataset_continuous()
NumberConversionFromCheckpointPathConfig
- modalities.utils.seeding module
- modalities.utils.typing module
- modalities.utils.verify_tokenization_consistency module
- Module contents
Submodules
modalities.api module
- class modalities.api.FileExistencePolicy(value)[source]
Bases:
Enum
- ERROR = 'error'
- OVERRIDE = 'override'
- SKIP = 'skip'
- modalities.api.convert_pytorch_to_hf_checkpoint(config_file_path, output_hf_checkpoint_dir, prediction_key)[source]
Converts a PyTorch checkpoint to a Hugging Face checkpoint.
- Return type:
HFModelAdapter
- Parameters:
config_file_path (Path)
output_hf_checkpoint_dir (Path)
prediction_key (str)
- Args:
config_file_path (Path): Path to the config that generated the PyTorch checkpoint.
output_hf_checkpoint_dir (Path): Path to the output directory for the converted HF checkpoint.
prediction_key (str): The key in the model's output under which the predictions of interest are found.
- Returns:
HFModelAdapter: The Hugging Face model adapter.
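A minimal usage sketch for convert_pytorch_to_hf_checkpoint; the paths and the prediction key are hypothetical placeholders, and the config must be the one that produced the PyTorch checkpoint:

```python
from pathlib import Path

from modalities.api import convert_pytorch_to_hf_checkpoint

# Hypothetical paths; the config file is the one used to train the checkpoint.
config_file_path = Path("configs/pretraining_config.yaml")
output_hf_checkpoint_dir = Path("checkpoints/hf_converted")

# Returns the HFModelAdapter wrapping the converted model.
hf_model = convert_pytorch_to_hf_checkpoint(
    config_file_path=config_file_path,
    output_hf_checkpoint_dir=output_hf_checkpoint_dir,
    prediction_key="logits",  # assumed key of the predictions in the model output
)
```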
- modalities.api.create_filtered_tokenized_dataset(input_data_path, filter_routine, output_data_path, file_existence_policy)[source]
- modalities.api.create_raw_data_index(src_path, index_path, file_existence_policy=FileExistencePolicy.ERROR)[source]
Creates the index file for the content of a large jsonl-file. The index file contains the byte offset and length of each line in the jsonl-file, which makes it possible to process the file line by line without loading it entirely. This step is required before further processing such as tokenization. It is only necessary once per jsonl-file and therefore allows different tokenizations without re-indexing.
- Args:
src_path (Path): The path to the jsonl-file.
index_path (Path): The path to the index file that will be created.
file_existence_policy (FileExistencePolicy): Policy to apply when the index file already exists. Defaults to FileExistencePolicy.ERROR.
- Raises:
ValueError: If the index file already exists.
- Parameters:
src_path (Path)
index_path (Path)
file_existence_policy (FileExistencePolicy)
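A minimal usage sketch for create_raw_data_index with hypothetical file paths; the SKIP policy avoids re-indexing an already indexed corpus:

```python
from pathlib import Path

from modalities.api import FileExistencePolicy, create_raw_data_index

# Hypothetical paths to a raw jsonl corpus and the index file to be created.
src_path = Path("data/corpus.jsonl")
index_path = Path("data/corpus.idx")

# Skip if the index file already exists instead of raising an error.
create_raw_data_index(
    src_path=src_path,
    index_path=index_path,
    file_existence_policy=FileExistencePolicy.SKIP,
)
```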
- modalities.api.create_shuffled_dataset_chunk(file_path_list, output_chunk_file_path, chunk_id, num_chunks, file_existence_policy, global_seed=None)[source]
Creates a shuffled dataset chunk. Given a dataset consisting of multiple tokenized pbin files, this function creates a shuffled dataset chunk for a given chunk id. From each tokenized pbin file, the respective chunk is extracted, shuffled and written to a new pbin file.
- Args:
file_path_list (list[Path]): List of paths to the tokenized input pbin files.
output_chunk_file_path (Path): Path to the output chunk which will be stored in pbin format.
chunk_id (int): The id of the chunk to create.
num_chunks (int): The total number of chunks to create.
file_existence_policy (FileExistencePolicy): Policy to apply when the output chunk file already exists.
global_seed (Optional[int]): The global seed to use for shuffling.
- Raises:
ValueError: If the chunk has no samples.
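A usage sketch for create_shuffled_dataset_chunk, assuming a dataset split across two hypothetical pbin files that is to be re-partitioned into eight shuffled chunks:

```python
from pathlib import Path

from modalities.api import FileExistencePolicy, create_shuffled_dataset_chunk

# Hypothetical tokenized pbin files that make up the dataset.
pbin_files = [Path("data/tokenized/part_0.pbin"), Path("data/tokenized/part_1.pbin")]

# Build chunk 0 of 8; each input file contributes its respective slice.
create_shuffled_dataset_chunk(
    file_path_list=pbin_files,
    output_chunk_file_path=Path("data/chunks/chunk_0.pbin"),
    chunk_id=0,
    num_chunks=8,
    file_existence_policy=FileExistencePolicy.ERROR,
    global_seed=42,
)
```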
- modalities.api.create_shuffled_jsonl_dataset_chunk(file_path_list, output_chunk_file_path, chunk_id, num_chunks, file_existence_policy, global_seed=None)[source]
Creates a shuffled jsonl dataset chunk. Given a dataset consisting of multiple jsonl files, this function creates a shuffled dataset chunk for a given chunk id. From each jsonl file, the respective chunk is extracted, shuffled and written to a new jsonl file.
- Args:
file_path_list (list[Path]): List of paths to the input jsonl files.
output_chunk_file_path (Path): Path to the output chunk which will be stored in jsonl format.
chunk_id (int): The id of the chunk to create.
num_chunks (int): The total number of chunks to create.
file_existence_policy (FileExistencePolicy): Policy to apply when the output chunk file already exists.
global_seed (Optional[int]): The global seed to use for shuffling.
- Raises:
ValueError: If the chunk has no samples.
- modalities.api.enforce_file_existence_policy(file_path, file_existence_policy)[source]
Enforces the file existence policy. Returns True if processing should be stopped, and False otherwise.
- Return type:
bool
- Parameters:
file_path (Path)
file_existence_policy (FileExistencePolicy)
- Args:
file_path (Path): File path to the file to check.
file_existence_policy (FileExistencePolicy): The file existence policy.
- Raises:
ValueError: Raised if the file existence policy is unknown or the policy requires to raise a ValueError.
- Returns:
bool: True if processing should be stopped, otherwise False.
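A small sketch of how the policy check is typically used before writing an output file (the path is hypothetical, and the behaviour per policy follows the description above):

```python
from pathlib import Path

from modalities.api import FileExistencePolicy, enforce_file_existence_policy

output_path = Path("data/out.pbin")  # hypothetical output file

if output_path.exists():
    # True means the caller should stop processing (e.g. with the SKIP policy);
    # with the ERROR policy a ValueError is raised instead.
    if enforce_file_existence_policy(output_path, FileExistencePolicy.SKIP):
        print(f"{output_path} already exists, skipping.")
```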
- modalities.api.generate_text(config_file_path)[source]
Inference function to generate text with a given model.
- Args:
config_file_path (FilePath): Path to the YAML config file.
- Parameters:
config_file_path (Annotated[Path, PathType(path_type=file)])
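A one-line usage sketch for generate_text with a hypothetical YAML config path:

```python
from pathlib import Path

from modalities.api import generate_text

# Hypothetical config describing the model, tokenizer and generation settings.
generate_text(config_file_path=Path("configs/text_generation_config.yaml"))
```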
- modalities.api.merge_packed_data_files(src_paths, target_path)[source]
Utility function for merging multiple pbin-files into one. This is especially useful if different datasets were tokenized at different points in time, or if a single encoding run takes so long that the overall process was done in chunks. It is important that the same tokenizer was used for all chunks.
An arbitrary number of pbin-files and/or directories containing such can be specified as input.
- Args:
src_paths (list[Path]): List of paths to the pbin-files or directories containing such.
target_path (Path): The path to the merged pbin-file that will be created.
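A minimal usage sketch for merge_packed_data_files; the inputs are hypothetical pbin files and a directory containing further pbin files, all produced with the same tokenizer:

```python
from pathlib import Path

from modalities.api import merge_packed_data_files

# Hypothetical inputs: individual pbin files and/or directories containing them.
src_paths = [Path("data/tokenized/part_0.pbin"), Path("data/tokenized/more_parts")]

merge_packed_data_files(src_paths=src_paths, target_path=Path("data/tokenized/merged.pbin"))
```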
- modalities.api.pack_encoded_data(config_dict, file_existence_policy)[source]
Packs and encodes an indexed, large jsonl-file (see also create_index for more information). Produces a .pbin-file, which can be used in a training process directly and no longer requires the original jsonl-file or the respective index file.
- Args:
config_dict (dict): Dictionary containing the configuration for the packed data generation.
file_existence_policy (FileExistencePolicy): Policy to apply when the output file already exists.
- Parameters:
config_dict (dict)
file_existence_policy (FileExistencePolicy)
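A sketch of how pack_encoded_data could be driven from a YAML config; the config path is hypothetical, and it is assumed that load_app_config_dict (listed above under modalities.config.config) reads the YAML file into the required dictionary:

```python
from pathlib import Path

from modalities.api import FileExistencePolicy, pack_encoded_data
from modalities.config.config import load_app_config_dict

# Hypothetical YAML config describing the packed data generation
# (source jsonl-file, index file, tokenizer, output path, ...).
config_dict = load_app_config_dict(Path("configs/tokenization_config.yaml"))

pack_encoded_data(config_dict=config_dict, file_existence_policy=FileExistencePolicy.ERROR)
```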
- modalities.api.shuffle_jsonl_data(input_data_path, output_data_path, file_existence_policy, seed=None)[source]
Shuffles a JSONL file (.jsonl) and stores it on disc.
- Args:
input_data_path (Path): File path to the jsonl data (.jsonl).
output_data_path (Path): File path to write the shuffled jsonl data.
file_existence_policy (FileExistencePolicy): Policy to apply when the output file already exists.
seed (Optional[int]): The seed to use for shuffling.
- Parameters:
input_data_path (Path)
output_data_path (Path)
file_existence_policy (FileExistencePolicy)
seed (int | None)
- modalities.api.shuffle_tokenized_data(input_data_path, output_data_path, batch_size, file_existence_policy, seed=None)[source]
Shuffles a tokenized file (.pbin) and stores it on disc.
- Args:
input_data_path (Path): File path to the tokenized data (.pbin).
output_data_path (Path): File path to write the shuffled tokenized data.
batch_size (int): Number of documents to process per batch.
file_existence_policy (FileExistencePolicy): Policy to apply when the output file already exists.
seed (Optional[int]): The seed to use for shuffling.
- Parameters:
input_data_path (Path)
output_data_path (Path)
batch_size (int)
file_existence_policy (FileExistencePolicy)
seed (int | None)
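A minimal usage sketch for shuffle_tokenized_data with hypothetical paths:

```python
from pathlib import Path

from modalities.api import FileExistencePolicy, shuffle_tokenized_data

shuffle_tokenized_data(
    input_data_path=Path("data/tokenized/train.pbin"),  # hypothetical input
    output_data_path=Path("data/tokenized/train_shuffled.pbin"),
    batch_size=1024,  # number of documents processed per batch
    file_existence_policy=FileExistencePolicy.OVERRIDE,
    seed=42,
)
```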
modalities.batch module
- class modalities.batch.Batch[source]
Bases:
ABC
Abstract class that defines the necessary methods any Batch implementation needs to implement.
- class modalities.batch.DatasetBatch(samples, targets, batch_dim=0)[source]
Bases:
Batch, TorchDeviceMixin
A batch of samples and its targets. Used to batch train a model.
- class modalities.batch.EvaluationResultBatch(dataloader_tag, num_train_steps_done, losses=<factory>, metrics=<factory>, throughput_metrics=<factory>)[source]
Bases:
Batch
Data class for storing the results of a single batch or of multiple batches. Entire epoch results are also stored here.
- Parameters:
dataloader_tag (str)
num_train_steps_done (int)
losses (dict[str, ResultItem])
metrics (dict[str, ResultItem])
throughput_metrics (dict[str, ResultItem])
- losses: dict[str, ResultItem] = <factory>
- metrics: dict[str, ResultItem] = <factory>
- throughput_metrics: dict[str, ResultItem] = <factory>
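A minimal construction sketch based on the signature above; only the required fields are passed, so the loss and metric dictionaries fall back to their empty factory defaults (in a real run they would be populated with ResultItem values by the evaluation loop):

```python
from modalities.batch import EvaluationResultBatch

# Minimal construction; losses, metrics and throughput_metrics default to empty dicts.
result = EvaluationResultBatch(dataloader_tag="val", num_train_steps_done=1000)
print(result.dataloader_tag, result.num_train_steps_done)
```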
- class modalities.batch.InferenceResultBatch(targets, predictions, batch_dim=0)[source]
Bases:
Batch, TorchDeviceMixin
Stores targets and predictions of an entire batch.
modalities.evaluator module
- class modalities.evaluator.Evaluator(progress_publisher, evaluation_result_publisher)[source]
Bases:
object
Evaluator class, which is responsible for evaluating the model on a set of datasets.
Initializes the Evaluator class.
- Args:
progress_publisher (MessagePublisher[ProgressUpdate]): Publisher for progress updates.
evaluation_result_publisher (MessagePublisher[EvaluationResultBatch]): Publisher for evaluation results.
- Parameters:
progress_publisher (MessagePublisher[ProgressUpdate])
evaluation_result_publisher (MessagePublisher[EvaluationResultBatch])
- evaluate(model, data_loaders, loss_fun, num_train_steps_done)[source]
Evaluate the model on a set of datasets.
- Return type:
dict[str, EvaluationResultBatch]
- Parameters:
model (Module)
data_loaders (list[LLMDataLoader])
loss_fun (Callable[[InferenceResultBatch], Tensor])
num_train_steps_done (int)
- Args:
model (nn.Module): The model to evaluate.
data_loaders (list[LLMDataLoader]): List of dataloaders to evaluate the model on.
loss_fun (Callable[[InferenceResultBatch], torch.Tensor]): The loss function to calculate the loss.
num_train_steps_done (int): The number of training steps done so far, for logging purposes.
- Returns:
dict[str, EvaluationResultBatch]: A dictionary containing the evaluation results for each dataloader
- evaluate_batch(batch, model, loss_fun)[source]
Evaluate a single batch by forwarding it through the model and calculating the loss.
- Return type:
Tensor
- Parameters:
batch (DatasetBatch)
model (Module)
loss_fun (Callable[[InferenceResultBatch], Tensor])
- Args:
batch (DatasetBatch): The batch to evaluate.
model (nn.Module): The model to evaluate.
loss_fun (Callable[[InferenceResultBatch], torch.Tensor]): The loss function to calculate the loss.
- Returns:
torch.Tensor: The loss of the batch
modalities.exceptions module
modalities.gym module
- class modalities.gym.Gym(trainer, evaluator, loss_fun, num_ranks)[source]
Bases:
object
Class to perform the model training, including evaluation and checkpointing.
Initializes a Gym object.
- Args:
trainer (Trainer): Trainer object to perform the training.
evaluator (Evaluator): Evaluator object to perform the evaluation.
loss_fun (Loss): Loss function applied during training and evaluation.
num_ranks (int): Number of ranks used for distributed training.
- run(app_state, training_log_interval_in_steps, checkpointing_interval_in_steps, evaluation_interval_in_steps, train_data_loader, evaluation_data_loaders, checkpoint_saving)[source]
Runs the model training, including evaluation and checkpointing.
- Args:
app_state (AppState): Application state containing the model, optimizer and lr scheduler.
training_log_interval_in_steps (int): Interval in steps to log training progress.
checkpointing_interval_in_steps (int): Interval in steps to save checkpoints.
evaluation_interval_in_steps (int): Interval in steps to perform evaluation.
train_data_loader (LLMDataLoader): Data loader with the training data.
evaluation_data_loaders (list[LLMDataLoader]): List of data loaders with the evaluation data.
checkpoint_saving (CheckpointSaving): Routine for saving checkpoints.
- Parameters:
app_state (AppState)
training_log_interval_in_steps (int)
checkpointing_interval_in_steps (int)
evaluation_interval_in_steps (int)
train_data_loader (LLMDataLoader)
evaluation_data_loaders (list[LLMDataLoader])
checkpoint_saving (CheckpointSaving)
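A sketch of how already-built components (typically instantiated from the training YAML config via the component factory) could be wired into a Gym and run; all arguments of the helper below are assumed to exist, and the interval values are placeholders:

```python
from modalities.gym import Gym


def run_training(trainer, evaluator, loss_fun, app_state,
                 train_dataloader, val_dataloader, checkpoint_saving) -> None:
    """Sketch: wire pre-built components into a Gym and run training."""
    gym = Gym(trainer=trainer, evaluator=evaluator, loss_fun=loss_fun, num_ranks=8)
    gym.run(
        app_state=app_state,
        training_log_interval_in_steps=10,
        checkpointing_interval_in_steps=1000,
        evaluation_interval_in_steps=500,
        train_data_loader=train_dataloader,
        evaluation_data_loaders=[val_dataloader],
        checkpoint_saving=checkpoint_saving,
    )
```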
modalities.loss_functions module
- class modalities.loss_functions.CLMCrossEntropyLoss(target_key, prediction_key, tag='CLMCrossEntropyLoss')[source]
Bases:
Loss
- class modalities.loss_functions.NCELoss(prediction_key1, prediction_key2, is_asymmetric=True, temperature=1.0, tag='NCELoss')[source]
Bases:
Loss
Noise Contrastive Estimation Loss
- Args:
prediction_key1 (str): Key to access embedding 1.
prediction_key2 (str): Key to access embedding 2.
is_asymmetric (bool, optional): Specifies symmetric or asymmetric calculation of the NCE loss. Defaults to True.
temperature (float, optional): Temperature. Defaults to 1.0.
tag (str, optional): Defaults to "NCELoss".
- modalities.loss_functions.nce_loss(embedding1, embedding2, device, is_asymmetric, temperature)[source]
This implementation calculates the noise contrastive estimation (NCE) loss between embeddings of two different modalities. The implementation is slightly adapted from https://arxiv.org/pdf/1912.06430.pdf and https://github.com/antoine77340/MIL-NCE_HowTo100M; changes include adding a temperature value and the option to calculate the loss asymmetrically w.r.t. one modality. The implementation is also adapted to the contrastive loss of the CoCa model (https://arxiv.org/pdf/2205.01917.pdf).
- Return type:
Tensor
- Parameters:
embedding1 (Tensor)
embedding2 (Tensor)
device (device)
is_asymmetric (bool)
temperature (float)
- Args:
embedding1 (torch.Tensor): Embeddings from modality 1 of size batch_size x embed_dim.
embedding2 (torch.Tensor): Embeddings from modality 2 of size batch_size x embed_dim.
device (torch.device): Torch device for calculating the loss.
is_asymmetric (bool): Whether the loss is calculated in one direction only or in both directions.
temperature (float): Temperature value for regulating the loss.
- Returns:
torch.Tensor: loss tensor.
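A toy sketch of calling nce_loss on random embeddings from two modalities (shapes and values are illustrative only):

```python
import torch

from modalities.loss_functions import nce_loss

# Toy embeddings of shape batch_size x embed_dim for two modalities.
embedding1 = torch.randn(4, 16)
embedding2 = torch.randn(4, 16)

loss = nce_loss(
    embedding1=embedding1,
    embedding2=embedding2,
    device=torch.device("cpu"),
    is_asymmetric=True,  # compute the loss w.r.t. one modality only
    temperature=1.0,
)
print(loss)
```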
modalities.trainer module
- class modalities.trainer.ThroughputAggregationKeys(value)[source]
Bases:
Enum
- FORWARD_BACKWARD_TIME = 'FORWARD_BACKWARD_TIME'
- NUM_SAMPLES = 'NUM_SAMPLES'
- class modalities.trainer.Trainer(global_rank, progress_publisher, evaluation_result_publisher, gradient_acc_steps, global_num_tokens_per_train_step, num_seen_train_steps, global_num_seen_tokens, num_target_steps, num_target_tokens, gradient_clipper, mfu_calculator=None)[source]
Bases:
object
Initializes the Trainer object.
- Args:
global_rank (int): The global rank on which the trainer object operates.
progress_publisher (MessagePublisher[ProgressUpdate]): The publisher for progress updates.
evaluation_result_publisher (MessagePublisher[EvaluationResultBatch]): The publisher for evaluation result batches.
gradient_acc_steps (int): The number of gradient accumulation steps.
global_num_tokens_per_train_step (int): The number of global tokens per training step.
num_seen_train_steps (int): The number of training steps already seen.
global_num_seen_tokens (int): The number of tokens already seen.
num_target_steps (int): The target number of training steps.
num_target_tokens (int): The target number of tokens.
gradient_clipper (GradientClipperIF): The gradient clipper.
mfu_calculator (Optional[MFUCalculatorABC]): The MFU calculator.
- Returns:
None
- Parameters:
global_rank (int)
progress_publisher (MessagePublisher[ProgressUpdate])
evaluation_result_publisher (MessagePublisher[EvaluationResultBatch])
gradient_acc_steps (int)
global_num_tokens_per_train_step (int)
num_seen_train_steps (int)
global_num_seen_tokens (int)
num_target_steps (int)
num_target_tokens (int)
gradient_clipper (GradientClipperIF)
mfu_calculator (MFUCalculatorABC | None)
- train(app_state, train_loader, loss_fun, training_log_interval_in_steps, evaluation_callback, checkpointing_callback)[source]
Trains the model.
- Args:
app_state (AppState): The application state containing the model, optimizer and lr scheduler.
train_loader (LLMDataLoader): The data loader containing the training data.
loss_fun (Loss): The loss function used for training.
training_log_interval_in_steps (int): The interval at which training progress is logged.
evaluation_callback (Callable[[TrainingProgress], None]): A callback function for evaluation.
checkpointing_callback (Callable[[TrainingProgress], None]): A callback function for checkpointing.
- Returns:
None
- Parameters:
app_state (AppState)
train_loader (LLMDataLoader)
loss_fun (Loss)
training_log_interval_in_steps (int)
evaluation_callback (Callable[[TrainingProgress], None])
checkpointing_callback (Callable[[TrainingProgress], None])
modalities.util module
- class modalities.util.Aggregator[source]
Bases:
Generic[T]
- class modalities.util.TimeRecorder[source]
Bases:
object
Class with a context manager to record execution time.
- class modalities.util.TimeRecorderStates(value)[source]
Bases:
Enum
- RUNNING = 'RUNNING'
- STOPPED = 'STOPPED'
- modalities.util.format_metrics_to_gb(item)[source]
Quick function to format numbers to gigabytes, rounded to 4-digit precision.
- modalities.util.get_experiment_id_of_run(config_file_path, hash_length=8, max_experiment_id_byte_length=1024)[source]
Create a unique experiment ID for the current run on rank 0 and broadcast it to all ranks. Internally, the experiment ID is generated by hashing the configuration file path and appending the current date and time. The experiment ID is then converted to a byte array (with a maximum length of max_experiment_id_byte_length) and broadcast to all ranks. In the unlikely case of the experiment ID being too long, a ValueError is raised and max_experiment_id_byte_length must be increased. Each rank then decodes the byte array to the original string representation and returns it. Having a globally synced experiment ID is mandatory for saving files/checkpoints in a distributed training setup.
- Return type:
str
- Parameters:
config_file_path (Path)
hash_length (int)
max_experiment_id_byte_length (int)
- Args:
config_file_path (Path): Path to the configuration file.
hash_length (Optional[int], optional): Defines the character length of the hash. Defaults to 8.
max_experiment_id_byte_length (Optional[int]): Defines the maximum byte length of the experiment_id to be shared with other ranks. Defaults to 1024.
- Returns:
str: The experiment ID.
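A minimal sketch of calling get_experiment_id_of_run with a hypothetical config path; note that a torch.distributed process group must already be initialized, since the ID generated on rank 0 is broadcast to all ranks:

```python
from pathlib import Path

from modalities.util import get_experiment_id_of_run

# Hypothetical config path; the ID is derived from a hash of this path plus the current date and time.
experiment_id = get_experiment_id_of_run(config_file_path=Path("configs/pretraining_config.yaml"))
print(experiment_id)
```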
- modalities.util.get_local_number_of_trainable_parameters(model)[source]
Returns the number of trainable parameters that are materialized on the current rank. The model can be sharded with FSDP1 or FSDP2 or not sharded at all.
- Args:
model (nn.Module): The model for which to calculate the number of trainable parameters.
- Returns:
int: The number of trainable parameters materialized on the current rank.
- modalities.util.get_module_class_from_name(module, name)[source]
Gets a class from a module by its name. Taken from the Accelerate source code (https://github.com/huggingface/accelerate/blob/1f7a79b428749f45187ec69485f2c966fe21926e/src/accelerate/utils/dataclasses.py#L1902).
- Args:
module (torch.nn.Module): The module to get the class from.
name (str): The name of the class.
- modalities.util.get_total_number_of_trainable_parameters(model)[source]
Returns the total number of trainable parameters across all ranks. The model must be sharded with FSDP1 or FSDP2.
- Return type:
Number
- Parameters:
model (FullyShardedDataParallel | FSDPModule)
- Args:
model (FSDPX): The model for which to calculate the number of trainable parameters.
- Returns:
Number: The total number of trainable parameters across all ranks.
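A small sketch contrasting the two parameter-counting helpers; the local count also works for an unsharded model, while the total count requires an FSDP1/FSDP2-sharded model and a running distributed setup:

```python
import torch.nn as nn

from modalities.util import get_local_number_of_trainable_parameters

# For an unsharded module the local count equals the full parameter count.
model = nn.Linear(128, 64)
print(get_local_number_of_trainable_parameters(model))  # 128 * 64 + 64 = 8256

# For an FSDP1/FSDP2-sharded model, the global count across all ranks would
# additionally be available via get_total_number_of_trainable_parameters(model).
```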