modalities package
Subpackages
- modalities.checkpointing package
- Subpackages
- Submodules
- modalities.checkpointing.checkpoint_conversion module
- modalities.checkpointing.checkpoint_loading module
- modalities.checkpointing.checkpoint_saving module
- modalities.checkpointing.checkpoint_saving_execution module
- modalities.checkpointing.checkpoint_saving_instruction module
- modalities.checkpointing.checkpoint_saving_strategies module
- Module contents
- modalities.config package
- modalities.conversion package
- Subpackages
- modalities.conversion.gpt2 package
- Submodules
- modalities.conversion.gpt2.configuration_gpt2 module
- modalities.conversion.gpt2.conversion_code module
- modalities.conversion.gpt2.conversion_model module
- modalities.conversion.gpt2.conversion_tokenizer module
- modalities.conversion.gpt2.convert_gpt2 module
- modalities.conversion.gpt2.modeling_gpt2 module
- Module contents
- modalities.dataloader package
- Subpackages
- Submodules
- modalities.dataloader.apply_chat_template module
- modalities.dataloader.create_index module
- modalities.dataloader.create_instruction_tuning_data module
- modalities.dataloader.create_packed_data module
- modalities.dataloader.dataloader module
- modalities.dataloader.dataloader_factory module
- modalities.dataloader.dataset module
- CombinedDataset
- Dataset
- DummyDataset
- DummyDatasetConfig
- DummySampleConfig
- DummySampleDataType
- MemMapDataset
- PackedMemMapDatasetBase
- PackedMemMapDatasetBase.DATA_SECTION_LENGTH_IN_BYTES
- PackedMemMapDatasetBase.HEADER_SIZE_IN_BYTES
- PackedMemMapDatasetBase.TOKEN_SIZE_DESCRIPTOR_LENGTH_IN_BYTES
- PackedMemMapDatasetBase.np_dtype_of_tokens_on_disk_from_bytes
- PackedMemMapDatasetBase.token_size_in_bytes
- PackedMemMapDatasetBase.type_converter_for_torch
- PackedMemMapDatasetContinuous
- PackedMemMapDatasetMegatron
- modalities.dataloader.dataset_factory module
- modalities.dataloader.filter_packed_data module
- modalities.dataloader.large_file_lines_reader module
- modalities.dataloader.sampler_factory module
- modalities.dataloader.samplers module
- Module contents
- modalities.inference package
- modalities.logging_broker package
- modalities.models package
- Subpackages
- Submodules
- modalities.models.model module
- modalities.models.model_factory module
- modalities.models.utils module
- Module contents
- modalities.nn package
- modalities.optimizers package
- modalities.preprocessing package
- modalities.registry package
- modalities.running_env package
- modalities.tokenization package
- modalities.training package
- Subpackages
- Submodules
- modalities.training.training_progress module
- TrainingProgress
- TrainingProgress.num_seen_steps_current_run
- TrainingProgress.num_seen_steps_previous_run
- TrainingProgress.num_seen_steps_total
- TrainingProgress.num_seen_tokens_current_run
- TrainingProgress.num_seen_tokens_previous_run
- TrainingProgress.num_seen_tokens_total
- TrainingProgress.num_target_steps
- TrainingProgress.num_target_tokens
- Module contents
- modalities.utils package
- Subpackages
- modalities.utils.benchmarking package
- modalities.utils.profilers package
- Submodules
- modalities.utils.profilers.batch_generator module
- modalities.utils.profilers.modalities_profiler module
- modalities.utils.profilers.profiler_configs module
- modalities.utils.profilers.profiler_factory module
- modalities.utils.profilers.profilers module
- modalities.utils.profilers.steppable_component_configs module
- modalities.utils.profilers.steppable_components module
- modalities.utils.profilers.steppable_components_if module
- Module contents
- Submodules
- modalities.utils.communication_test module
- modalities.utils.debug module
- modalities.utils.debug_components module
- modalities.utils.debugging_configs module
- modalities.utils.deprecated_alias module
- modalities.utils.file_ops module
- modalities.utils.logger_utils module
- modalities.utils.maybe_list_parameter module
- modalities.utils.mfu module
- modalities.utils.number_conversion module
- LocalNumBatchesFromNumSamplesConfig
- LocalNumBatchesFromNumTokensConfig
- NumSamplesFromNumTokensConfig
- NumStepsFromNumSamplesConfig
- NumStepsFromNumTokensConfig
- NumStepsFromRawDatasetIndexConfig
- NumTokensFromNumStepsConfig
- NumTokensFromPackedMemMapDatasetContinuousConfig
- NumTokensFromPackedMemMapDatasetContinuousConfig.dataset_path
- NumTokensFromPackedMemMapDatasetContinuousConfig.dp_degree
- NumTokensFromPackedMemMapDatasetContinuousConfig.gradient_accumulation_steps
- NumTokensFromPackedMemMapDatasetContinuousConfig.local_micro_batch_size
- NumTokensFromPackedMemMapDatasetContinuousConfig.model_config
- NumTokensFromPackedMemMapDatasetContinuousConfig.reuse_last_target
- NumTokensFromPackedMemMapDatasetContinuousConfig.sample_key
- NumTokensFromPackedMemMapDatasetContinuousConfig.sequence_length
- NumberConversion
- NumberConversion.get_global_num_seen_tokens_from_checkpoint_path()
- NumberConversion.get_global_num_target_tokens_from_checkpoint_path()
- NumberConversion.get_last_step_from_checkpoint_path()
- NumberConversion.get_local_num_batches_from_num_samples()
- NumberConversion.get_local_num_batches_from_num_tokens()
- NumberConversion.get_num_samples_from_num_tokens()
- NumberConversion.get_num_seen_steps_from_checkpoint_path()
- NumberConversion.get_num_steps_from_num_samples()
- NumberConversion.get_num_steps_from_num_tokens()
- NumberConversion.get_num_steps_from_raw_dataset_index()
- NumberConversion.get_num_target_steps_from_checkpoint_path()
- NumberConversion.get_num_tokens_from_num_steps()
- NumberConversion.get_num_tokens_from_packed_mem_map_dataset_continuous()
- NumberConversionFromCheckpointPathConfig
- modalities.utils.seeding module
- modalities.utils.typing_utils module
- modalities.utils.verify_tokenization_consistency module
- Module contents
Submodules
modalities.api module
modalities.batch module
- class modalities.batch.Batch[source]
Bases: ABC
Abstract class that defines the necessary methods any Batch implementation needs to implement.
- class modalities.batch.DatasetBatch(samples, targets, batch_dim=0)[source]
Bases: Batch, TorchDeviceMixin
A batch of samples and its targets. Used to batch train a model.
- class modalities.batch.EvaluationResultBatch(dataloader_tag, num_train_steps_done, losses=<factory>, metrics=<factory>, throughput_metrics=<factory>)[source]
Bases: Batch
Data class for storing the results of a single batch or of multiple batches. Entire epoch results can also be stored here.
- Parameters:
dataloader_tag (str)
num_train_steps_done (int)
losses (dict[str, ResultItem])
metrics (dict[str, ResultItem])
throughput_metrics (dict[str, ResultItem])
- losses: dict[str, ResultItem] = <factory>
- metrics: dict[str, ResultItem] = <factory>
- throughput_metrics: dict[str, ResultItem] = <factory>
- class modalities.batch.InferenceResultBatch(targets, predictions, batch_dim=0)[source]
Bases: Batch, TorchDeviceMixin
Stores targets and predictions of an entire batch.
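As a rough illustration of the `EvaluationResultBatch` signature above, the dict fields use dataclass factory defaults (rendered as `<factory>` by the documentation tooling). The sketch below mirrors that structure with plain dataclasses; `ResultItem`'s internals and the class body are assumptions, not the library's implementation:

```python
from dataclasses import dataclass, field

# Minimal stand-in for the library's ResultItem (its real fields are an assumption here).
@dataclass
class ResultItem:
    value: float

# Sketch mirroring EvaluationResultBatch's documented signature: each dict field
# gets a fresh empty dict per instance via a dataclass factory default.
@dataclass
class EvaluationResultBatchSketch:
    dataloader_tag: str
    num_train_steps_done: int
    losses: dict[str, ResultItem] = field(default_factory=dict)
    metrics: dict[str, ResultItem] = field(default_factory=dict)
    throughput_metrics: dict[str, ResultItem] = field(default_factory=dict)

result = EvaluationResultBatchSketch(dataloader_tag="val", num_train_steps_done=100)
result.losses["ce"] = ResultItem(2.31)
```

Factory defaults matter here: a plain `= {}` default would be shared across all instances, while `default_factory=dict` gives every result batch its own dictionaries.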
modalities.evaluator module
modalities.exceptions module
modalities.gym module
modalities.loss_functions module
- class modalities.loss_functions.CLMCrossEntropyLoss(target_key, prediction_key, tag='CLMCrossEntropyLoss')[source]
Bases: Loss
- class modalities.loss_functions.NCELoss(prediction_key1, prediction_key2, is_asymmetric=True, temperature=1.0, tag='NCELoss')[source]
Bases: Loss
Noise Contrastive Estimation Loss
- Args:
  - prediction_key1 (str): key to access embedding 1.
  - prediction_key2 (str): key to access embedding 2.
  - is_asymmetric (bool, optional): specifies symmetric or asymmetric calculation of the NCE loss. Defaults to True.
  - temperature (float, optional): temperature. Defaults to 1.0.
  - tag (str, optional): Defaults to "NCELoss".
- modalities.loss_functions.nce_loss(embedding1, embedding2, device, is_asymmetric, temperature)[source]
Calculates the noise contrastive estimation (NCE) loss between embeddings of two different modalities. The implementation is slightly adapted from https://arxiv.org/pdf/1912.06430.pdf and https://github.com/antoine77340/MIL-NCE_HowTo100M; the changes include adding a temperature value and the option of calculating the loss asymmetrically w.r.t. one modality. The implementation is also adapted to the contrastive loss of the CoCa model (https://arxiv.org/pdf/2205.01917.pdf).
- Return type:
  torch.Tensor
- Args:
  - embedding1 (torch.Tensor): embeddings from modality 1 of size batch_size x embed_dim.
  - embedding2 (torch.Tensor): embeddings from modality 2 of size batch_size x embed_dim.
  - device (torch.device): torch device for calculating the loss.
  - is_asymmetric (bool): whether the loss is calculated in one direction only or in both directions.
  - temperature (float): temperature value for regulating the loss.
- Returns:
torch.Tensor: loss tensor.
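The loss described above can be sketched with NumPy. This is an illustration of the general InfoNCE-style computation, not the library's implementation: the `device` argument is omitted, and the L2 normalization of the embeddings is an assumption.

```python
import numpy as np

def nce_loss_sketch(embedding1, embedding2, is_asymmetric=True, temperature=1.0):
    """Contrastive loss sketch: matching rows of the two
    (batch_size x embed_dim) matrices are the positive pairs."""
    # Normalize rows, then build a similarity matrix scaled by temperature.
    # (Normalization is an assumption; the documented function may differ.)
    e1 = embedding1 / np.linalg.norm(embedding1, axis=1, keepdims=True)
    e2 = embedding2 / np.linalg.norm(embedding2, axis=1, keepdims=True)
    sim = e1 @ e2.T / temperature

    def directional_loss(logits):
        # Cross-entropy with the diagonal (the matching pairs) as targets.
        log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_softmax))

    loss = directional_loss(sim)  # modality 1 -> modality 2 only
    if not is_asymmetric:
        # Symmetric variant: average the losses in both directions.
        loss = (loss + directional_loss(sim.T)) / 2.0
    return loss
```

With perfectly aligned embeddings the diagonal dominates and the loss is small; misaligned positives (e.g. shuffled rows) drive it up, which is the behavior a contrastive loss should exhibit.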
modalities.main module
modalities.trainer module
modalities.util module
- class modalities.util.TimeRecorder[source]
Bases: object
Class with a context manager to record execution time.
- class modalities.util.TimeRecorderStates(value)[source]
Bases: Enum
- RUNNING = 'RUNNING'
- STOPPED = 'STOPPED'
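A minimal sketch of how such a recorder with RUNNING/STOPPED states and a context manager might look. The enum values follow the documentation above; everything else (attribute names, error handling) is an assumption, since the class internals are not shown:

```python
import time
from enum import Enum

class TimeRecorderStates(Enum):  # mirrors the enum documented above
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"

class TimeRecorderSketch:
    """Illustrative time recorder; accumulates elapsed time across start/stop cycles."""

    def __init__(self):
        self.delta_t = 0.0  # total recorded seconds (attribute name is an assumption)
        self.state = TimeRecorderStates.STOPPED
        self._start = None

    def start(self):
        if self.state is TimeRecorderStates.RUNNING:
            raise RuntimeError("recorder is already running")
        self._start = time.perf_counter()
        self.state = TimeRecorderStates.RUNNING

    def stop(self):
        if self.state is TimeRecorderStates.STOPPED:
            raise RuntimeError("recorder is not running")
        self.delta_t += time.perf_counter() - self._start
        self.state = TimeRecorderStates.STOPPED

    # Context manager protocol: entering starts the clock, leaving stops it.
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()

with TimeRecorderSketch() as rec:
    time.sleep(0.01)  # rec.delta_t now holds roughly 0.01 seconds
```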
- modalities.util.format_metrics_to_gb(item)[source]
Quick function to format numbers to gigabytes, rounded to 4-digit precision.
- modalities.util.get_experiment_id_from_config(config_file_path, hash_length=16)[source]
Create an experiment ID including the date and time for file-save uniqueness. Example: 2022-05-07__14-31-22_fdh1xaj2
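One plausible way to produce an ID in the documented format is to combine a timestamp with a truncated hash, as sketched below. The choice of SHA-256 and hashing the path string (rather than the file contents) are assumptions; only the output format follows the example above:

```python
import hashlib
from datetime import datetime
from pathlib import Path

def get_experiment_id_sketch(config_file_path: Path, hash_length: int = 16) -> str:
    """Illustrative only: timestamp plus a truncated hash of the config path,
    matching the documented example format 2022-05-07__14-31-22_fdh1xaj2."""
    timestamp = datetime.now().strftime("%Y-%m-%d__%H-%M-%S")
    digest = hashlib.sha256(str(config_file_path).encode()).hexdigest()[:hash_length]
    return f"{timestamp}_{digest}"

experiment_id = get_experiment_id_sketch(Path("configs/train.yaml"), hash_length=8)
```

Because the hash depends only on the config path, two runs of the same config within the same second would collide; the timestamp is what provides the per-run uniqueness.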
- modalities.util.get_local_number_of_trainable_parameters(model)[source]
Returns the number of trainable parameters that are materialized on the current rank. The model can be sharded with FSDP1 or FSDP2 or not sharded at all.
- Args:
model (nn.Module): The model for which to calculate the number of trainable parameters.
- Returns:
int: The number of trainable parameters materialized on the current rank.
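The counting logic amounts to summing the element counts of all parameters that require gradients. Since that only needs an object exposing `parameters()` whose elements have `numel()` and `requires_grad`, the sketch below uses a tiny stand-in instead of a real torch model; the stand-in classes are illustrative assumptions:

```python
from dataclasses import dataclass

# Stand-in for a torch parameter: only the two attributes the count needs.
@dataclass
class FakeParam:
    shape: tuple
    requires_grad: bool = True

    def numel(self) -> int:
        n = 1
        for dim in self.shape:
            n *= dim
        return n

# Stand-in for nn.Module exposing parameters().
class FakeModel:
    def __init__(self, params):
        self._params = params

    def parameters(self):
        return iter(self._params)

def count_trainable_parameters(model) -> int:
    # Sum element counts of all parameters that require gradients,
    # i.e. the parameters materialized on this rank.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

model = FakeModel([
    FakeParam((10, 10)),                      # 100 trainable elements
    FakeParam((5,)),                          # 5 trainable elements
    FakeParam((3, 3), requires_grad=False),   # frozen, excluded from the count
])
n_trainable = count_trainable_parameters(model)  # 105
```

Under FSDP sharding each rank only materializes its own shard, which is why this local count differs from the global total returned by `get_total_number_of_trainable_parameters` below.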
- modalities.util.get_module_class_from_name(module, name)[source]
From the Accelerate source code (https://github.com/huggingface/accelerate/blob/1f7a79b428749f45187ec69485f2c966fe21926e/src/accelerate/utils/dataclasses.py#L1902): gets a class from a module by its name.
- Args:
module (torch.nn.Module): The module to get the class from. name (str): The name of the class.
- modalities.util.get_synced_experiment_id_of_run(config_file_path=None, hash_length=16, max_experiment_id_byte_length=1024)[source]
Create a unique experiment ID for the current run on rank 0 and broadcast it to all ranks. Internally, the experiment ID is generated by hashing the configuration file path and appending the current date and time. The experiment ID is then converted to a byte array (with a maximum length of max_experiment_id_byte_length) and broadcast to all ranks. In the unlikely case of the experiment ID being too long, a ValueError is raised and max_experiment_id_byte_length must be increased. Each rank then decodes the byte array back to the original string representation and returns it. A globally synced experiment ID is mandatory for saving files/checkpoints in a distributed training setup.
- Return type:
  str
- Args:
  - config_file_path (Path): Path to the configuration file.
  - hash_length (Optional[int], optional): Defines the char length of the hash. Defaults to 16.
  - max_experiment_id_byte_length (Optional[int]): Defines the maximum byte length of the experiment_id to be shared with other ranks. Defaults to 1024.
- Returns:
str: The experiment ID.
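The encode/pad/decode round trip described above (minus the actual distributed broadcast) can be sketched as follows. The exact wire format the library uses is an assumption; the sketch only shows why a fixed byte length and a length check are needed:

```python
def encode_for_broadcast(experiment_id: str, max_byte_length: int = 1024) -> bytes:
    """Sketch of the padding/length check described above."""
    raw = experiment_id.encode("utf-8")
    if len(raw) > max_byte_length:
        # Mirrors the documented ValueError for over-long experiment IDs.
        raise ValueError(
            f"experiment ID is {len(raw)} bytes; increase max_byte_length ({max_byte_length})"
        )
    # Pad with zero bytes to a fixed size so every rank receives a buffer
    # of identical, pre-agreed shape from the broadcast.
    return raw.ljust(max_byte_length, b"\x00")

def decode_after_broadcast(buffer: bytes) -> str:
    # Strip the zero-byte padding to recover the original string.
    return buffer.rstrip(b"\x00").decode("utf-8")

payload = encode_for_broadcast("2022-05-07__14-31-22_fdh1xaj2", max_byte_length=64)
restored = decode_after_broadcast(payload)  # "2022-05-07__14-31-22_fdh1xaj2"
```

The fixed-size buffer is the point: collective operations require every rank to allocate the receive buffer before the data arrives, so the maximum length must be agreed on up front rather than derived from the string itself.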
- modalities.util.get_synced_string(string_to_be_synced, from_rank=0, max_string_byte_length=1024)[source]
Broadcast a string from one rank to all other ranks in the distributed setup.
- Args:
- string_to_be_synced (str): The string to be synced across ranks.
- from_rank (int, optional): The rank that generates the string. Defaults to 0.
- max_string_byte_length (Optional[int], optional): Maximum byte length of the string to be synced. Defaults to 1024.
- Returns:
str: The synced string, decoded from the byte array.
- Raises:
ValueError: If the string exceeds the maximum byte length.
- modalities.util.get_total_number_of_trainable_parameters(model, device_mesh)[source]
Returns the total number of trainable parameters across all ranks. The model must be sharded with FSDP1 or FSDP2.
- Return type:
  Number
- Parameters:
model (FullyShardedDataParallel | FSDPModule)
device_mesh (DeviceMesh | None)
- Args:
  - model (FSDPX): The model for which to calculate the number of trainable parameters.
  - device_mesh (DeviceMesh | None): The device mesh used for distributed training.
- Returns:
Number: The total number of trainable parameters across all ranks.