modalities package

Subpackages

Submodules

modalities.api module

modalities.batch module

class modalities.batch.Batch[source]

Bases: ABC

Abstract class that defines the necessary methods any Batch implementation needs to implement.

class modalities.batch.DatasetBatch(samples, targets, batch_dim=0)[source]

Bases: Batch, TorchDeviceMixin

A batch of samples and their targets, used for batched model training.

Parameters:
batch_dim: int = 0
detach()[source]
property device: device
samples: dict[str, Tensor]
targets: dict[str, Tensor]
to(device)[source]
Parameters:

device (device)

class modalities.batch.EvaluationResultBatch(dataloader_tag, num_train_steps_done, losses=<factory>, metrics=<factory>, throughput_metrics=<factory>)[source]

Bases: Batch

Data class for storing the results of one or multiple batches. Entire epoch results are also stored here.

Parameters:
dataloader_tag: str
losses: dict[str, ResultItem]
metrics: dict[str, ResultItem]
num_train_steps_done: int
throughput_metrics: dict[str, ResultItem]
class modalities.batch.InferenceResultBatch(targets, predictions, batch_dim=0)[source]

Bases: Batch, TorchDeviceMixin

Stores targets and predictions of an entire batch.

Parameters:
batch_dim: int = 0
detach()[source]
property device: device
get_predictions(key)[source]
Return type:

Tensor

Parameters:

key (str)

get_targets(key)[source]
Return type:

Tensor

Parameters:

key (str)

predictions: dict[str, Tensor]
targets: dict[str, Tensor]
to(device)[source]
Parameters:

device (device)

to_cpu()[source]
class modalities.batch.ResultItem(value, decimal_places=None)[source]

Bases: object

Parameters:
decimal_places: int | None = None
value: Tensor
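ResultItem pairs a metric value with an optional number of decimal places for display. A dependency-free sketch of the idea (the real value field holds a torch.Tensor, and the format_result helper below is hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResultItemSketch:
    """Plain-Python stand-in; the real value field holds a torch.Tensor."""
    value: float
    decimal_places: Optional[int] = None


def format_result(item: ResultItemSketch) -> str:
    # hypothetical helper: render the value honoring decimal_places
    if item.decimal_places is None:
        return str(item.value)
    return f"{item.value:.{item.decimal_places}f}"


print(format_result(ResultItemSketch(0.123456, decimal_places=2)))  # -> 0.12
```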
class modalities.batch.TorchDeviceMixin[source]

Bases: ABC

abstractmethod detach()[source]
abstract property device: device
abstractmethod to(device)[source]
Parameters:

device (device)
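The mixin contract above can be sketched without torch as a plain ABC; the DummyBatch subclass and its string-valued device are illustrative stand-ins, not part of the library:

```python
from abc import ABC, abstractmethod


class DeviceMixinSketch(ABC):
    """Plain-Python stand-in for TorchDeviceMixin's contract."""

    @abstractmethod
    def detach(self): ...

    @property
    @abstractmethod
    def device(self): ...

    @abstractmethod
    def to(self, device): ...


class DummyBatch(DeviceMixinSketch):
    """Minimal concrete implementation; the string device is illustrative."""

    def __init__(self, device="cpu"):
        self._device = device

    def detach(self):
        # nothing to detach without real tensors
        return self

    @property
    def device(self):
        return self._device

    def to(self, device):
        self._device = device
        return self


print(DummyBatch().to("cuda:0").device)  # -> cuda:0
```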

modalities.evaluator module

modalities.exceptions module

exception modalities.exceptions.BatchStateError[source]

Bases: Exception

exception modalities.exceptions.CheckpointingError[source]

Bases: Exception

exception modalities.exceptions.ConfigError[source]

Bases: Exception

exception modalities.exceptions.DatasetNotFoundError[source]

Bases: Exception

exception modalities.exceptions.ModelStateError[source]

Bases: Exception

exception modalities.exceptions.OptimizerError[source]

Bases: Exception

exception modalities.exceptions.RunningEnvError[source]

Bases: Exception

exception modalities.exceptions.TimeRecorderStateError[source]

Bases: Exception

modalities.gym module

modalities.loss_functions module

class modalities.loss_functions.CLMCrossEntropyLoss(target_key, prediction_key, tag='CLMCrossEntropyLoss')[source]

Bases: Loss

Parameters:
  • target_key (str)

  • prediction_key (str)

  • tag (str)

class modalities.loss_functions.Loss(tag)[source]

Bases: ABC

Parameters:

tag (str)

property tag: str
class modalities.loss_functions.NCELoss(prediction_key1, prediction_key2, is_asymmetric=True, temperature=1.0, tag='NCELoss')[source]

Bases: Loss

Noise Contrastive Estimation Loss

Args:
  • prediction_key1 (str): key to access embedding 1.
  • prediction_key2 (str): key to access embedding 2.
  • is_asymmetric (bool, optional): specifies whether the NCE loss is calculated symmetrically or asymmetrically. Defaults to True.
  • temperature (float, optional): temperature value. Defaults to 1.0.
  • tag (str, optional): Defaults to 'NCELoss'.

Parameters:
  • prediction_key1 (str)

  • prediction_key2 (str)

  • is_asymmetric (bool)

  • temperature (float)

  • tag (str)

modalities.loss_functions.nce_loss(embedding1, embedding2, device, is_asymmetric, temperature)[source]

Calculates the noise contrastive estimation (NCE) loss between embeddings of two different modalities. The implementation is slightly adapted from https://arxiv.org/pdf/1912.06430.pdf (https://github.com/antoine77340/MIL-NCE_HowTo100M); the changes include adding a temperature value and the option of calculating an asymmetric loss w.r.t. one modality. The implementation is further adapted to the contrastive loss of the CoCa model (https://arxiv.org/pdf/2205.01917.pdf).

Return type:

Tensor

Parameters:
  • embedding1 (torch.Tensor): embeddings from modality 1 of size batch_size x embed_dim.
  • embedding2 (torch.Tensor): embeddings from modality 2 of size batch_size x embed_dim.
  • device (torch.device): torch device for calculating the loss.
  • is_asymmetric (bool): specifies whether the loss is calculated in one direction only or in both directions.
  • temperature (float): temperature value for regulating the loss.

Returns:

torch.Tensor: loss tensor.
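The computation described above can be sketched in pure Python on toy embeddings. This illustrates the InfoNCE-style idea (matching rows are positives, all other in-batch pairs are negatives), not the library's exact implementation:

```python
import math


def nce_loss_sketch(emb1, emb2, is_asymmetric=True, temperature=1.0):
    """Pure-Python sketch of an InfoNCE-style contrastive loss.

    emb1, emb2: lists of embedding vectors (batch_size x embed_dim).
    Matching rows are positive pairs; other in-batch pairs are negatives.
    """
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def normalize(v):
        n = math.sqrt(dot(v, v))
        return [x / n for x in v]

    e1 = [normalize(v) for v in emb1]
    e2 = [normalize(v) for v in emb2]

    # cosine-similarity matrix, scaled by the temperature
    sim = [[dot(a, b) / temperature for b in e2] for a in e1]

    def direction_loss(s):
        # mean over rows of -log softmax at the diagonal (positive) entry
        total = 0.0
        for i, row in enumerate(s):
            log_denominator = math.log(sum(math.exp(x) for x in row))
            total += log_denominator - row[i]
        return total / len(s)

    loss = direction_loss(sim)
    if not is_asymmetric:
        # symmetric variant: also contrast embedding2 against embedding1
        transposed = [list(col) for col in zip(*sim)]
        loss = (loss + direction_loss(transposed)) / 2.0
    return loss
```

For orthonormal, perfectly matched embeddings of batch size 2 and temperature 1, the sketch yields log(1 + e^-1) ≈ 0.313 in both the asymmetric and symmetric variants.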

modalities.main module

modalities.trainer module

modalities.util module

class modalities.util.TimeRecorder[source]

Bases: object

Class with a context manager for recording execution time.

reset()[source]
start()[source]
stop()[source]
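A minimal sketch of such a timer, assuming the same start/stop/reset surface plus a context manager (the delta_t attribute name and the error messages are assumptions of this sketch):

```python
import time


class TimeRecorderSketch:
    """Restartable timer usable as a context manager."""

    def __init__(self):
        self.delta_t = 0.0
        self._start = None

    def start(self):
        if self._start is not None:
            raise RuntimeError("timer is already running")
        self._start = time.perf_counter()

    def stop(self):
        if self._start is None:
            raise RuntimeError("timer is not running")
        self.delta_t += time.perf_counter() - self._start
        self._start = None

    def reset(self):
        self.delta_t = 0.0
        self._start = None

    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()


with TimeRecorderSketch() as rec:
    time.sleep(0.01)
print(rec.delta_t > 0)  # -> True
```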
class modalities.util.TimeRecorderStates(value)[source]

Bases: Enum

RUNNING = 'RUNNING'
STOPPED = 'STOPPED'
modalities.util.cpu_scalar_float(x)[source]
Return type:

float

Parameters:

x (Tensor | float)

modalities.util.cpu_scalar_int(x)[source]
Return type:

int

Parameters:

x (Tensor | int)

modalities.util.format_metrics_to_gb(item)[source]

Quick function to convert a number of bytes to gigabytes, rounded to 4 decimal places.

Return type:

float

Parameters:

item (int)
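A sketch of the conversion, assuming the input is a byte count and a gigabyte means 1024**3 bytes (the library's divisor may differ):

```python
def format_metrics_to_gb_sketch(item: int) -> float:
    # assumption: item is a byte count and a gigabyte is 1024**3 bytes
    return round(item / 1024**3, 4)


print(format_metrics_to_gb_sketch(3 * 1024**3))  # -> 3.0
```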

modalities.util.get_experiment_id_from_config(config_file_path, hash_length=16)[source]

Creates an experiment ID that includes the date and time for file-save uniqueness, e.g. 2022-05-07__14-31-22_fdh1xaj2.

Return type:

str

Parameters:
  • config_file_path (Path | None)

  • hash_length (int | None)
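The ID format above (timestamp plus short hash) can be sketched as follows; the use of SHA-256 over the config file path is an assumption made for illustration:

```python
import hashlib
from datetime import datetime


def experiment_id_sketch(config_file_path: str, hash_length: int = 16) -> str:
    # assumption: the hash is computed over the config file path via SHA-256
    timestamp = datetime.now().strftime("%Y-%m-%d__%H-%M-%S")
    digest = hashlib.sha256(config_file_path.encode()).hexdigest()[:hash_length]
    return f"{timestamp}_{digest}"


print(experiment_id_sketch("configs/train.yaml", hash_length=8))
```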

modalities.util.get_local_number_of_trainable_parameters(model)[source]

Returns the number of trainable parameters that are materialized on the current rank. The model can be sharded with FSDP1 or FSDP2 or not sharded at all.

Return type:

int

Parameters:

model (Module)

Args:

model (nn.Module): The model for which to calculate the number of trainable parameters.

Returns:

int: The number of trainable parameters materialized on the current rank.

modalities.util.get_module_class_from_name(module, name)[source]

From the Accelerate source code (https://github.com/huggingface/accelerate/blob/1f7a79b428749f45187ec69485f2c966fe21926e/src/accelerate/utils/dataclasses.py#L1902). Gets a class from a module by its name.

Return type:

Optional[Type[Module]]

Parameters:
  • module (torch.nn.Module): The module to get the class from.
  • name (str): The name of the class.
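The recursive lookup can be sketched generically: walk a tree of modules until a node's class name matches. Here plain objects with a children() method stand in for torch.nn.Module:

```python
def get_class_from_name_sketch(module, name):
    """Return the class of the first (sub)module whose class name matches,
    or None; plain objects with a children() method stand in for nn.Module."""
    if module.__class__.__name__ == name:
        return module.__class__
    for child in module.children():
        found = get_class_from_name_sketch(child, name)
        if found is not None:
            return found
    return None


class Leaf:
    def children(self):
        return []


class Block:
    def __init__(self, *kids):
        self._kids = list(kids)

    def children(self):
        return self._kids


root = Block(Block(Leaf()), Leaf())
print(get_class_from_name_sketch(root, "Leaf") is Leaf)  # -> True
```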

modalities.util.get_synced_experiment_id_of_run(config_file_path=None, hash_length=16, max_experiment_id_byte_length=1024)[source]

Create a unique experiment ID for the current run on rank 0 and broadcast it to all ranks. Internally, the experiment ID is generated by hashing the configuration file path and appending the current date and time. The experiment ID is then converted to a byte array (with a maximum length of max_experiment_id_byte_length) and broadcast to all ranks. In the unlikely case of the experiment ID being too long, a ValueError is raised and max_experiment_id_byte_length must be increased. Each rank then decodes the byte array back to the original string representation and returns it. A globally synced experiment ID is mandatory for saving files / checkpoints in a distributed training setup.

Return type:

str

Parameters:
  • config_file_path (Path | None)

  • hash_length (int | None)

  • max_experiment_id_byte_length (int)

Args:
  • config_file_path (Path): Path to the configuration file.
  • hash_length (Optional[int], optional): Defines the character length of the hash. Defaults to 16.
  • max_experiment_id_byte_length (Optional[int], optional): Defines the maximum byte length of the experiment_id to be shared with the other ranks. Defaults to 1024.

Returns:

str: The experiment ID.
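The padding and decoding steps around the broadcast can be sketched without torch.distributed; the two helper names below are hypothetical:

```python
def encode_for_broadcast(experiment_id: str, max_byte_length: int = 1024) -> bytes:
    """Serialize the ID into a fixed-size, zero-padded buffer so every rank
    can receive a buffer of the same shape."""
    raw = experiment_id.encode("utf-8")
    if len(raw) > max_byte_length:
        raise ValueError(
            f"experiment ID needs {len(raw)} bytes but the buffer holds only "
            f"{max_byte_length}; increase max_experiment_id_byte_length"
        )
    return raw.ljust(max_byte_length, b"\x00")


def decode_after_broadcast(buffer: bytes) -> str:
    """Strip the zero padding and recover the original string on each rank."""
    return buffer.rstrip(b"\x00").decode("utf-8")


buf = encode_for_broadcast("2022-05-07__14-31-22_fdh1xaj2")
print(len(buf), decode_after_broadcast(buf))
```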

modalities.util.get_synced_string(string_to_be_synced, from_rank=0, max_string_byte_length=1024)[source]

Broadcast a string from one rank to all other ranks in the distributed setup.

Return type:

str

Parameters:
  • string_to_be_synced (str)

  • from_rank (int)

  • max_string_byte_length (int)

Args:
  • string_to_be_synced (str): The string to be synced across ranks.
  • from_rank (int, optional): The rank that generates the string. Defaults to 0.
  • max_string_byte_length (Optional[int], optional): Maximum byte length of the string to be synced. Defaults to 1024.

Returns:

str: The synced string, decoded from the byte array.

Raises:

ValueError: If the string exceeds the maximum byte length.

modalities.util.get_total_number_of_trainable_parameters(model, device_mesh)[source]

Returns the total number of trainable parameters across all ranks. The model must be sharded with FSDP1 or FSDP2.

Return type:

Union[int, float, bool]

Parameters:
  • model (FSDPX): The model for which to calculate the number of trainable parameters.
  • device_mesh (DeviceMesh | None): The device mesh used for distributed training.

Returns:

Number: The total number of trainable parameters across all ranks.

modalities.util.parse_enum_by_name(name, enum_type)[source]
Return type:

TypeVar(TEnum, bound=Enum)

Parameters:
  • name (str)

  • enum_type (Type[TEnum])
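A plausible sketch, assuming the function resolves an enum member by its name (not its value); the ValueError on a miss is also an assumption:

```python
from enum import Enum


def parse_enum_by_name_sketch(name, enum_type):
    # assumption: members are resolved by name, not by value
    try:
        return enum_type[name]
    except KeyError as err:
        raise ValueError(f"{name!r} is not a member of {enum_type.__name__}") from err


class TimeRecorderStates(Enum):
    RUNNING = "RUNNING"
    STOPPED = "STOPPED"


print(parse_enum_by_name_sketch("RUNNING", TimeRecorderStates))
```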

modalities.util.print_rank_0(message)[source]

If torch.distributed is initialized, print only on rank 0.

Parameters:

message (str)
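The rank-0 guard can be sketched with injectable lookups so it runs without an initialized process group; the boolean return value indicating whether the message was printed is an addition for testability:

```python
def print_rank_0_sketch(message, get_rank=lambda: 0, is_initialized=lambda: False):
    """Print only on rank 0 when distributed training is initialized;
    returns whether the message was printed (an addition for testability)."""
    if not is_initialized() or get_rank() == 0:
        print(message)
        return True
    return False


print_rank_0_sketch("visible when not initialized")
print_rank_0_sketch("hidden", get_rank=lambda: 3, is_initialized=lambda: True)
```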

modalities.util.warn_rank_0(message)[source]

If torch.distributed is initialized, warn only on rank 0.

Parameters:

message (str)

Module contents