modalities.models package

Subpackages

Submodules

modalities.models.model module

class modalities.models.model.ActivationType(value)[source]

Bases: str, Enum

Enum class representing different activation types.

Attributes:

GELU (str): GELU activation type.

SWIGLU (str): SWIGLU activation type.

GELU = 'gelu'
SWIGLU = 'swiglu'
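Because the enum derives from both str and Enum, its members can be looked up by their string value and compared directly against plain strings. The following is a minimal sketch using a local stand-in that mirrors the documented members; it does not import modalities, and the helper parse_activation is hypothetical.

```python
from enum import Enum


class ActivationType(str, Enum):
    """Local stand-in mirroring the documented enum members."""

    GELU = "gelu"
    SWIGLU = "swiglu"


def parse_activation(name: str) -> ActivationType:
    """Hypothetical helper: map a config string to an enum member.

    Raises ValueError if the string is not a known activation type.
    """
    return ActivationType(name.lower())


activation = parse_activation("SwiGLU")

# Lookup by value returns the singleton member.
assert activation is ActivationType.SWIGLU
# The str mixin allows direct comparison with plain strings.
assert activation == "swiglu"
```

An unknown string such as "relu" raises ValueError, so a typo in a configuration value fails at parse time instead of surfacing later.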
class modalities.models.model.NNModel(seed=None, weight_decay_groups=None)[source]

Bases: Module

NNModel class to define a base model.

Initializes an NNModel object.

Args:

seed (int, optional): The seed value for random number generation. Defaults to None.

weight_decay_groups (Optional[WeightDecayGroups], optional): The weight decay groups. Defaults to None.

abstractmethod forward(inputs)[source]

Forward pass of the model.

Return type:

dict[str, Tensor]

Parameters:

inputs (dict[str, Tensor])

Args:

inputs (dict[str, torch.Tensor]): A dictionary containing input tensors.

Returns:

dict[str, torch.Tensor]: A dictionary containing output tensors.

get_parameters()[source]

Returns a dictionary of the model’s parameters.

Return type:

dict[str, Tensor]

Returns:

A dictionary where the keys are the parameter names and the values are the corresponding parameter tensors.

property weight_decay_groups: dict[str, list[str]]

Returns the weight decay groups.

Returns:

WeightDecayGroups: The weight decay groups.
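The property type dict[str, list[str]] maps a group name to a list of strings. As an illustration only, the sketch below assumes that each string is a regular expression matched against parameter names; this reference does not state that semantics, and the group names, patterns, parameter names, and the helper group_parameter_names are all hypothetical.

```python
import re

# Hypothetical group definition: group name -> list of regex patterns.
weight_decay_groups: dict[str, list[str]] = {
    "linear": [r"\.attn\.", r"\.mlp\."],
    "embedding": [r"^wte\.", r"^wpe\."],
    "layernorm": [r"^ln_"],
}


def group_parameter_names(
    param_names: list[str], groups: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Assign each parameter name to the first group with a matching pattern."""
    grouped: dict[str, list[str]] = {name: [] for name in groups}
    for param_name in param_names:
        for group_name, patterns in groups.items():
            if any(re.search(pattern, param_name) for pattern in patterns):
                grouped[group_name].append(param_name)
                break  # a parameter belongs to at most one group
    return grouped


# Hypothetical parameter names, in the style returned by get_parameters().
names = ["wte.weight", "h.0.attn.q.weight", "h.0.mlp.W.weight", "ln_f.weight"]
grouped = group_parameter_names(names, weight_decay_groups)
```

With a grouping like this, an optimizer setup could apply weight decay to some groups and exclude others; how modalities actually consumes the groups is not covered in this section.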

class modalities.models.model.SwiGLU(n_embd, ffn_hidden, bias, enforce_swiglu_hidden_dim_multiple_of=256)[source]

Bases: Module

SwiGLU class to define the SwiGLU activation function.

Initializes the SwiGLU object.

Args:

n_embd (int): The number of embedding dimensions.

ffn_hidden (int): The number of hidden dimensions in the feed-forward network. Best practice: 4 * n_embd (https://arxiv.org/pdf/1706.03762).

bias (bool): Whether to include bias terms in the linear layers.

enforce_swiglu_hidden_dim_multiple_of (int): The value that the hidden dimension must be a multiple of. This is required for FSDP + TP, as the combination does not support uneven sharding (yet). Defaults to 256.

Parameters:
  • n_embd (int)

  • ffn_hidden (int)

  • bias (bool)

  • enforce_swiglu_hidden_dim_multiple_of (int)
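The enforce_swiglu_hidden_dim_multiple_of constraint can be pictured as rounding a requested hidden size up to the next multiple. The sketch below shows only that generic rounding step in plain Python; the exact formula the library applies to ffn_hidden is not given in this reference, and round_up_to_multiple is a hypothetical helper, not part of modalities.

```python
def round_up_to_multiple(value: int, multiple: int = 256) -> int:
    """Round value up to the nearest multiple of `multiple`."""
    if multiple <= 0:
        raise ValueError("multiple must be a positive integer")
    # Integer ceiling division, then scale back up.
    return multiple * ((value + multiple - 1) // multiple)


n_embd = 768
ffn_hidden = 4 * n_embd  # the 4 * n_embd best practice mentioned above

hidden_dim = round_up_to_multiple(ffn_hidden, 256)
assert hidden_dim % 256 == 0

# A size that is not already aligned is padded upwards.
assert round_up_to_multiple(1000, 256) == 1024
```

A hidden dimension that is a multiple of 256 divides evenly across any shard count that itself divides 256, which is consistent with the stated motivation of avoiding uneven sharding under FSDP + TP.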

forward(x)[source]

Forward pass of the SwiGLU module.

Return type:

Tensor

Parameters:

x (Tensor)

Args:

x (torch.Tensor): Input tensor.

Returns:

torch.Tensor: Output tensor.

modalities.models.model.model_predict_batch(model, batch)[source]

Predicts the output for a batch of samples using the given model.

Return type:

InferenceResultBatch

Args:

model (nn.Module): The model used for prediction.

batch (DatasetBatch): The batch of samples to be predicted.

Returns:

InferenceResultBatch: The batch of inference results containing the predicted targets and predictions.

modalities.models.model_factory module

modalities.models.utils module

Module contents