Welcome to Modalities’ documentation!
We propose a novel training framework for Multimodal Large Language Models (LLMs) that prioritizes code readability and efficiency. The codebase adheres to the principles of “clean code,” minimizing Lines of Code (LoC) while maintaining extensibility. A single, comprehensive configuration file enables easy customization of various model and training parameters.
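As a rough illustration of how a single configuration file can drive component creation, the sketch below builds objects from a nested dictionary. It is a minimal sketch, not Modalities' actual API: `ToyModel`, `ToyOptimizer`, `REGISTRY`, `build_components`, the `type`/`config` keys, and the `@name` reference syntax are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class ToyModel:  # hypothetical stand-in for a model component
    n_layer: int
    n_embd: int


@dataclass
class ToyOptimizer:  # hypothetical stand-in for an optimizer component
    lr: float
    model: ToyModel


# Hypothetical registry mapping a type name in the config to a constructor.
REGISTRY: dict[str, Callable[..., Any]] = {
    "toy_model": ToyModel,
    "toy_optimizer": ToyOptimizer,
}


def build_components(config: dict[str, dict[str, Any]]) -> dict[str, Any]:
    """Instantiate components in config order, resolving "@name" references."""
    built: dict[str, Any] = {}
    for name, spec in config.items():
        kwargs = {}
        for key, value in spec["config"].items():
            # A string starting with "@" refers to a component built earlier.
            if isinstance(value, str) and value.startswith("@"):
                kwargs[key] = built[value[1:]]
            else:
                kwargs[key] = value
        built[name] = REGISTRY[spec["type"]](**kwargs)
    return built


# In practice this dictionary would be loaded from one config file.
CONFIG = {
    "model": {"type": "toy_model", "config": {"n_layer": 2, "n_embd": 64}},
    "optimizer": {"type": "toy_optimizer", "config": {"lr": 3e-4, "model": "@model"}},
}

components = build_components(CONFIG)
print(components["optimizer"])
```

Because the optimizer entry refers to the model by name, swapping the model only requires changing one entry in the configuration.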
A key design choice is a PyTorch-native training loop integrated with Fully Sharded Data Parallel (FSDP). FSDP shards model parameters, gradients, and optimizer states across workers, which lowers per-device memory usage and lets training scale to large multimodal models. By relying on PyTorch’s native capabilities, the framework simplifies development and eases maintenance.
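To make the memory benefit of FSDP concrete, the following back-of-the-envelope sketch compares the persistent model state held by each rank with and without full sharding. It assumes plain fp32 training with Adam (16 bytes per parameter) and ignores activations, temporary all-gathered parameters, and communication buffers. The function name `per_rank_state_bytes` is hypothetical and not part of Modalities or PyTorch.

```python
# Assumed bytes per parameter for full fp32 training with Adam:
# 4 (weights) + 4 (gradients) + 8 (Adam first and second moments).
BYTES_PER_PARAM = 4 + 4 + 8


def per_rank_state_bytes(num_params: int, world_size: int, fully_sharded: bool) -> int:
    """Persistent model-state bytes held by one rank (activations excluded)."""
    total = num_params * BYTES_PER_PARAM
    if not fully_sharded:
        # Replicated data parallelism: every rank keeps a full copy.
        return total
    # Full sharding: each rank keeps roughly 1/world_size of the state.
    return -(-total // world_size)  # integer ceiling division


replicated = per_rank_state_bytes(1_000_000_000, world_size=8, fully_sharded=False)
sharded = per_rank_state_bytes(1_000_000_000, world_size=8, fully_sharded=True)
print(f"replicated: {replicated / 1e9:.1f} GB, fully sharded: {sharded / 1e9:.1f} GB")
```

Under these assumptions, a model with 1 billion parameters on 8 ranks needs about 16.0 GB of model state per rank when replicated and about 2.0 GB per rank when fully sharded.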
The framework’s modular design facilitates experimentation with different multimodal architectures and training strategies. Users can seamlessly integrate diverse datasets and model components, allowing for comprehensive exploration of multimodal learning tasks. The combination of clean code, minimal configuration, and PyTorch-native training with FSDP contributes to a user-friendly and efficient platform for developing state-of-the-art multimodal language models.
Note
This project is under active development.
Getting Started
Entrypoints
VSCode Setup
Future Work
API
- modalities
- modalities package
- Subpackages
- modalities.checkpointing package
- Subpackages
- modalities.checkpointing.fsdp package
- modalities.checkpointing.stateful package
- modalities.checkpointing.torch package
 
- Submodules
- modalities.checkpointing.checkpoint_conversion module
- modalities.checkpointing.checkpoint_loading module
- modalities.checkpointing.checkpoint_saving module
- modalities.checkpointing.checkpoint_saving_execution module
- modalities.checkpointing.checkpoint_saving_instruction module
- modalities.checkpointing.checkpoint_saving_strategies module
- Module contents
 
- modalities.config package
- Submodules
- modalities.config.component_factory module
- modalities.config.config module
- ActivationCheckpointedModelConfig
- ActivationCheckpointedModelConfig.FullACParams
- ActivationCheckpointedModelConfig.SelectiveLayerACParams
- ActivationCheckpointedModelConfig.SelectiveOpACParams
- ActivationCheckpointedModelConfig.ac_fun_params
- ActivationCheckpointedModelConfig.ac_variant
- ActivationCheckpointedModelConfig.layers_fqn
- ActivationCheckpointedModelConfig.model
- ActivationCheckpointedModelConfig.model_config
 
- AdamOptimizerConfig
- AdamWOptimizerConfig
- BatchSamplerConfig
- CLMCrossEntropyLossConfig
- CheckpointSavingConfig
- CombinedDatasetConfig
- CompiledModelConfig
- ConstantLRSchedulerConfig
- CosineAnnealingLRSchedulerConfig
- DCPAppStateConfig
- DCPCheckpointLoadingConfig
- DCPCheckpointSavingConfig
- DebuggingEnrichedModelConfig
- DistributedSamplerConfig
- DummyLRSchedulerConfig
- DummyProgressSubscriberConfig
- DummyResultSubscriberConfig
- EvaluationResultToDiscSubscriberConfig
- FSDP1ActivationCheckpointedModelConfig
- FSDP1CheckpointLoadingConfig
- FSDP1CheckpointLoadingConfig.block_names
- FSDP1CheckpointLoadingConfig.global_rank
- FSDP1CheckpointLoadingConfig.mixed_precision_settings
- FSDP1CheckpointLoadingConfig.model_config
- FSDP1CheckpointLoadingConfig.parse_mixed_precision_setting_by_name()
- FSDP1CheckpointLoadingConfig.parse_sharding_strategy_by_name()
- FSDP1CheckpointLoadingConfig.sharding_strategy
 
- FSDP1CheckpointSavingConfig
- FSDP1CheckpointedModelConfig
- FSDP1CheckpointedOptimizerConfig
- FSDP2WrappedModelConfig
- FSDP2WrappedModelConfig.block_names
- FSDP2WrappedModelConfig.device_mesh
- FSDP2WrappedModelConfig.mixed_precision_settings
- FSDP2WrappedModelConfig.model
- FSDP2WrappedModelConfig.model_config
- FSDP2WrappedModelConfig.reshard_after_forward
- FSDP2WrappedModelConfig.validate_dp_mesh_existence()
- FSDP2WrappedModelConfig.validate_mixed_precision_settings()
 
- FSDPWrappedModelConfig
- FSDPWrappedModelConfig.block_names
- FSDPWrappedModelConfig.mixed_precision_settings
- FSDPWrappedModelConfig.model
- FSDPWrappedModelConfig.model_config
- FSDPWrappedModelConfig.parse_mixed_precision_setting_by_name()
- FSDPWrappedModelConfig.parse_sharding_strategy_by_name()
- FSDPWrappedModelConfig.sharding_strategy
- FSDPWrappedModelConfig.sync_module_states
 
- GPT2LLMCollateFnConfig
- GPT2MFUCalculatorConfig
- GPT2ModelTPConfig
- LLMDataLoaderConfig
- LinearLRSchedulerConfig
- MemMapDatasetConfig
- OneCycleLRSchedulerConfig
- OneCycleLRSchedulerConfig.anneal_strategy
- OneCycleLRSchedulerConfig.base_momentum
- OneCycleLRSchedulerConfig.check_totals_steps_and_epchs()
- OneCycleLRSchedulerConfig.cycle_momentum
- OneCycleLRSchedulerConfig.div_factor
- OneCycleLRSchedulerConfig.epochs
- OneCycleLRSchedulerConfig.final_div_factor
- OneCycleLRSchedulerConfig.last_epoch
- OneCycleLRSchedulerConfig.max_lr
- OneCycleLRSchedulerConfig.max_momentum
- OneCycleLRSchedulerConfig.model_config
- OneCycleLRSchedulerConfig.optimizer
- OneCycleLRSchedulerConfig.pct_start
- OneCycleLRSchedulerConfig.steps_per_epoch
- OneCycleLRSchedulerConfig.three_phase
- OneCycleLRSchedulerConfig.total_steps
 
- PackedMemMapDatasetContinuousConfig
- PackedMemMapDatasetMegatronConfig
- ParallelDegreeConfig
- PassType
- PreTrainedHFTokenizerConfig
- PreTrainedSPTokenizerConfig
- PrecisionEnum
- ProcessGroupBackendType
- RawAppStateConfig
- ReferenceConfig
- ResumableDistributedSamplerConfig
- ResumableDistributedSamplerConfig.dataset
- ResumableDistributedSamplerConfig.drop_last
- ResumableDistributedSamplerConfig.epoch
- ResumableDistributedSamplerConfig.model_config
- ResumableDistributedSamplerConfig.num_replicas
- ResumableDistributedSamplerConfig.rank
- ResumableDistributedSamplerConfig.seed
- ResumableDistributedSamplerConfig.shuffle
- ResumableDistributedSamplerConfig.skip_num_global_samples
 
- RichProgressSubscriberConfig
- RichResultSubscriberConfig
- SaveEveryKStepsCheckpointingStrategyConfig
- SaveKMostRecentCheckpointsStrategyConfig
- SequentialSamplerConfig
- StepLRSchedulerConfig
- TokenizerTypes
- TorchCheckpointLoadingConfig
- WandBEvaluationResultSubscriberConfig
- WandBEvaluationResultSubscriberConfig.config_file_path
- WandBEvaluationResultSubscriberConfig.directory
- WandBEvaluationResultSubscriberConfig.experiment_id
- WandBEvaluationResultSubscriberConfig.global_rank
- WandBEvaluationResultSubscriberConfig.mode
- WandBEvaluationResultSubscriberConfig.model_config
- WandBEvaluationResultSubscriberConfig.project
 
- WandbMode
- WeightInitializedModelConfig
- load_app_config_dict()
 
- modalities.config.instantiation_models module
- ConsistencyEnforcement
- CudaEnvSettings
- InstructionTuningDataInstantiationModel
- InstructionTuningDataInstantiationModel.InstructionDataTransformation
- InstructionTuningDataInstantiationModel.Settings
- InstructionTuningDataInstantiationModel.Settings.dst_path
- InstructionTuningDataInstantiationModel.Settings.messages_key
- InstructionTuningDataInstantiationModel.Settings.model_config
- InstructionTuningDataInstantiationModel.Settings.pbin_creation_config_file_path
- InstructionTuningDataInstantiationModel.Settings.split_config
- InstructionTuningDataInstantiationModel.Settings.src_path
 
- InstructionTuningDataInstantiationModel.chat_template_data
- InstructionTuningDataInstantiationModel.instruction_data_transformation
- InstructionTuningDataInstantiationModel.jinja2_chat_template
- InstructionTuningDataInstantiationModel.model_config
- InstructionTuningDataInstantiationModel.settings
 
- Intervals
- PackedDatasetComponentsInstantiationModel
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.dst_path
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.eod_token
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.index_path
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.jq_pattern
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.model_config
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.num_cpus
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.processed_samples_queue_size
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.processing_batch_size
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.raw_samples_queue_size
- PackedDatasetComponentsInstantiationModel.PackedDatasetSettings.src_path
 
- PackedDatasetComponentsInstantiationModel.model_config
- PackedDatasetComponentsInstantiationModel.settings
- PackedDatasetComponentsInstantiationModel.tokenizer
 
- SplitConfig
- Splitting
- StepProfile
- TextGenerationInstantiationModel
- TextGenerationInstantiationModel.TextGenerationSettings
- TextGenerationInstantiationModel.TextGenerationSettings.device
- TextGenerationInstantiationModel.TextGenerationSettings.model_config
- TextGenerationInstantiationModel.TextGenerationSettings.model_path
- TextGenerationInstantiationModel.TextGenerationSettings.parse_device()
- TextGenerationInstantiationModel.TextGenerationSettings.referencing_keys
- TextGenerationInstantiationModel.TextGenerationSettings.sequence_length
 
- TextGenerationInstantiationModel.model_config
- TextGenerationInstantiationModel.settings
- TextGenerationInstantiationModel.text_inference_component
 
- TrainingComponentsInstantiationModel
- TrainingComponentsInstantiationModel.Settings
- TrainingComponentsInstantiationModel.Settings.DCPWarmstartCheckpointPaths
- TrainingComponentsInstantiationModel.Settings.Paths
- TrainingComponentsInstantiationModel.Settings.WarmstartCheckpointPaths
- TrainingComponentsInstantiationModel.Settings.config_file_path
- TrainingComponentsInstantiationModel.Settings.consistency_enforcement
- TrainingComponentsInstantiationModel.Settings.cuda_env
- TrainingComponentsInstantiationModel.Settings.experiment_id
- TrainingComponentsInstantiationModel.Settings.intervals
- TrainingComponentsInstantiationModel.Settings.model_config
- TrainingComponentsInstantiationModel.Settings.paths
- TrainingComponentsInstantiationModel.Settings.referencing_keys
- TrainingComponentsInstantiationModel.Settings.step_profile
- TrainingComponentsInstantiationModel.Settings.training_progress
- TrainingComponentsInstantiationModel.Settings.training_target
- TrainingComponentsInstantiationModel.Settings.warmstart_checkpoint_paths
 
- TrainingComponentsInstantiationModel.app_state
- TrainingComponentsInstantiationModel.checkpoint_saving
- TrainingComponentsInstantiationModel.eval_dataloaders
- TrainingComponentsInstantiationModel.evaluation_subscriber
- TrainingComponentsInstantiationModel.gradient_clipper
- TrainingComponentsInstantiationModel.loss_fn
- TrainingComponentsInstantiationModel.mfu_calculator
- TrainingComponentsInstantiationModel.model_config
- TrainingComponentsInstantiationModel.model_raw
- TrainingComponentsInstantiationModel.progress_subscriber
- TrainingComponentsInstantiationModel.settings
- TrainingComponentsInstantiationModel.train_dataloader
- TrainingComponentsInstantiationModel.train_dataset
 
- TrainingProgress
- TrainingReportGenerator
- TrainingTarget
 
- modalities.config.lookup_enum module
- modalities.config.pydantic_if_types module
- modalities.config.utils module
- Module contents
 
- modalities.conversion package
- Subpackages
- modalities.conversion.gpt2 package
- Submodules
- modalities.conversion.gpt2.configuration_gpt2 module
- modalities.conversion.gpt2.conversion_code module
- modalities.conversion.gpt2.conversion_model module
- modalities.conversion.gpt2.conversion_tokenizer module
- modalities.conversion.gpt2.convert_gpt2 module
- modalities.conversion.gpt2.modeling_gpt2 module
- Module contents
 
 
- Module contents
 
- modalities.dataloader package
- Subpackages
- Submodules
- modalities.dataloader.apply_chat_template module
- modalities.dataloader.create_index module
- modalities.dataloader.create_instruction_tuning_data module
- modalities.dataloader.create_packed_data module
- modalities.dataloader.dataloader module
- modalities.dataloader.dataloader_factory module
- modalities.dataloader.dataset module
- CombinedDataset
- Dataset
- DummyDataset
- DummyDatasetConfig
- DummySampleConfig
- DummySampleDataType
- MemMapDataset
- PackedMemMapDatasetBase
- PackedMemMapDatasetBase.DATA_SECTION_LENGTH_IN_BYTES
- PackedMemMapDatasetBase.HEADER_SIZE_IN_BYTES
- PackedMemMapDatasetBase.TOKEN_SIZE_DESCRIPTOR_LENGTH_IN_BYTES
- PackedMemMapDatasetBase.np_dtype_of_tokens_on_disk_from_bytes
- PackedMemMapDatasetBase.token_size_in_bytes
- PackedMemMapDatasetBase.type_converter_for_torch
 
- PackedMemMapDatasetContinuous
- PackedMemMapDatasetMegatron
 
- modalities.dataloader.dataset_factory module
- modalities.dataloader.filter_packed_data module
- modalities.dataloader.large_file_lines_reader module
- modalities.dataloader.sampler_factory module
- ResumableDistributedMultiDimSamplerConfig
- ResumableDistributedMultiDimSamplerConfig.data_parallel_key
- ResumableDistributedMultiDimSamplerConfig.dataset
- ResumableDistributedMultiDimSamplerConfig.device_mesh
- ResumableDistributedMultiDimSamplerConfig.drop_last
- ResumableDistributedMultiDimSamplerConfig.epoch
- ResumableDistributedMultiDimSamplerConfig.model_config
- ResumableDistributedMultiDimSamplerConfig.seed
- ResumableDistributedMultiDimSamplerConfig.shuffle
- ResumableDistributedMultiDimSamplerConfig.skip_num_global_samples
 
- SamplerFactory
 
- modalities.dataloader.samplers module
- Module contents
 
- modalities.inference package
- Subpackages
- modalities.inference.text package
- Submodules
- modalities.inference.text.config module
- TextInferenceComponentConfig
- TextInferenceComponentConfig.device
- TextInferenceComponentConfig.eod_token
- TextInferenceComponentConfig.model
- TextInferenceComponentConfig.model_config
- TextInferenceComponentConfig.parse_device()
- TextInferenceComponentConfig.prompt_template
- TextInferenceComponentConfig.sequence_length
- TextInferenceComponentConfig.temperature
- TextInferenceComponentConfig.tokenizer
 
 
- modalities.inference.text.inference_component module
- Module contents
 
 
- Submodules
- modalities.inference.inference module
- Module contents
 
- modalities.logging_broker package
- Subpackages
- modalities.logging_broker.subscriber_impl package
- Submodules
- modalities.logging_broker.subscriber_impl.progress_subscriber module
- modalities.logging_broker.subscriber_impl.results_subscriber module
- modalities.logging_broker.subscriber_impl.subscriber_factory module
- Module contents
 
 
- Submodules
- modalities.logging_broker.message_broker module
- modalities.logging_broker.messages module
- modalities.logging_broker.publisher module
- modalities.logging_broker.subscriber module
- Module contents
 
- modalities.models package
- Subpackages
- modalities.models.coca package
- Submodules
- modalities.models.coca.attention_pooling module
- modalities.models.coca.coca_model module
- CoCa
- CoCaConfig
- CoCaConfig.bias_attn_pool
- CoCaConfig.epsilon_attn_pool
- CoCaConfig.model_config
- CoCaConfig.n_pool_head
- CoCaConfig.n_vision_queries
- CoCaConfig.prediction_key
- CoCaConfig.text_cls_prediction_key
- CoCaConfig.text_decoder_config
- CoCaConfig.text_embd_prediction_key
- CoCaConfig.vision_cls_prediction_key
- CoCaConfig.vision_embd_prediction_key
- CoCaConfig.vision_encoder_config
 
- TextDecoderConfig
- TextDecoderConfig.activation
- TextDecoderConfig.attention_config
- TextDecoderConfig.bias
- TextDecoderConfig.block_size
- TextDecoderConfig.dropout
- TextDecoderConfig.epsilon
- TextDecoderConfig.ffn_hidden
- TextDecoderConfig.model_config
- TextDecoderConfig.n_embd
- TextDecoderConfig.n_head
- TextDecoderConfig.n_layer_multimodal_text
- TextDecoderConfig.n_layer_text
- TextDecoderConfig.prediction_key
- TextDecoderConfig.sample_key
- TextDecoderConfig.vocab_size
 
 
- modalities.models.coca.collator module
- modalities.models.coca.multi_modal_decoder module
- modalities.models.coca.text_decoder module
- Module contents
 
- modalities.models.components package
- modalities.models.gpt2 package
- Submodules
- modalities.models.gpt2.collator module
- modalities.models.gpt2.gpt2_model module
- AttentionConfig
- AttentionConfig.QueryKeyValueTransformConfig
- AttentionConfig.QueryKeyValueTransformConfig.IdentityTransformConfig
- AttentionConfig.QueryKeyValueTransformConfig.RotaryTransformConfig
- AttentionConfig.QueryKeyValueTransformConfig.RotaryTransformConfig.base_freq
- AttentionConfig.QueryKeyValueTransformConfig.RotaryTransformConfig.model_config
- AttentionConfig.QueryKeyValueTransformConfig.RotaryTransformConfig.n_embd
- AttentionConfig.QueryKeyValueTransformConfig.RotaryTransformConfig.n_head
- AttentionConfig.QueryKeyValueTransformConfig.RotaryTransformConfig.seq_length_dim
 
- AttentionConfig.QueryKeyValueTransformConfig.config
- AttentionConfig.QueryKeyValueTransformConfig.model_config
- AttentionConfig.QueryKeyValueTransformConfig.parse_sharding_strategy_by_name()
- AttentionConfig.QueryKeyValueTransformConfig.type_hint
 
- AttentionConfig.model_config
- AttentionConfig.qkv_transforms
 
- AttentionImplementation
- CausalSelfAttention
- GPT2Block
- GPT2LLM
- GPT2LLMConfig
- GPT2LLMConfig.activation_type
- GPT2LLMConfig.attention_config
- GPT2LLMConfig.attention_implementation
- GPT2LLMConfig.attention_norm_config
- GPT2LLMConfig.bias
- GPT2LLMConfig.check_divisibility()
- GPT2LLMConfig.dropout
- GPT2LLMConfig.enforce_swiglu_hidden_dim_multiple_of
- GPT2LLMConfig.ffn_hidden
- GPT2LLMConfig.ffn_norm_config
- GPT2LLMConfig.lm_head_norm_config
- GPT2LLMConfig.model_config
- GPT2LLMConfig.n_embd
- GPT2LLMConfig.n_head_kv
- GPT2LLMConfig.n_head_q
- GPT2LLMConfig.n_layer
- GPT2LLMConfig.poe_type
- GPT2LLMConfig.prediction_key
- GPT2LLMConfig.sample_key
- GPT2LLMConfig.seed
- GPT2LLMConfig.sequence_length
- GPT2LLMConfig.use_meta_device
- GPT2LLMConfig.use_weight_tying
- GPT2LLMConfig.validate_sizes()
- GPT2LLMConfig.vocab_size
 
- IdentityTransform
- LayerNormWrapperConfig
- LayerNorms
- PositionTypes
- QueryKeyValueTransform
- QueryKeyValueTransformType
- RotaryTransform
- TransformerMLP
- manual_scaled_dot_product_attention()
 
- Module contents
 
- modalities.models.huggingface package
- Submodules
- modalities.models.huggingface.huggingface_model module
- HuggingFaceModelTypes
- HuggingFacePretrainedModel
- HuggingFacePretrainedModelConfig
- HuggingFacePretrainedModelConfig.huggingface_prediction_subscription_key
- HuggingFacePretrainedModelConfig.kwargs
- HuggingFacePretrainedModelConfig.model_args
- HuggingFacePretrainedModelConfig.model_config
- HuggingFacePretrainedModelConfig.model_name
- HuggingFacePretrainedModelConfig.model_type
- HuggingFacePretrainedModelConfig.prediction_key
- HuggingFacePretrainedModelConfig.sample_key
 
 
- Module contents
 
- modalities.models.huggingface_adapters package
- modalities.models.vision_transformer package
- Submodules
- modalities.models.vision_transformer.vision_transformer_model module
- ImagePatchEmbedding
- VisionTransformer
- VisionTransformerBlock
- VisionTransformerConfig
- VisionTransformerConfig.add_cls_token
- VisionTransformerConfig.attention_config
- VisionTransformerConfig.bias
- VisionTransformerConfig.dropout
- VisionTransformerConfig.img_size
- VisionTransformerConfig.model_config
- VisionTransformerConfig.n_classes
- VisionTransformerConfig.n_embd
- VisionTransformerConfig.n_head
- VisionTransformerConfig.n_img_channels
- VisionTransformerConfig.n_layer
- VisionTransformerConfig.patch_size
- VisionTransformerConfig.patch_stride
- VisionTransformerConfig.prediction_key
- VisionTransformerConfig.sample_key
 
 
- Module contents
 
 
- Submodules
- modalities.models.model module
- modalities.models.model_factory module
- GPT2ModelFactory
- ModelFactory
- ModelFactory.get_activation_checkpointed_fsdp1_model_()
- ModelFactory.get_activation_checkpointed_fsdp2_model_()
- ModelFactory.get_compiled_model()
- ModelFactory.get_debugging_enriched_model()
- ModelFactory.get_fsdp1_checkpointed_model()
- ModelFactory.get_fsdp1_wrapped_model()
- ModelFactory.get_fsdp2_wrapped_model()
- ModelFactory.get_weight_initialized_model()
 
 
- modalities.models.utils module
- Module contents
 
- modalities.nn package
- Subpackages
- modalities.nn.model_initialization package
- Submodules
- modalities.nn.model_initialization.composed_initialization module
- ComposedInitializationRoutines
- ComposedModelInitializationConfig
- ComposedModelInitializationConfig.hidden_dim
- ComposedModelInitializationConfig.mean
- ComposedModelInitializationConfig.model_config
- ComposedModelInitializationConfig.model_type
- ComposedModelInitializationConfig.num_layers
- ComposedModelInitializationConfig.std
- ComposedModelInitializationConfig.weight_init_type
 
- ModelInitializerWrapper
- ModelInitializerWrapperConfig
 
- modalities.nn.model_initialization.initialization_if module
- modalities.nn.model_initialization.initialization_routines module
- modalities.nn.model_initialization.parameter_name_filters module
- Module contents
 
 
- Submodules
- modalities.nn.attention module
- modalities.nn.mlp module
- Module contents
 
- modalities.optimizers package
- modalities.preprocessing package
- modalities.registry package
- modalities.running_env package
- Subpackages
- modalities.running_env.fsdp package
- Submodules
- modalities.running_env.fsdp.device_mesh module
- DeviceMeshConfig
- DeviceMeshConfig.context_parallel_degree
- DeviceMeshConfig.data_parallel_replicate_degree
- DeviceMeshConfig.data_parallel_shard_degree
- DeviceMeshConfig.device_type
- DeviceMeshConfig.enable_loss_parallel
- DeviceMeshConfig.model_config
- DeviceMeshConfig.pipeline_parallel_degree
- DeviceMeshConfig.tensor_parallel_degree
- DeviceMeshConfig.world_size
 
- ParallelismDegrees
- get_device_mesh()
- get_parallel_degree()
 
- modalities.running_env.fsdp.fsdp_auto_wrapper module
- modalities.running_env.fsdp.reducer module
- Module contents
 
 
- Submodules
- modalities.running_env.cuda_env module
- modalities.running_env.env_utils module
- Module contents
 
- modalities.tokenization package
- modalities.training package
- Subpackages
- modalities.training.activation_checkpointing package
- modalities.training.gradient_clipping package
- Submodules
- modalities.training.gradient_clipping.fsdp_gradient_clipper module
- modalities.training.gradient_clipping.fsdp_gradient_clipper_config module
- modalities.training.gradient_clipping.gradient_clipper module
- Module contents
 
 
- Submodules
- modalities.training.training_progress module
- TrainingProgress
- TrainingProgress.num_seen_steps_current_run
- TrainingProgress.num_seen_steps_previous_run
- TrainingProgress.num_seen_steps_total
- TrainingProgress.num_seen_tokens_current_run
- TrainingProgress.num_seen_tokens_previous_run
- TrainingProgress.num_seen_tokens_total
- TrainingProgress.num_target_steps
- TrainingProgress.num_target_tokens
 
 
- Module contents
 
- modalities.utils package
- Subpackages
- modalities.utils.benchmarking package
- modalities.utils.profilers package
 
- Submodules
- modalities.utils.communication_test module
- modalities.utils.file_ops module
- modalities.utils.logger_utils module
- modalities.utils.mfu module
- modalities.utils.number_conversion module
- LocalNumBatchesFromNumSamplesConfig
- LocalNumBatchesFromNumTokensConfig
- NumSamplesFromNumTokensConfig
- NumStepsFromNumSamplesConfig
- NumStepsFromNumTokensConfig
- NumStepsFromRawDatasetIndexConfig
- NumTokensFromNumStepsConfig
- NumTokensFromPackedMemMapDatasetContinuousConfig
- NumTokensFromPackedMemMapDatasetContinuousConfig.dataset_path
- NumTokensFromPackedMemMapDatasetContinuousConfig.dp_degree
- NumTokensFromPackedMemMapDatasetContinuousConfig.gradient_accumulation_steps
- NumTokensFromPackedMemMapDatasetContinuousConfig.local_micro_batch_size
- NumTokensFromPackedMemMapDatasetContinuousConfig.model_config
- NumTokensFromPackedMemMapDatasetContinuousConfig.reuse_last_target
- NumTokensFromPackedMemMapDatasetContinuousConfig.sample_key
- NumTokensFromPackedMemMapDatasetContinuousConfig.sequence_length
 
- NumberConversion
- NumberConversion.get_global_num_seen_tokens_from_checkpoint_path()
- NumberConversion.get_global_num_target_tokens_from_checkpoint_path()
- NumberConversion.get_last_step_from_checkpoint_path()
- NumberConversion.get_local_num_batches_from_num_samples()
- NumberConversion.get_local_num_batches_from_num_tokens()
- NumberConversion.get_num_samples_from_num_tokens()
- NumberConversion.get_num_seen_steps_from_checkpoint_path()
- NumberConversion.get_num_steps_from_num_samples()
- NumberConversion.get_num_steps_from_num_tokens()
- NumberConversion.get_num_steps_from_raw_dataset_index()
- NumberConversion.get_num_target_steps_from_checkpoint_path()
- NumberConversion.get_num_tokens_from_num_steps()
- NumberConversion.get_num_tokens_from_packed_mem_map_dataset_continuous()
 
- NumberConversionFromCheckpointPathConfig
 
- modalities.utils.seeding module
- modalities.utils.typing_utils module
- modalities.utils.verify_tokenization_consistency module
- Module contents
 
- Submodules
- modalities.api module
- FileExistencePolicy
- convert_pytorch_to_hf_checkpoint()
- create_filtered_tokenized_dataset()
- create_raw_data_index()
- create_shuffled_dataset_chunk()
- create_shuffled_jsonl_dataset_chunk()
- enforce_file_existence_policy()
- generate_text()
- merge_packed_data_files()
- pack_encoded_data()
- shuffle_jsonl_data()
- shuffle_tokenized_data()
 
- modalities.batch module
- modalities.evaluator module
- modalities.exceptions module
- modalities.gym module
- modalities.loss_functions module
- modalities.main module
- modalities.trainer module
- modalities.util module
- Aggregator
- TimeRecorder
- TimeRecorderStates
- format_metrics_to_gb()
- get_experiment_id_from_config()
- get_local_number_of_trainable_parameters()
- get_module_class_from_name()
- get_synced_experiment_id_of_run()
- get_synced_string()
- get_total_number_of_trainable_parameters()
- parse_enum_by_name()
- print_rank_0()
- warn_rank_0()
 
- Module contents
 