## Configuration
Training config is defined in YAML-formatted files; see `data/config_lorem_ipsum.yaml` for an example. These configs are very explicit, specifying all training parameters, to keep model trainings as transparent and reproducible as possible. Each config setting is reflected in pydantic classes in `src/modalities/config/*.py`. In the config you need to define which config class to load via the field `type_hint`, which specifies the concrete class. A second field, `config`, then takes all the constructor arguments for that config class. This way it is easy to swap out components, e.g. the `DataLoader`, while still having input validation in place.
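For illustration, a component entry in such a config could look like the following sketch (the key names around `type_hint` and `config` are hypothetical, not taken from `data/config_lorem_ipsum.yaml`):

```yaml
# hypothetical excerpt: the scheduler component is selected via type_hint,
# and its constructor arguments are listed under config
scheduler:
  type_hint: ConstantLR
  config:
    factor: 0.5
    total_iters: 10
```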
### Pydantic and ClassResolver
The mechanism introduced to instantiate classes via `type_hint` in the `config.yaml` utilizes

1. Omegaconf to load the config YAML file,
2. Pydantic for the validation of the config, and
3. ClassResolver to instantiate the correct, concrete class of a class hierarchy.
Firstly, Omegaconf loads the config YAML file and resolves internal references such as `${subconfig.attribute}`.
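A minimal sketch of this resolution step with plain OmegaConf (the config keys and values are illustrative):

```python
from omegaconf import OmegaConf

# OmegaConf resolves ${...} interpolations against other entries of the config
raw_config = """
subconfig:
  attribute: 42
training:
  copied_value: ${subconfig.attribute}
"""
config = OmegaConf.create(raw_config)
assert config.training.copied_value == 42  # interpolation resolved on access
```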
Then, Pydantic validates the whole config as-is and checks that each of the sub-configs is a `pydantic.BaseModel` class.
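As a toy illustration of this validation step (the sub-config class below is hypothetical):

```python
from pydantic import BaseModel, ValidationError

class OptimizerConfig(BaseModel):  # hypothetical sub-config
    lr: float
    weight_decay: float

# mistyped values are rejected before any training starts
try:
    OptimizerConfig(lr="fast", weight_decay=0.01)
except ValidationError as err:
    print(err)
```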
For configs which allow different concrete classes to be instantiated by ClassResolver, the special member names `type_hint` and `config` are introduced. With this we utilize Pydantic's feature to auto-select a fitting type based on the keys in the config YAML file.
ClassResolver replaces large if-else control structures for inferring the correct concrete type; the `type_hint` is used for class selection:
```python
from torch import nn
from class_resolver import ClassResolver

# the resolver maps type-hint strings to concrete subclasses of the base class
activation_resolver = ClassResolver(
    [nn.ReLU, nn.Tanh, nn.Hardtanh],
    base=nn.Module,
    default=nn.ReLU,
)

type_hint = "ReLU"
activation_kwargs = {...}  # constructor arguments for the selected class
activation = activation_resolver.make(type_hint, activation_kwargs)
```
In our implementation we go a step further: a `type_hint` in a `BaseModel` config must be of type `modalities.config.lookup_types.LookupEnum`, and `config` must be a union of the allowed concrete configs, each of base type `BaseModel`. `config` hereby replaces the `activation_kwargs` of the example above with pydantic-validated `BaseModel` configs.
With this, a mapping between the type-hint strings needed by ClassResolver and the concrete classes is introduced, while allowing pydantic to select the correct concrete config:
```python
from enum import Enum
from typing import Annotated

import torch
from pydantic import BaseModel, PositiveInt, PositiveFloat, Field


class LookupEnum(Enum):
    @classmethod
    def _missing_(cls, value: str) -> type:
        """Constructs the Enum by member name, if not constructable by value."""
        return cls.__dict__[value]


class SchedulerTypes(LookupEnum):
    StepLR = torch.optim.lr_scheduler.StepLR
    ConstantLR = torch.optim.lr_scheduler.ConstantLR


class StepLRConfig(BaseModel):
    step_size: Annotated[int, Field(strict=True, ge=1)]
    gamma: Annotated[float, Field(strict=True, ge=0.0)]


class ConstantLRConfig(BaseModel):
    factor: PositiveFloat
    total_iters: PositiveInt


class SchedulerConfig(BaseModel):
    type_hint: SchedulerTypes
    config: StepLRConfig | ConstantLRConfig
```
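A minimal usage sketch (the values are illustrative): pydantic picks the matching config class purely from the given keys, and `LookupEnum` constructs the enum member from the type-hint string:

```python
# "ConstantLR" is resolved by LookupEnum._missing_, and the config dict
# matches ConstantLRConfig's fields (but not StepLRConfig's)
scheduler_config = SchedulerConfig(
    type_hint="ConstantLR",
    config={"factor": 0.5, "total_iters": 10},
)
assert scheduler_config.type_hint is SchedulerTypes.ConstantLR
assert isinstance(scheduler_config.config, ConstantLRConfig)
```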
To allow a user-friendly instantiation, all class resolvers are defined in the `ResolverRegister`, and `build_component_by_config` is introduced as a convenience function. Dependencies can be passed through via the `extra_kwargs` argument:
```python
resolvers = ResolverRegister(config=config)
optimizer = ...  # our example dependency
scheduler = resolvers.build_component_by_config(
    config=config.scheduler, extra_kwargs=dict(optimizer=optimizer)
)
```
To add a new resolver, use `add_resolver`; the added resolver will then be accessible by the `register_key` given during adding. For access, use the `build_component_by_key_query` function of the `ResolverRegister`.
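A rough sketch of this flow; the exact signatures of `add_resolver` and `build_component_by_key_query` are assumptions here, not taken from the source:

```python
from torch import nn
from class_resolver import ClassResolver

# hypothetical signature: register_key first, then the resolver to register
resolvers.add_resolver(
    "activation_resolver",
    ClassResolver([nn.ReLU, nn.Tanh], base=nn.Module, default=nn.ReLU),
)
# the resolver registered above is queried by its register_key
activation = resolvers.build_component_by_key_query(
    register_key="activation_resolver", type_hint="Tanh"
)
```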