Future Work
The team is currently extending our established LLM code base to support multiple modalities. This extension will build on ideas similar to CoCa and/or AudioPaLM, allowing users either to pair modality-specific encoders with a text-based decoder or to use a decoder-only architecture. The planned modalities beyond text are:
image
audio
video
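To make the two design options concrete, the following is a minimal sketch, not the planned implementation. All names and sizes (TEXT_VOCAB_SIZE, AUDIO_CODEBOOK_SIZE, build_encoder_memory, build_decoder_only_sequence) are hypothetical and chosen only for illustration. It assumes that, in the encoder-plus-decoder option, each non-text modality has its own encoder whose output features the text decoder would attend to, and that, in the decoder-only option, non-text inputs are first discretized into codes and then mapped into an extended token vocabulary.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical sizes, used only for this illustration.
TEXT_VOCAB_SIZE = 1000
AUDIO_CODEBOOK_SIZE = 256

Features = List[List[float]]


def build_encoder_memory(
    inputs: Dict[str, object],
    encoders: Dict[str, Callable[[object], Features]],
) -> Features:
    """Encoder-plus-decoder option: encode each non-text modality separately.

    Returns the concatenated feature sequence that a text decoder
    would attend to (for example through cross-attention).
    """
    memory: Features = []
    for modality, payload in inputs.items():
        if modality == "text":
            continue  # text goes straight to the decoder
        memory.extend(encoders[modality](payload))
    return memory


def build_decoder_only_sequence(
    text_ids: Sequence[int], audio_codes: Sequence[int]
) -> List[int]:
    """Decoder-only option: place all modalities in one token sequence.

    Audio codes are shifted past the text vocabulary so that both
    share a single, extended vocabulary.
    """
    for code in audio_codes:
        if not 0 <= code < AUDIO_CODEBOOK_SIZE:
            raise ValueError(f"audio code out of range: {code}")
    audio_ids = [TEXT_VOCAB_SIZE + code for code in audio_codes]
    return audio_ids + list(text_ids)


# Toy stand-in for a real encoder: one 1-dimensional feature per input value.
def toy_encoder(values: object) -> Features:
    return [[float(v)] for v in values]  # type: ignore[union-attr]


memory = build_encoder_memory(
    {"text": "a caption", "image": [1, 2], "audio": [3]},
    {"image": toy_encoder, "audio": toy_encoder},
)
sequence = build_decoder_only_sequence(text_ids=[5, 7], audio_codes=[0, 255])
```

The first function keeps modalities separate until the decoder, so encoders can be swapped per modality; the second trades that flexibility for a single model that treats every input as tokens.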