Stop Thinking, Just Do!

Sungsoo Kim's Blog

Modality Alignment for Multimodal Perception


18 July 2024


Article Source


Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM

In the 48th session of Multimodal Weekly, we welcomed two researchers working in multimodal understanding.

✅ Max (Letian) Fu, a Ph.D. student at UC Berkeley, dove into aligning touch, vision, and language for multimodal perception.
- Follow Letian: https://max-fu.github.io/

Check out the following resources on the TVL paper:
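To make the idea concrete: aligning modalities such as touch, vision, and language is commonly done with a contrastive (InfoNCE-style) objective that pulls paired embeddings together and pushes mismatched pairs apart. The sketch below is a generic illustration of that objective in numpy, not the TVL paper's actual implementation; the batch data, dimensions, and temperature are hypothetical.

```python
import numpy as np

def info_nce(a, b, temperature=0.07):
    """InfoNCE loss: matched rows of `a` and `b` are positive pairs,
    all other rows in the batch serve as negatives."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)   # unit-normalize
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = (a @ b.T) / temperature                    # (batch, batch) cosine sims
    logits -= logits.max(axis=1, keepdims=True)         # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return float(-np.log(np.diag(probs)).mean())        # NLL of the positive pair

# Hypothetical batch: embeddings from three per-modality encoders.
rng = np.random.default_rng(0)
touch = rng.standard_normal((4, 16))
vision = rng.standard_normal((4, 16))
language = rng.standard_normal((4, 16))

# Pull each touch embedding toward its paired vision and language embeddings.
loss = info_nce(touch, vision) + info_nce(touch, language)
```

Minimizing this loss over the encoders (or over a trainable touch encoder, with vision and language held fixed) places all three modalities in a shared embedding space.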

✅ Dr. Bo Zhao, Principal Investigator at the Beijing Academy of Artificial Intelligence, introduced Bunny, a series of lightweight MLLMs with Phi, StableLM, and Llama as language backbones and SigLIP as the vision encoder.

Check out the following resources on the Bunny series:
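For readers unfamiliar with this architecture family: lightweight MLLMs of this kind typically pass images through a frozen vision encoder, project the resulting patch features into the LLM's embedding space with a small learned projector, and prepend them to the text token embeddings. The sketch below illustrates that wiring in numpy with hypothetical shapes; it is not Bunny's actual code.

```python
import numpy as np

def build_mllm_input(image_feats, w_proj, text_embeds):
    """Project vision-encoder patch features into the LLM embedding space
    and prepend them to the text token embeddings (prefix-style fusion)."""
    visual_tokens = image_feats @ w_proj                 # (num_patches, llm_dim)
    return np.concatenate([visual_tokens, text_embeds], axis=0)

rng = np.random.default_rng(0)
image_feats = rng.standard_normal((9, 32))   # e.g. 9 patch features from the vision encoder
w_proj = rng.standard_normal((32, 48))       # learned linear projector (hypothetical dims)
text_embeds = rng.standard_normal((5, 48))   # 5 text token embeddings

seq = build_mllm_input(image_feats, w_proj, text_embeds)  # (14, 48) input sequence
```

The LLM backbone then attends over the combined sequence exactly as it would over text alone, which is what keeps this design cheap enough for small backbones.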

