Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM
In the 48th session of Multimodal Weekly, we welcomed two researchers working in multimodal understanding.
✅ Max (Letian) Fu, a Ph.D. student at UC Berkeley, dove into aligning touch, vision, and language for multimodal perception.
- Follow Letian: https://max-fu.github.io/
Check out the following resources on the TVL paper:
- Project: https://tactile-vlm.github.io/
- Paper: https://arxiv.org/abs/2401.14391
- Code: https://github.com/Max-Fu/tvl
- Dataset: https://huggingface.co/datasets/mlfu7/Touch-Vision-Language-Dataset
- Models: https://huggingface.co/mlfu7/Touch-Vision-Language-Models
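To give a feel for what "aligning touch, vision, and language" means in practice, here is a minimal, illustrative sketch of pairwise contrastive (InfoNCE-style) alignment in PyTorch. This is not the TVL implementation: the function names, shapes, and the choice to treat the vision/text encoders as frozen are assumptions made for illustration, so refer to the paper and code linked above for the actual training recipe.

```python
# Minimal sketch of pairwise contrastive alignment across touch, vision,
# and language embeddings (InfoNCE-style). All names and shapes here are
# illustrative placeholders, not the TVL codebase's API.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of paired embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def alignment_loss(touch_emb, vision_emb, text_emb):
    """Pull each touch embedding toward its paired vision and text embeddings.

    In a TVL-style setup the vision/text embeddings would typically come from
    a pretrained vision-language model, with the tactile encoder being trained.
    """
    return info_nce(touch_emb, vision_emb) + info_nce(touch_emb, text_emb)

# Toy usage with random features standing in for encoder outputs.
B, D = 8, 512
touch = torch.randn(B, D, requires_grad=True)        # tactile encoder output (trainable)
loss = alignment_loss(touch, torch.randn(B, D), torch.randn(B, D))
loss.backward()
print(loss.item())
```

The key design point is that touch is aligned pairwise against both modalities, so the tactile embedding space inherits the structure of an existing vision-language space rather than being learned from scratch.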
✅ Dr. Bo Zhao, a Principal Investigator at the Beijing Academy of Artificial Intelligence, introduced Bunny, a series of lightweight MLLMs with Phi, StableLM, and Llama as language backbones and SigLIP as the vision encoder.
- Follow Bo: https://www.bozhao.me/
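To make the "SigLIP vision encoder plus small language backbone" design concrete, here is a rough PyTorch sketch of the connector pattern such lightweight MLLMs commonly use: image patch features are projected into the language model's embedding space and concatenated with the text tokens. The module name, MLP design, and dimensions below are illustrative assumptions, not Bunny's actual code; see the resources below for the real implementation.

```python
# Rough sketch of the common "vision encoder -> projector -> LLM" wiring used
# by lightweight MLLMs. Shapes and module names are illustrative only.
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Two-layer MLP that projects vision patch features to the LM hidden size."""
    def __init__(self, vision_dim: int, lm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # (B, num_patches, vision_dim) -> (B, num_patches, lm_dim)
        return self.proj(patch_features)

# Toy forward pass: image patch features become "visual tokens" that are
# concatenated with embedded text tokens before entering the LM backbone.
B, num_patches, vision_dim, lm_dim = 2, 729, 1152, 2560   # SigLIP-so400m-ish / small-LM-ish sizes
connector = VisionLanguageConnector(vision_dim, lm_dim)
visual_tokens = connector(torch.randn(B, num_patches, vision_dim))
text_tokens = torch.randn(B, 32, lm_dim)                   # embedded prompt tokens
lm_input = torch.cat([visual_tokens, text_tokens], dim=1)  # fed to the language model
print(lm_input.shape)                                      # torch.Size([2, 761, 2560])
```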
Check out the following resources on the Bunny series: