Modality Alignment for Multimodal Perception & Open-Source Lightweight MLLM
In the 48th session of Multimodal Weekly, we welcomed two researchers working in multimodal understanding.
✅ Max (Letian) Fu, a Ph.D. student at UC Berkeley, dove into aligning touch, vision, and language for multimodal perception.
- Follow Letian: https://max-fu.github.io/
Check out the following resources on the TVL paper:
- Project: https://tactile-vlm.github.io/
- Paper: https://arxiv.org/abs/2401.14391
- Code: https://github.com/Max-Fu/tvl
- Dataset: https://huggingface.co/datasets/mlfu7/Touch-Vision-Language-Dataset
- Models: https://huggingface.co/mlfu7/Touch-Vision-Language-Models
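To give a feel for what "aligning touch, vision, and language" means in practice, here is a minimal, illustrative sketch of pairwise contrastive (InfoNCE-style) alignment in PyTorch. This is not the TVL implementation: the function names, shapes, and the choice to treat the vision/text encoders as frozen are assumptions made for illustration, so refer to the paper and code linked above for the actual training recipe.

```python
# Minimal sketch of pairwise contrastive alignment across touch, vision,
# and language embeddings (InfoNCE-style). All names and shapes here are
# illustrative placeholders, not the TVL codebase's API.
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of paired embeddings."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                 # (B, B) similarity matrix
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def alignment_loss(touch_emb, vision_emb, text_emb):
    """Pull each touch embedding toward its paired vision and text embeddings.

    In a TVL-style setup the vision/text embeddings would typically come from
    a pretrained vision-language model, with the tactile encoder being trained.
    """
    return info_nce(touch_emb, vision_emb) + info_nce(touch_emb, text_emb)

# Toy usage with random features standing in for encoder outputs.
B, D = 8, 512
touch = torch.randn(B, D, requires_grad=True)        # tactile encoder output (trainable)
loss = alignment_loss(touch, torch.randn(B, D), torch.randn(B, D))
loss.backward()
print(loss.item())
```

The key design point is that touch is aligned pairwise against both modalities, so the tactile embedding space inherits the structure of an existing vision-language space rather than being learned from scratch.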
✅ Dr. Bo Zhao, a Principal Investigator at the Beijing Academy of Artificial Intelligence, introduced Bunny, a series of lightweight MLLMs with Phi, StableLM, and Llama as language backbones and SigLIP as the vision encoder.
- Follow Bo: https://www.bozhao.me/
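To make the "SigLIP vision encoder plus small language backbone" design concrete, here is a rough PyTorch sketch of the connector pattern such lightweight MLLMs commonly use: image patch features are projected into the language model's embedding space and concatenated with the text tokens. The module name, MLP design, and dimensions below are illustrative assumptions, not Bunny's actual code; see the resources below for the real implementation.

```python
# Rough sketch of the common "vision encoder -> projector -> LLM" wiring used
# by lightweight MLLMs. Shapes and module names are illustrative only.
import torch
import torch.nn as nn

class VisionLanguageConnector(nn.Module):
    """Two-layer MLP that projects vision patch features to the LM hidden size."""
    def __init__(self, vision_dim: int, lm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        # (B, num_patches, vision_dim) -> (B, num_patches, lm_dim)
        return self.proj(patch_features)

# Toy forward pass: image patch features become "visual tokens" that are
# concatenated with embedded text tokens before entering the LM backbone.
B, num_patches, vision_dim, lm_dim = 2, 729, 1152, 2560   # SigLIP-so400m-ish / small-LM-ish sizes
connector = VisionLanguageConnector(vision_dim, lm_dim)
visual_tokens = connector(torch.randn(B, num_patches, vision_dim))
text_tokens = torch.randn(B, 32, lm_dim)                   # embedded prompt tokens
lm_input = torch.cat([visual_tokens, text_tokens], dim=1)  # fed to the language model
print(lm_input.shape)                                      # torch.Size([2, 761, 2560])
```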
Check out the following resources on the Bunny series: