Open-source multimodal model that connects a vision encoder to an LLM. Collaboration between UW-Madison and Microsoft.
Largest open-weight LLaVA: a vision encoder paired with a Yi-34B LLM backbone.
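
A rough sketch of what "linking" a vision encoder to an LLM can look like: patch features from the vision encoder are projected into the LLM's embedding space and placed in front of the text token embeddings. This is an illustration only. It assumes a two-layer MLP connector with GELU, as described for LLaVA-1.5; the blurb above does not say which connector this model uses. The dimensions, weights, and function names (`init_linear`, `project_patches`, `build_llm_input`) are toy placeholders, not the real model's values or API.

```python
import math
import random

def init_linear(n_in, n_out, rng):
    """Random weights and zero bias for a toy linear layer (hypothetical init)."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_out)] for _ in range(n_in)]
    b = [0.0] * n_out
    return w, b

def linear(x, w, b):
    """Compute y = x @ w + b for a single feature vector."""
    return [sum(x[i] * w[i][j] for i in range(len(x))) + b[j] for j in range(len(b))]

def gelu(v):
    """Tanh approximation of GELU, applied element-wise."""
    c = math.sqrt(2.0 / math.pi)
    return [0.5 * t * (1.0 + math.tanh(c * (t + 0.044715 * t ** 3))) for t in v]

def project_patches(patches, layer1, layer2):
    """Map each vision patch feature into the LLM embedding space (assumed 2-layer MLP)."""
    return [linear(gelu(linear(p, *layer1)), *layer2) for p in patches]

def build_llm_input(patches, text_embeddings, layer1, layer2):
    """Prepend projected visual tokens to the text token embeddings."""
    return project_patches(patches, layer1, layer2) + text_embeddings

# Toy sizes chosen for readability; real encoders and LLMs are far larger.
VISION_DIM, LLM_DIM = 8, 16
rng = random.Random(0)
layer1 = init_linear(VISION_DIM, LLM_DIM, rng)
layer2 = init_linear(LLM_DIM, LLM_DIM, rng)

patches = [[rng.gauss(0, 1) for _ in range(VISION_DIM)] for _ in range(4)]  # 4 image patches
text = [[rng.gauss(0, 1) for _ in range(LLM_DIM)] for _ in range(3)]        # 3 text tokens

sequence = build_llm_input(patches, text, layer1, layer2)
print(len(sequence), len(sequence[0]))
```

The LLM backbone then consumes `sequence` like any other run of token embeddings, so in this sketch the connector is the only new component sitting between the two pretrained parts.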