Tiny vision-language model: 1.8B params, runs on laptops.
Tiny multimodal model for laptop-class image understanding.