LLaVA-Next (Large Language and Vision Assistant) is an open-source multimodal model that connects visual and language understanding. It can analyze complex images, charts, documents, and scenes with detailed reasoning, rivaling GPT-4V on many benchmarks while remaining fully open.