Pixtral 12B is Mistral AI's first multimodal model, combining a 12B parameter language model with a 400M parameter vision encoder to deliver strong image understanding alongside text capabilities. Unlike many vision models that only process single images, Pixtral 12B can analyze multiple images simultaneously and reason about visual content in relation to each other. The model excels at document understanding, chart and diagram analysis, and code screenshot interpretation. Available as open weights and through the Mistral API, Pixtral is ideal for building vision-capable AI applications.