When an X user asked [Sophia Yang, the head of developer relations at the company] what makes the Pixtral 12-billion-parameter model unique, she said it will natively support an arbitrary number of images of arbitrary sizes. According to initial testers on X, the 24GB model’s architecture appears to have 40 layers, a hidden dimension size of 14,336 and 32 attention heads. On the vision front, it has a dedicated vision encoder with 24 hidden layers and support for 1024×1024 image resolution. These details, however, could change when the company makes the model available via API.
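As a rough sanity check on the reported figures, the 24GB size is about what 12 billion parameters would occupy at 2 bytes each. The sketch below is only back-of-the-envelope arithmetic: the 16-bit storage format is an assumption (the testers' posts do not state it), the parameter count is the nominal 12 billion, and `weight_size_gb` is a hypothetical helper rather than part of any Mistral tooling.

```python
def weight_size_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate size of the raw weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Nominal parameter count taken from the model's name.
PIXTRAL_PARAMS = 12e9

# Common storage widths; which one the release uses is an assumption here.
for dtype, nbytes in [("float32", 4), ("16-bit (bf16/fp16)", 2), ("int8", 1)]:
    size = weight_size_gb(PIXTRAL_PARAMS, nbytes)
    print(f"{dtype:>20}: {size:.0f} GB")
```

Under these assumptions, only the 16-bit row lines up with the reported 24GB download, which suggests (but does not confirm) that the weights were shipped in a 16-bit format.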