Mistral AI launches Voxtral TTS, an open-weight enterprise voice model that runs on a smartphone and challenges ElevenLabs in ...
GLM-5V-Turbo is Z.ai's first native multimodal agent foundation model, built for vision-based coding and agentic task ...
Microsoft's MAI-Transcribe-1 model currently leads the FLEURS benchmark in 11 core languages and outperforms competitors like ...
Multi-modal models that can process both ...
Every time a language model like GPT-4, Claude or Mistral generates a sentence, it does something deceptively simple: It picks one word at a time. This word-by-word approach is what gives ...
Google on Friday added a new, experimental “embedding” model for text, Gemini Embedding, to its Gemini developer API. Embedding models translate text inputs like words and phrases into numerical ...
Researchers at Amazon have trained the largest text-to-speech model yet, which they claim exhibits “emergent” qualities that improve its ability to speak even complex sentences naturally. The ...
Sora can tackle complex scenes, including multiple characters, specific types of motion, and great detail, because of the model's deep understanding of language, prompts, and how the subjects exist in ...
Alibaba has launched a new AI model for image generation called Qwen VLo. This model builds on the earlier Qwen 2.5 vision language system, and now offers enhanced features for creating and editing ...