Alibaba’s cloud computing unit unveiled the Qwen3-Omni series on Tuesday, describing it as the first native end-to-end multimodal system that “unifies text, images, audio and video in one model”.
Alibaba shares were up 4.3 per cent in Hong Kong as of 1.41pm.
Initially launched in April, Qwen3 has since expanded into a family of models with text, image, audio and visual capabilities, as well as various multimodal systems tailored for real-world applications.
Qwen researcher Lin Junyang attributed the new multimodal system’s enhancements to general improvements across foundational projects related to audio and images. The Qwen team “combined everything … to build our Qwen3-Omni”, he said.