New Qwen2 AI Model from Alibaba to Challenge Meta, OpenAI

Alibaba, the Chinese language e-commerce big, is a significant participant in China’s AI sphere. At the moment, it introduced the discharge of its newest AI mannequin, Qwen2—and by some measures, it’s one of the best open-source possibility of the second.

Developed by Alibaba Cloud, Qwen2 is the following era of the agency’s Tongyi Qianwen (Qwen) mannequin collection, which incorporates the Tongyi Qianwen LLM (also referred to as simply Qwen), the imaginative and prescient AI mannequin Qwen-VL, and Qwen-Audio.

The Qwen mannequin household is pre-trained on multilingual information masking varied industries and domains, with Qwen-72B essentially the most highly effective mannequin within the collection. It’s skilled on a powerful 3 trillion tokens of information. By comparability, Meta’s strongest Llama-2 variant is predicated on 2 trillion tokens. Llama-3, nonetheless, is within the strategy of digesting 15 trillion tokens.

In line with a latest weblog publish by the Qwen workforce, Qwen2 can deal with 128K tokens of context—corresponding to GPT-4o from OpenAI. Qwen2 has in the meantime outperformed Meta’s LLama3 in principally all an important artificial benchmarks, the workforce asserts, making it one of the best open-source mannequin at the moment out there.

Nonetheless, it is price noting that the impartial Elo Area ranks Qwen2-72B-Instruct a bit higher than GPT-4-0314 however under Llama3 70B and GPT-4-0125-preview, making it the second most favored open-source LLM amongst human testers thus far.

Qwen2 performs higher than Llama3, Mixtral and Qwen1.5 in artificial benchmarks. Picture: Alibaba Cloud

Qwen2 is on the market in 5 totally different sizes, starting from 0.5 billion to 72 billion parameters, and the discharge delivers important enhancements in several areas of experience. Additionally, the fashions had been skilled with information in 27 extra languages than the earlier launch, together with German, French, Spanish, Italian, and Russian, along with English and Chinese language.

“In contrast with the state-of-the-art open supply language fashions, together with the earlier launched Qwen1.5, Qwen2 has usually surpassed most open supply fashions and demonstrated competitiveness in opposition to proprietary fashions throughout a collection of benchmarks concentrating on for language understanding, language era, multilingual functionality, coding, arithmetic, and reasoning,” the Qwen workforce claimed on the mannequin’s official web page on HuggingFace.

The Qwen2 fashions additionally present a powerful understanding of lengthy contexts. Qwen2-72B-Instruct can deal with data extraction duties anyplace inside its enormous context with out errors, and it handed the “Needle in a Haystack” check nearly completely. That is necessary, as a result of historically, mannequin efficiency begins to degrade the extra we work together with it.

Qwen2 performs remarkably in the "Needle in a Haystack" test. Image: Alibaba Cloud — Qwen2 performs remarkably within the “Needle in a Haystack” check. Picture: Alibaba Cloud

With this launch, the Qwen workforce has additionally modified the licenses for its fashions. Whereas Qwen2-72B and its instruction-tuned fashions proceed to make use of the unique Qianwen license, all different fashions have adopted Apache 2.0, an ordinary within the open-source software program world.

“Within the close to future, we’ll proceed opensource new fashions to speed up open-source AI,” Alibaba Cloud mentioned in an official weblog publish.

Decrypt examined the mannequin and located it to be fairly succesful at understanding duties in a number of languages. The mannequin can be censored, notably in themes which are thought-about delicate in China. This appears in keeping with Alibaba’s claims of Qwen2 being the least doubtless mannequin to supply unsafe outcomes—be it criminal activity, fraud, pornography, and privateness violence— regardless of which language during which it was prompted.

Qwen2's reply to: Is Taiwan a Country? — Qwen2’s reply to: “Is Taiwan a Nation?”

ChatGPT's reply to: Is Taiwan a Country? — ChatGPT’s reply to: “Is Taiwan a Nation?”

Additionally, it has a superb understanding of system prompts, which implies the situations utilized may have a stronger impression on its solutions. For instance, when instructed to behave as a useful assistant with data of the regulation versus performing as a educated lawyer who at all times responds primarily based on the regulation, the replies to confirmed main variations. It supplied recommendation much like recommendation supplied by GPT-4o, however was extra concise.

Qwen2's reply to: A neighbord insulted me — Qwen2’s reply to: “A neighbord insulted me”

ChatGPT's reply to: "A neighbord insulted me" — ChatGPT’s reply to: “A neighbord insulted me”

The subsequent mannequin improve will deliver multimodality to the Qwen2 LLM, probably merging all of the household into one highly effective mannequin, the workforce mentioned. “Moreover, we prolong the Qwen2 language fashions to multimodal, able to understanding each imaginative and prescient and audio data,” they added.

Qwen is on the market for on-line testing through HuggingFace Areas. These with sufficient computing to run it domestically can obtain the weights without spending a dime, additionally through HuggingFace.

The Qwen2 mannequin could be a nice different for these keen to guess on open-source AI. It has a bigger token context window than most different fashions, making it much more succesful than Meta’s LLama 3. Additionally, as a consequence of its license, fine-tuned variations shared by others could enhance upon it, additional growing its rating and overcoming bias.

Edited by Ryan Ozawa.