
Generative models need to work on smartphones: Why Microsoft introduces Phi-3

We have talked extensively about AI PCs, i.e. new-generation computers equipped with the hardware needed to handle the most demanding workloads of artificial intelligence (AI) applications. The goal, however, is to bring AI features to smartphones as well: today, with the latest generation of mobile devices, we can do little more than scratch the surface.

For this reason, Microsoft presented Phi-3, its compact generative model designed and developed specifically for smartphones. The most compact version, Phi-3 Mini, has 3.8 billion parameters. Alongside it, the Redmond company offers Phi-3 Small (7 billion parameters) and Phi-3 Medium (14 billion parameters).

Large Language Models (LLMs) used around one billion parameters just five years ago (GPT-2 had 1.5 billion), while today we have reached truly staggering figures. The parameters represent the weights of the connections between the units that make up neural models. Using more parameters can allow a model to capture more detail and nuance in the data, improving its generative capabilities. However, a large number of parameters also demands significant computational resources for both training and running the model. For this reason, Oracle, for example, argues that it is time to start rethinking the future of artificial intelligence.
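To make "counting parameters" concrete, here is a minimal sketch (our own illustrative example, unrelated to Phi-3's actual code) that tallies the trainable weights of a tiny PyTorch network:

```python
# Illustrative only: "parameters" are the trainable weights (and biases)
# of a neural network. This is a toy example, not Phi-3.
import torch.nn as nn

# A tiny two-layer network: 1,000 -> 500 -> 10 units
model = nn.Sequential(
    nn.Linear(1000, 500),
    nn.ReLU(),
    nn.Linear(500, 10),
)

# Sum the number of elements in every weight tensor
n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params:,} parameters")  # 505,510 here, versus 3.8 billion for Phi-3 Mini
```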

Microsoft is banking on the new Phi-3 model to bring generative models and natural language processing to smartphones

The new release, documented in this in-depth analysis, follows that of the Phi-2 model presented in December 2023. Considering the enormous strides made by competitors, Meta first and foremost, Microsoft has decided to propose a further update: Phi-3 Mini, with its 3.8 billion parameters, would be able to exceed the performance of Meta's Llama 3 (8 billion parameters) and OpenAI's GPT-3.5, at least according to the benchmarks conducted by the company's engineers.

Its limited size allows Phi-3 to be used on mobile devices with reduced computational power. According to Microsoft, Phi-3 Mini will open a new chapter for the development of applications built around AI-centered features.

Eric Boyd, Microsoft vice president, states that Phi-3 can process natural language directly on the smartphone, without needing to call on cloud services. Obviously, we should add, its knowledge base is nothing remotely comparable to that of the generative models available on the Internet, which are the result of massive training efforts. However, it can help to significantly change the rules of the game, demonstrating that devices such as smartphones can carry out inference operations on their own, something that until now has been the prerogative of devices with more advanced hardware configurations.

How the Phi-3 model was born: what principles inspired it?

The development of models with billions of parameters starts from the so-called "scaling laws", which presuppose a "fixed" data source. This assumption is increasingly being challenged by "frontier" LLMs, which open the door to interacting with data in new ways.

Microsoft researchers explain that their work on the "Phi" models demonstrated how a combination of Web data and synthetic data created by LLMs allows small language models to reach levels of performance that were typically seen only in much larger models.

From a more strictly technical point of view, Phi-3 Mini uses a Transformer-based architecture with a default context window of 4K tokens. However, Microsoft also provides a long-context version, which uses the LongRoPE technique to extend the context window to 128K tokens.
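For those who want to experiment, the model's weights have also been published on Hugging Face. Below is a minimal sketch using the transformers library; the model IDs are our assumption based on Microsoft's public Hugging Face releases (a 4K and a 128K variant) and may differ or require additional options such as trust_remote_code depending on the library version:

```python
# Minimal sketch: loading a publicly released Phi-3 Mini checkpoint
# with Hugging Face transformers. Model IDs below are assumptions
# based on Microsoft's Hugging Face naming, not official documentation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"      # default 4K context variant
# model_id = "microsoft/Phi-3-mini-128k-instruct"  # LongRoPE 128K context variant

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain in one sentence what a small language model is."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```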

Thanks to its small size, Phi-3 Mini can be quantized to 4 bits, so it takes up approximately 1.8 GB of memory. Tested on an Apple iPhone 14 with an A16 Bionic chip, the latency observed in inference operations on 4,096 tokens is 44 ms; for blocks of 64K tokens it rises to 53 ms.
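The reported footprint squares with a quick back-of-the-envelope calculation (our own estimate, not an official Microsoft figure): 3.8 billion weights at 4 bits each come to roughly 1.9 GB, before accounting for activations and runtime overhead.

```python
# Back-of-the-envelope memory estimate for a 4-bit quantized model
# (our own calculation, not an official figure): weights only,
# ignoring activations, KV cache and runtime overhead.
params = 3.8e9        # Phi-3 Mini parameter count
bits_per_weight = 4   # 4-bit quantization

bytes_total = params * bits_per_weight / 8
print(f"{bytes_total / 1e9:.2f} GB")     # ~1.90 GB
print(f"{bytes_total / 2**30:.2f} GiB")  # ~1.77 GiB, in line with the ~1.8 GB reported
```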

Opening image credit: iStock.com – da-kuk
