Review of paper 'Small Language Models are the Future of Agentic AI'

In the paper Small Language Models are the Future of Agentic AI, the authors encourage the use of small language models (SLMs) in developing agentic AI applications. According to the paper, an SLM is any language model with fewer than 10 billion parameters. The paper argues that SLMs are better suited to agentic AI than the established practice of using large language models (LLMs).

Agentic AI is defined as a collection of autonomous AI agents that work together to solve complex tasks. The tasks are decomposed into smaller subtasks, and the agents collaborate by communicating with one another using protocols such as A2A.

The authors formulate three core value statements on why SLMs are better for agentic applications:

Value statement 1 - SLMs are sufficiently powerful to handle the language modeling errands of agentic applications.
Value statement 2 - SLMs are more operationally suitable for use in agentic systems than LLMs.
Value statement 3 - SLMs are more economical to use in agentic systems due to their smaller size.

For value statement 1, the authors note that the majority of agentic subtasks are repetitive, scoped and non-conversational, which means the full capability of an LLM goes unused. In addition, the authors point out that recent studies on scaling model size and capabilities indicate that the capabilities of newer SLMs are approaching those of earlier LLMs. This means that a well-designed SLM can meet or exceed the task performance of such LLMs.

We can also enhance the reasoning capabilities of SLMs through modern training approaches and, at inference time, through techniques such as prompting and agentic augmentation.

To provide a concrete comparison of capabilities, including reasoning, tool calling, code generation and instruction following, between LLMs and SLMs, the authors consider a list of model families in which an LLM has an SLM variant with fewer parameters. For example, the DeepSeek-R1-Distill series of models have 1.5 to 8 billion parameters and are trained on samples generated by DeepSeek-R1. The cited studies indicate that these SLMs, despite having fewer parameters, outperform some older LLMs.

For value statement 2, the authors state that SLMs are more affordable and practical to train and adapt as specialized models for different agentic routines. This means that we can fine-tune SLMs faster than LLMs, enabling rapid iteration and adaptation. The lower barrier also encourages more collaboration between individuals and organizations, which could reduce the risk of systemic biases. Agentic interactions such as tool calling require strictly formatted inputs. A general-purpose LLM may hallucinate an invalid format, whereas an SLM can be trained on the required formats via post-training or fine-tuning. Since SLMs have fewer parameters, they are also cheaper to train.
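
To make the strict-format requirement concrete, here is a minimal sketch of how an agent might validate a model's tool-call output before executing it. It is not taken from the paper: the forecast tool, its schema and the function name parse_tool_call are all hypothetical, and the check only assumes that the model is expected to emit a bare JSON object.

```python
import json

# Hypothetical schema for an illustrative forecast tool:
# argument name -> required Python type.
FORECAST_SCHEMA = {"city": str, "days": int}


def parse_tool_call(raw_output, schema):
    """Parse a model's raw text as tool arguments; raise ValueError if malformed."""
    try:
        args = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")
    missing = sorted(set(schema) - set(args))
    unexpected = sorted(set(args) - set(schema))
    if missing or unexpected:
        raise ValueError(f"missing={missing}, unexpected={unexpected}")
    for name, expected_type in schema.items():
        # Exact type match, so that JSON true/false is not accepted as an int.
        if type(args[name]) is not expected_type:
            raise ValueError(f"argument {name!r} must be {expected_type.__name__}")
    return args
```

A well-formed output such as '{"city": "Oslo", "days": 3}' passes, while output wrapped in conversational text, or with a missing or mistyped argument, is rejected. A model fine-tuned on the required format should fail this kind of check less often, which is the practical benefit the authors describe.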

The authors observe that an AI agent consists of carefully crafted prompts and context management that confine the model to a small section of an LLM's skill set, which means that the LLM is underutilised. A single SLM fine-tuned for the selected prompts would suffice.

In addition, agentic systems are by their nature heterogeneous. For example, a language model can itself be a tool called by another language model. Because such systems can already mix models of different sizes and capabilities, SLMs can be introduced wherever they are sufficient.
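
The heterogeneity described above can be sketched as a small router that sends each subtask to a specialized small model when one exists and falls back to a general model otherwise. This is only an illustration under assumptions: the task types, the stub model functions and make_router are hypothetical names, and a real system would call actual model endpoints instead of returning tagged strings.

```python
from typing import Callable, Dict

ModelFn = Callable[[str], str]


def make_router(specialists: Dict[str, ModelFn], fallback: ModelFn) -> Callable[[str, str], str]:
    """Build a router that prefers a specialist model and falls back otherwise."""
    def route(task_type: str, prompt: str) -> str:
        model = specialists.get(task_type, fallback)
        return model(prompt)
    return route


# Stub models standing in for real inference calls (hypothetical).
def slm_extract(prompt: str) -> str:
    return f"[slm-extract] {prompt}"


def slm_tool_call(prompt: str) -> str:
    return f"[slm-tool-call] {prompt}"


def llm_general(prompt: str) -> str:
    return f"[llm-general] {prompt}"


route = make_router(
    {"extract": slm_extract, "tool_call": slm_tool_call},
    fallback=llm_general,
)
```

With this setup, route("extract", "...") is handled by the small extraction model, while an unrecognised task type such as open-ended conversation goes to the general model.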

For value statement 3, the authors point out that serving a single SLM can be 10-30x cheaper, since SLMs require less or no GPU usage, which lowers the cost of inference infrastructure. Parameter-efficient fine-tuning techniques such as LoRA and DoRA require only a few GPU hours for an SLM, compared to days or weeks for retraining an LLM with new behaviours. By adopting modularity in agentic applications, we can add new skills and abilities to adapt to changing requirements. This modular approach supports scaling out by adding small specialized experts rather than scaling up an LLM, which yields systems that are cheaper and faster to debug and deploy.
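
As a rough illustration of why adapter-based fine-tuning is cheap, the sketch below counts trainable parameters for a single weight matrix under LoRA, which replaces the update to a d_out x d_in matrix with two low-rank factors of rank r, giving r * (d_in + d_out) trainable parameters. The layer size and rank used in the usage note are illustrative assumptions, not figures from the paper.

```python
def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole weight matrix is updated."""
    return d_in * d_out


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: factors of shape (d_out, r) and (r, d_in)."""
    return rank * (d_in + d_out)


def lora_fraction(d_in: int, d_out: int, rank: int) -> float:
    """Share of the full matrix's parameters that the adapter trains."""
    return lora_params(d_in, d_out, rank) / full_params(d_in, d_out)
```

For an assumed 4096 x 4096 projection with rank 8, the adapter trains 65,536 parameters instead of 16,777,216, which is 1/256 of the full matrix.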

The authors also present counterarguments to these value statements. Against value statement 1, they note the general consensus that LLMs are generalists owing to their larger parameter counts and, as such, are better at a wide variety of tasks. In addition, LLMs possess a semantic hub mechanism, which allows them to integrate and abstract semantic information from various modalities and languages.

The authors rebut this point by noting that SLMs can be easily fine-tuned, whereas LLMs require more resources to retrain. Reasoning capability is also affordable for SLMs, since it can be scaled at inference time to the desired level. Finally, a core feature of agentic systems is the decomposition of complex problems and inputs, so SLMs within an agentic system would be invoked on modular sub-tasks so simple that an LLM's semantic hub would be of little use.

The authors point out that while the argument for using SLMs is sound, there are still barriers to adoption, which include:

The amount of existing investment in centralized LLM inference infrastructure
The use of generalist benchmarks in SLM training, design and evaluation
The lack of awareness of SLMs, as they don’t receive the same level of attention as LLMs

Despite these barriers, the authors remain confident that, with the availability of lower-cost inference infrastructure and edge devices, it’s only a question of time before SLMs gain wider adoption.

From a software developer's perspective, AI agents are still software, but with a language model as an additional component. Being able to use an SLM in an AI agent means that we can develop and test the overall agent locally before deployment. The developer can ensure that the SLM used in an AI agent is fine-tuned on the appropriate dataset and deployed with the right configuration settings before it is integrated into an agentic system. The modular nature of SLMs also means that we can fine-tune individual specialized models for specific use cases, which allows us to create specialized agents that perform certain tasks really well in the context of sub-task decomposition in an agentic system. This is similar to the approach taken when developing distributed applications as microservices. It also means we can test each agent component individually, which supports reproducibility.
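
The point about testing each agent component individually can be sketched as follows. The example assumes a hypothetical SummarizerAgent that receives its model as an injected callable, so a deterministic stub can replace the real SLM during tests; none of these names come from the paper.

```python
class SummarizerAgent:
    """Hypothetical agent: wraps a fixed prompt template around an injected model."""

    def __init__(self, model):
        # `model` is any callable that maps a prompt string to a response string.
        self.model = model

    def run(self, text: str) -> str:
        prompt = f"Summarize in one sentence:\n{text}"
        return self.model(prompt).strip()


def test_agent_with_stub_model():
    """Check the agent's prompt construction and post-processing without a real model."""
    seen_prompts = []

    def stub_model(prompt: str) -> str:
        seen_prompts.append(prompt)
        return "  A short summary.  "

    agent = SummarizerAgent(stub_model)
    assert agent.run("a long document") == "A short summary."
    assert "a long document" in seen_prompts[0]
```

The same agent class can then be wired to a locally hosted SLM for integration testing, much as a microservice is tested first against mocks and then against real dependencies.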

In summary, the paper sets out the proposition that small language models, rather than monolithic LLMs, should be used in agentic systems. This is because agentic systems require deterministic behaviour and interactions, and SLMs can provide specialized behaviour when fine-tuned appropriately. In addition, the lower number of parameters in SLMs means that we can lower the costs of deployment and reduce the environmental impact compared to using LLMs. Lastly, most interactions with LLMs in an agentic system are non-conversational and only use a small subset of an LLM's overall ability, so an SLM should suffice in such use cases.

Citation:

Belcak, P., Heinrich, G., Diao, S., Fu, Y., Dong, X., Muralidharan, S., Lin, Y.C. and Molchanov, P. (2025) 'Small Language Models are the Future of Agentic AI', arXiv preprint arXiv:2506.02153.