Why Verticalize Reasoning Models?

Yann Bilien - CSO Rippletide

Jun 2, 2025

This is the first article in a series describing a verticalized path toward human-level AI capabilities. First, let’s understand how current “reasoning models” work and why their scaling capacity is uncertain.

The next article will cover LLMs’ core limits and their neuro-symbolic counterparts.

1- What are reasoning models?

Traditional LLMs are trained to predict the next token in a sequence, effectively learning to mimic patterns in human-written text. While this approach enables them to generate coherent and contextually relevant responses, it doesn't inherently equip them with the ability to reason through complex tasks.

Reasoning models, however, undergo an additional training phase using reinforcement learning. In this setup, the model is presented with a problem and attempts to generate a solution through a series of logical steps, often referred to as a "Chain-of-Thought." If the model arrives at a correct solution, it's rewarded and adjusted to reinforce this behavior. Conversely, incorrect solutions lead to adjustments that discourage similar reasoning paths in the future.

This iterative process enables the model to refine its reasoning strategies over time.
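To make that loop concrete, here is a minimal toy sketch of the training signal. It is my own illustration under simplifying assumptions (a tiny synthetic task and a crude bandit-style update), not OpenAI’s actual recipe: sample a chain of steps, verify the final answer, and reinforce the steps that appeared in correct chains.

```python
import random

# Toy illustration of RL on chains of reasoning steps (simplified sketch,
# not any lab's actual training recipe).

# Hypothetical task: reach a target number from a start number in 3 steps,
# where each step is either "+1" or "*2".
ACTIONS = ["+1", "*2"]

def apply_step(action: str, x: int) -> int:
    return x + 1 if action == "+1" else x * 2

def sample_chain(policy: dict, steps: int) -> list:
    """Sample a 'chain of thought' (a sequence of steps) from the current policy."""
    weights = [policy[a] for a in ACTIONS]
    return [random.choices(ACTIONS, weights=weights)[0] for _ in range(steps)]

def reward(chain: list, start: int, target: int) -> float:
    """Verifiable reward: 1 if the chain reaches the correct result, else 0."""
    x = start
    for action in chain:
        x = apply_step(action, x)
    return 1.0 if x == target else 0.0

policy = {"+1": 1.0, "*2": 1.0}  # start from a uniform policy over step types

for _ in range(2000):
    chain = sample_chain(policy, steps=3)
    r = reward(chain, start=3, target=24)  # only *2, *2, *2 reaches 24 from 3
    for action in chain:
        # Reinforce steps seen in rewarded chains, gently dampen the rest.
        policy[action] += 0.1 if r > 0 else -0.01
        policy[action] = max(policy[action], 0.05)

print(policy)  # the weight on "*2" should dominate after training
```

Real systems replace this toy policy with a full LLM and the hand-written checker with graders over math answers or unit tests, but the feedback structure is the same: chains that end correctly get reinforced.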

But why am I claiming these are not truly reasoning models?

A reasoning model is a system engineered to process information, apply logical rules, and derive conclusions. Unlike traditional AI models that primarily rely on pattern recognition, reasoning models focus on structured problem-solving, allowing for more complex and nuanced decision-making.

First, let’s look at how recent reasoning models work.

A- What is Chain-of-Thought?

OpenAI recently released its flagship reasoning models, o1 and o3.

These models, like the majority of widely released reasoning models, are based on Chain-of-Thought.

Chain-of-Thought is now widely used to improve a system’s outputs. The idea is simple and makes a lot of sense: divide a problem into sub-problems that are easier to solve. It’s a fair method, common in mathematics and, more broadly, close to how humans think when solving a problem.

Models such as o1/o3 think out loud: they describe in text how to break the problem down, then hand only one part of the problem at a time to the next LLM call to solve.
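As a rough illustration of that “plan, then solve piece by piece” pattern, here is a minimal sketch. `call_llm` is a stand-in for whatever completion API you use, and the prompts are mine; this is not how o1/o3 are implemented internally.

```python
# Minimal sketch of Chain-of-Thought-style decomposition. `call_llm` is a
# stand-in for a real completion API; replace the stub with an actual call.

def call_llm(prompt: str) -> str:
    """Stub: pretend to query an LLM and return a text completion."""
    return f"[model output for: {prompt[:60]}...]"

def solve_with_chain_of_thought(problem: str) -> str:
    # 1. Ask the model to "think out loud": break the problem into sub-problems.
    plan = call_llm(f"Break this problem into numbered sub-problems:\n{problem}")

    # 2. Solve one sub-problem at a time, feeding earlier results back in.
    solved_so_far = ""
    for step in plan.splitlines():
        if step.strip():
            solved_so_far += call_llm(
                f"Problem: {problem}\n"
                f"Results so far:\n{solved_so_far}\n"
                f"Now solve only this sub-problem: {step}"
            ) + "\n"

    # 3. Combine the intermediate results into a final answer.
    return call_llm(
        f"Intermediate results:\n{solved_so_far}\nGive the final answer to: {problem}"
    )

print(solve_with_chain_of_thought("What is 17% of 2,340, rounded to the nearest integer?"))
```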

B- Improving Outcomes with Reinforcement Learning

Both o1 and o3 models are trained using reinforcement learning techniques. This training method involves the models receiving feedback on their reasoning paths: rewarding correct sequences and penalizing incorrect ones. Over time, this process refines their problem-solving strategies, enabling them to adapt and improve their reasoning capabilities.


2- Scaling compute will plateau around 2026

These reasoning models are built on classic LLMs, which are first trained in a pre-training step. The scaling of pre-training is now well documented: research has quantified how providing more compute improves model performance.

For example, Frenchies Arthur Mensch (now at Mistral AI) and Laurent Sifre (now at H), while at DeepMind, showed that the compute-optimal recipe is to scale the number of parameters and the number of training tokens in equal proportion.

See the Chinchilla paper: https://arxiv.org/pdf/2203.15556
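As a back-of-the-envelope illustration of that result, using the popular rule-of-thumb approximations (training compute C ≈ 6·N·D FLOPs and roughly 20 training tokens per parameter, not the paper’s exact fitted constants), both the parameter count and the token count grow like the square root of compute:

```python
import math

# Chinchilla-style back-of-the-envelope sizing. Rule-of-thumb constants:
# training compute C ~= 6 * N * D FLOPs, and compute-optimal D ~= 20 * N,
# so both N (parameters) and D (tokens) scale like sqrt(C).

def compute_optimal(c_flops: float, tokens_per_param: float = 20.0):
    """Return (parameters, tokens) that roughly exhaust a compute budget."""
    n_params = math.sqrt(c_flops / (6.0 * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

for c in (1e23, 1e24, 1e25):  # 10x steps of compute
    n, d = compute_optimal(c)
    print(f"C={c:.0e} FLOPs -> ~{n / 1e9:.0f}B params, ~{d / 1e12:.1f}T tokens")
```

Plugging in roughly 5.8e23 FLOPs recovers the familiar Chinchilla operating point of about 70B parameters trained on about 1.4T tokens.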

Or, more recently, at the NeurIPS 2024 conference, Ilya Sutskever, co-founder of OpenAI (now at Safe Superintelligence Inc.), stated that the usual methods for scaling LLMs in the pre-training step have plateaued.

As for reasoning models, the scaling behavior of “post-training” (the reinforcement learning step described above) is still largely unknown.

OpenAI has increased compute by 10x to train o3 compared to o1.

Source: OpenAI’s o3 livestream announcement (18:45), via https://epoch.ai/gradient-updates/how-far-can-reasoning-models-scale

A recent TechCrunch article claims that the improvements in reasoning models will slow down in the next few years. Why is that?

The Epoch AI analyst cited in the article explains: “performance gains from standard AI model training are currently quadrupling every year, while performance gains from reinforcement learning are growing tenfold every 3-5 months. The progress of reasoning training will probably converge with the overall frontier by 2026”.

He adds “If reasoning training continues to scale at 10× every few months, in line with the jump from o1 to o3, it will reach the frontier of total training compute before long, perhaps within a year. At that point, the scaling rate will slow and converge with the overall growth rate in training compute of ~4× per year. Progress in reasoning models may slow down after this point as well.”
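As a back-of-the-envelope check on that “perhaps within a year” estimate, here is the simple arithmetic. The growth rates are the ones quoted above; the 100x starting gap between reasoning compute and total frontier training compute is purely a placeholder assumption of mine:

```python
import math

# Growth rates from the quoted Epoch analysis; the starting gap is a
# hypothetical placeholder, chosen only to illustrate the arithmetic.
reasoning_growth_per_month = 10 ** (1 / 4)   # ~10x every ~4 months
frontier_growth_per_month = 4 ** (1 / 12)    # ~4x every 12 months
initial_gap = 100.0                          # assume reasoning compute starts 100x below the frontier

months = math.log(initial_gap) / (
    math.log(reasoning_growth_per_month) - math.log(frontier_growth_per_month)
)
print(f"Reasoning compute catches the frontier after ~{months:.0f} months")  # ~10 months
```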

That means the current method for training reasoning models might hit a wall around 2026: these Chain-of-Thought reasoning models ultimately rest on the underlying LLM. Coming back to Ilya’s statement, if LLMs themselves cannot improve fast enough, the reasoning models built on top of them are limited too. On top of that, we don’t currently have enough compute to keep scaling them.

I will explain in a future post how Rippletide developed a verticalized reasoning engine requiring much less compute.


3- Before anything else, a data scarcity problem

The truth is that for business applications the real bottleneck is not compute; it’s the data needed to train the reasoning engines.

Such models are trained on hard problems, usually maths, code and logic. Why? Because you can automatically check whether the model arrived at the right result.
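The point is that the reward is mechanically checkable. A tiny sketch (the function names are mine, for illustration only):

```python
# Why maths and code dominate reasoning datasets: the reward can be checked
# automatically. Illustrative sketch only.

def math_reward(model_answer: str, ground_truth: str) -> float:
    """Exact-match check on a final numeric answer."""
    return 1.0 if model_answer.strip() == ground_truth.strip() else 0.0

def code_reward(generated_code: str, tests: str) -> float:
    """Run unit tests against generated code; reward 1 if they all pass."""
    scope = {}
    try:
        exec(generated_code, scope)  # define the candidate function
        exec(tests, scope)           # assertions raise if the code is wrong
        return 1.0
    except Exception:
        return 0.0

print(code_reward("def add(a, b):\n    return a + b", "assert add(2, 2) == 4"))  # 1.0
```

There is no equivalent automatic grader for “did this sales conversation go well?”, which is exactly where the data problem starts.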

But results show that reasoning models do not generalize well, which means training a model on code and expecting it to run sales processes autonomously is not very efficient.

In other words, with the current approach, your model has to stay generic in order to handle any specific task.

And there is the rub: training a model to cook does not help it sell. Yet the data and compute requirements of the generic approach do not take that into account.

That’s why verticalizing the reasoning step on specific tasks is the fastest way to reach human-level capability on those tasks.

But is this enough?

For example, in sales, how many clean processes can you leverage to train those models? If you believe you have enough, keep in mind that the requirement is on the order of millions of well-annotated processes, with no data missing from your Salesforce.

Rendez-vous next week to dive into how a neuro-symbolic approach can be more effective.

Yann

Pioneering AGI for Sales
