
Dr. Asyikeen Azhar
Chief Technical Officer
Connecting the model is easy. Understanding how it should behave is harder.

Everyone wants to build AI right now. Every week there’s a new AI tool, a new framework, a new startup shipping something with an LLM. We saw this firsthand in our 2025 startup incubator cohort - about 43% of the companies were building something with AI. But building AI systems and understanding how they should behave are two very different things.
The industry has quietly shifted. Ten years ago, AI progress was mostly driven by researchers pushing algorithmic boundaries. Today, the barrier to building AI products has dropped so low that almost anyone can plug a model into an interface and ship something. In practice, engineering has accelerated, but scientific thinking hasn’t always kept up.
The industry didn’t always look like this. Let’s go blow-by-blow through what happened over the last 10 years, split into 2 distinct eras.
The first era was dominated by algorithmic breakthroughs. Researchers were pushing the boundaries of what models could do:
2015–2016 – DeepMind’s AlphaGo defeated world champion Lee Sedol in Go, a milestone many experts thought was decades away.
2017 – the Transformer architecture was introduced, which later became the foundation of modern large language models.
2018–2020 – models like BERT and GPT-2 pushed the boundaries of language understanding and generation. The pace of research was exploding too. By 2019, more than three AI research papers were being submitted to arXiv every hour, illustrating how quickly the field was accelerating.
Back then, the limiting factor wasn’t ideas. It was industry adoption. Most companies didn’t yet know how to turn these breakthroughs into real products (been there, done that).
Fast forward a few years, and the landscape looks completely different. Frameworks are everywhere, LLM APIs are everywhere. Suddenly, everyone is building AI tools. The release of ChatGPT in late 2022 triggered a wave of adoption. It became the fastest-growing consumer application in history, reaching 100 million users in about two months.
Today, the scale is staggering.
The barrier to entry, for better or for worse, collapsed. You no longer need to train a model; you just need to connect an API.
The industry moved from a world where researchers dominated progress to a world where engineers dominate product creation, and that shift has consequences. Because building AI systems isn’t just about wiring models together. It’s about understanding how they should behave, and that part still requires science.
Connecting APIs ≠ designing intelligence. Today, a lot of AI products are built the same way:
Prompt → LLM → Output.
If the output isn’t good enough, the instinct is simple: tweak the prompt!
Add more instructions.
Add more context.
Rewrite the wording.
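To make that pattern concrete, here is a minimal sketch of the prompt-to-output loop, with `call_llm` as a hypothetical stand-in for whatever provider API is being used. Notice that the "fix" is just another rewrite of the same prompt.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a single LLM API call."""
    raise NotImplementedError("wire up a real provider here")

def naive_pipeline(user_request: str) -> str:
    # The entire "system": one prompt in, one answer out.
    prompt = f"You are a helpful assistant.\n\n{user_request}"
    return call_llm(prompt)

def naive_fix(user_request: str) -> str:
    # When the output disappoints, the instinct is to pile on instructions
    # and rewrite the wording, rather than redesign the system around it.
    prompt = (
        "You are a helpful assistant. Be concise. Be accurate. "
        "Think step by step. Use a professional tone.\n\n"
        f"{user_request}"
    )
    return call_llm(prompt)
```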
Prompt tuning has almost become the default solution to every problem. But prompting is just one lever in a much larger design space. Part of the reason for this is who is building AI systems today. Surveys like the Stack Overflow Developer Survey show that the vast majority of people working with AI identify primarily as software developers, not research scientists/data scientists. The explosion of LLM APIs and frameworks means anyone can now connect a model to an interface and ship something quickly. Engineering made AI dramatically more accessible.
But accessibility also means many systems are designed around how to connect the model, rather than how the system should behave. And that’s where the gap appears. Many people know how to call the model. Far fewer think deeply about how the system should reason. That’s the difference between engineering and scientific thinking. Engineering connects components together. Scientific thinking asks deeper questions: what should the system know before it generates, what constraints define a valid answer, and how should the problem be broken down?
Once you start asking those questions, the design space expands. The solution might involve retrieval, where external knowledge is brought into the system before generation. It might involve domain constraints, so the model operates within the boundaries of a specific field. It might involve multiple agents, each responsible for a different step in the reasoning process. And often, the intelligence of the system lies in the information that flows between them:
Context retrieved from databases.
Metadata used to filter and rank information.
Heuristics deciding which outputs survive to the next stage.
Essentially, the model is not the whole system. It is just one component in the pipeline. Prompt tuning adjusts wording, while algorithm design determines how the system thinks.
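As a rough sketch of that difference, here is what the same request looks like when the model is one stage in a pipeline rather than the whole system. The stage functions are placeholders, and `call_llm` is the same hypothetical stub as before; the point is the shape, not the details.

```python
def call_llm(prompt: str) -> str:  # hypothetical LLM call, as before
    raise NotImplementedError

def retrieve_context(request: str) -> list[str]:
    """Placeholder: pull relevant records from a database or knowledge store."""
    return []

def rank_by_metadata(records: list[str]) -> list[str]:
    """Placeholder: filter and order records using metadata before generation."""
    return records

def survives(candidate: str) -> bool:
    """Placeholder heuristic deciding which outputs move to the next stage."""
    return bool(candidate.strip())

def run_pipeline(request: str) -> list[str]:
    context = rank_by_metadata(retrieve_context(request))
    prompt = "Context:\n" + "\n".join(context) + f"\n\nTask: {request}"
    candidates = call_llm(prompt).split("\n\n")  # the model is one stage, not the system
    return [c for c in candidates if survives(c)]
```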
Once you move beyond prompting, the system starts to look very different. A lot of modern AI tools treat the model as the centre of intelligence. You ask a question, the model generates an answer, and that answer becomes the product. But, in well-designed systems, the intelligence is often distributed across the architecture. The model is only one part of the process.
Before generation even happens, the system might retrieve relevant information. This could come from curated databases, historical examples, or domain-specific knowledge. This step changes the behaviour of the model significantly. Instead of generating from its general training data, it now operates with structured context.
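A minimal version of that retrieval step might look like this. Keyword overlap stands in for a real vector store or curated database, purely to show how the prompt changes once structured context is attached.

```python
def call_llm(prompt: str) -> str:  # hypothetical LLM call, as before
    raise NotImplementedError

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Score documents by naive keyword overlap with the query
    (a real system would use embeddings or a curated database)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        documents,
        key=lambda d: len(q_terms & set(d.lower().split())),
        reverse=True,
    )
    return scored[:k]

def generate_with_context(query: str, documents: list[str]) -> str:
    context = "\n".join(f"- {d}" for d in retrieve(query, documents))
    prompt = (
        "Answer using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return call_llm(prompt)
```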
Next are constraints. Many real-world problems operate within boundaries: brand tone, regulatory rules, scientific validity, or strategic objectives. If those constraints are not encoded into the system, the model may produce outputs that sound plausible but are unusable. Scientific thinking in AI design means asking: what rules of the domain should shape the generation process?
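One way to encode those rules is as explicit checks that gate generation. The specific constraints below (a banned claim, a length limit, a brand mention for a made-up brand) are invented for illustration; the pattern is checking outputs against domain rules instead of hoping a reworded prompt catches them.

```python
import re

def call_llm(prompt: str) -> str:  # hypothetical LLM call, as before
    raise NotImplementedError

CONSTRAINTS = [
    # Each rule returns True when the output respects the constraint.
    ("no cure claims", lambda text: not re.search(r"\bcures?\b", text, re.I)),
    ("within length limit", lambda text: len(text) <= 600),
    ("mentions the brand", lambda text: "ACME" in text),  # illustrative brand rule
]

def violations(text: str) -> list[str]:
    return [name for name, ok in CONSTRAINTS if not ok(text)]

def constrained_generate(prompt: str, max_attempts: int = 3) -> str:
    failed: list[str] = []
    for _ in range(max_attempts):
        draft = call_llm(prompt)
        failed = violations(draft)
        if not failed:
            return draft
        # Feed the failed rules back into the next attempt explicitly.
        prompt += "\n\nRevise the answer to satisfy: " + ", ".join(failed)
    raise ValueError(f"could not satisfy constraints: {failed}")
```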
Another common approach is task decomposition. Instead of asking a single model to solve everything at once, the system breaks the problem into smaller steps. Different agents may handle different parts of the reasoning process. One agent might analyse patterns. Another might propose ideas. Another might evaluate the quality of the output.
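Sketched in code, that decomposition might look like three small agents, each a separate call with its own narrow job. The prompts and the five-idea format are assumptions for illustration.

```python
def call_llm(prompt: str) -> str:  # hypothetical LLM call, as before
    raise NotImplementedError

def analyse(data: str) -> str:
    """Agent 1: summarise patterns in the input data."""
    return call_llm(f"Identify the three strongest patterns in:\n{data}")

def propose(patterns: str) -> list[str]:
    """Agent 2: turn patterns into candidate ideas, one per line."""
    raw = call_llm(f"Propose five ideas based on these patterns:\n{patterns}")
    return [line.strip("- ").strip() for line in raw.splitlines() if line.strip()]

def evaluate(idea: str) -> float:
    """Agent 3: score an idea between 0 and 1."""
    reply = call_llm(f"Rate this idea from 0 to 1 for feasibility:\n{idea}")
    try:
        return float(reply.strip())
    except ValueError:
        return 0.0  # unparseable replies are treated as weak ideas

def decomposed_run(data: str) -> list[tuple[str, float]]:
    patterns = analyse(data)
    ideas = propose(patterns)
    return sorted(((i, evaluate(i)) for i in ideas), key=lambda p: p[1], reverse=True)
```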
What matters here is not just the agents themselves, but what information flows between them. Metadata can filter or rank the information being passed forward. Heuristics can determine which candidates survive to the next stage. Structured signals can guide how the system refines its outputs. Hence, the intelligence of the system is not only in the model. It is in how information moves through the system.
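Passing structured signals instead of raw text can be as simple as a small record type and a hand-off rule. The fields and thresholds here are hypothetical; the point is that metadata and heuristics, not the model, decide what survives.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    idea: str
    source: str        # where the supporting context came from
    relevance: float   # metadata used to filter and rank
    novelty: float

def handoff(candidates: list[Candidate], keep: int = 3) -> list[Candidate]:
    """Heuristic deciding which candidates survive to the next stage:
    drop low-relevance ideas, then keep the most novel of what remains."""
    viable = [c for c in candidates if c.relevance >= 0.5]
    return sorted(viable, key=lambda c: c.novelty, reverse=True)[:keep]
```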
Lastly, and perhaps most important, is the feedback loop. Many AI systems today generate outputs once and move on. But systems that improve over time need signals from the real world - user interactions, decisions about which outputs are actually used. These signals, and how they are used, form a feedback loop that helps the system learn what “good” looks like in practice. Over time, the system doesn’t just generate answers. It begins to align itself with human judgment.
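There are many ways to close that loop. A deliberately simple sketch, not a claim about any particular product: record whether each tagged output was used or discarded, and let that signal shift how future candidates are ranked.

```python
from collections import defaultdict

class FeedbackLoop:
    """Track how outputs with a given tag performed, and bias future ranking."""

    def __init__(self, learning_rate: float = 0.1):
        self.weights: dict[str, float] = defaultdict(lambda: 0.5)
        self.lr = learning_rate

    def record(self, tag: str, accepted: bool) -> None:
        # Nudge the tag's weight toward 1 if the output was used, toward 0 if not.
        target = 1.0 if accepted else 0.0
        self.weights[tag] += self.lr * (target - self.weights[tag])

    def score(self, tag: str, base_score: float) -> float:
        # Blend the model's own score with what real usage has shown so far.
        return 0.5 * base_score + 0.5 * self.weights[tag]
```

Note that a loop like this never retrains the model; it only changes how the system ranks what the model produces.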
When we started designing KLARva, the goal wasn’t just to generate content ideas. The goal was to design a system that thinks more like a strategist. Social media strategy isn’t a purely generative problem. It’s a decision problem. A strategist doesn’t just produce text; they consider signals, context, and trade-offs before deciding what direction makes sense. So the system was designed around that process.
First, the system retrieves context. This might include examples of how certain sounds are used, patterns in past videos, or signals about what is gaining traction. Instead of generating blindly, the model operates with structured context relevant to the brand and its domain.
Next, the task is broken down across multiple steps. One part of the system analyses patterns. Another proposes possible concepts. Another evaluates whether those ideas actually make sense strategically. What moves between those steps isn’t just text. The system passes structured signals: metadata about sounds, characteristics of videos, heuristics about engagement patterns, and other contextual information that helps filter and rank candidate ideas. But just as important is how that information is passed along. Each step only receives the signals it actually needs. Context is filtered, summarised, or structured before being handed to the next agent. This prevents the system from overwhelming itself with unnecessary information and keeps each stage focused on its specific role. The result is a pipeline where the system reasons in stages, instead of trying to solve everything in a single generation.
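As an illustration only (this is not KLARva’s actual code), the filtered hand-off described above can be sketched as each stage receiving a trimmed view of the signals rather than everything the previous stage gathered. The `TrendSignal` fields are invented for the example.

```python
from dataclasses import dataclass

@dataclass
class TrendSignal:
    sound_id: str
    uses_last_week: int
    engagement_rate: float
    example_hooks: list[str]

def view_for_proposer(signals: list[TrendSignal], top: int = 5) -> list[dict]:
    """The proposing stage only needs the sound and a couple of example hooks,
    not the full engagement history."""
    ranked = sorted(signals, key=lambda s: s.engagement_rate, reverse=True)
    return [{"sound_id": s.sound_id, "hooks": s.example_hooks[:2]} for s in ranked[:top]]

def view_for_evaluator(signals: list[TrendSignal]) -> dict[str, float]:
    """The evaluating stage only needs a per-sound engagement score."""
    return {s.sound_id: s.engagement_rate for s in signals}
```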
Next, evaluation. Instead of assuming that the first output is good enough, the system evaluates ideas against a set of criteria that reflect how strategists actually think: relevance to the brand, alignment with the platform, and the strength of the creative hook. Ideas that don’t meet those criteria can be refined or discarded before they ever reach the user.
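A sketch of that gate, under the assumption that each criterion is scored separately and low scorers are refined or dropped. The scoring prompts and the 0.6 threshold are illustrative, not the production values.

```python
from typing import Optional

def call_llm(prompt: str) -> str:  # hypothetical LLM call, as before
    raise NotImplementedError

def score_idea(idea: str, brand_brief: str, platform: str) -> dict[str, float]:
    """Score each strategist criterion separately; replies are parsed as a single number."""
    def ask(question: str) -> float:
        reply = call_llm(f"{question}\nAnswer with a number from 0 to 1.")
        try:
            return float(reply.strip())
        except ValueError:
            return 0.0

    return {
        "brand_relevance": ask(f"How relevant is this idea to the brief?\nBrief: {brand_brief}\nIdea: {idea}"),
        "platform_fit": ask(f"How well does this idea fit {platform}?\nIdea: {idea}"),
        "hook_strength": ask(f"How strong is the creative hook?\nIdea: {idea}"),
    }

def gate(idea: str, brand_brief: str, platform: str, threshold: float = 0.6) -> Optional[str]:
    scores = score_idea(idea, brand_brief, platform)
    if min(scores.values()) >= threshold:
        return idea  # strong enough to reach the user
    if scores["hook_strength"] < threshold:
        return call_llm(f"Rewrite this idea with a sharper hook:\n{idea}")  # refine
    return None      # discard
```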
Finally, there is the feedback loop. In many agent-based systems, improvement doesn’t come from a single generation. It comes from interaction over time. When an idea is proposed, it isn’t just accepted as-is. The system records what happened next - whether the idea was discarded or scheduled. That interaction becomes part of the system’s interaction history. From there, the system can refine its behaviour. One agent may propose an idea and provide its reasoning. Another component or the user evaluates the result. This creates an exchange of reasoning and feedback that helps the system understand which directions make sense and which ones don’t. Those signals are then used in the next iteration. Through iterative refinement, the system adjusts what it proposes, how it ranks ideas, and what signals it prioritises. In many architectures, this process is guided by evaluation or reward models that score whether an output meets certain criteria. Over repeated interactions, the system gradually begins to learn user preferences, not just from static training data, but from how people actually respond to the outputs.
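One simplified way to picture that interaction history, again as an assumption-laden sketch rather than the real implementation: log whether ideas with a given attribute were scheduled or discarded, and re-rank the next round of proposals by the observed acceptance rate.

```python
from collections import Counter

class InteractionHistory:
    """Turn past outcomes into a preference signal for ranking new proposals."""

    def __init__(self):
        self.scheduled = Counter()  # attribute -> times an idea with it was scheduled
        self.seen = Counter()       # attribute -> times an idea with it was proposed

    def record(self, attributes: list[str], scheduled: bool) -> None:
        for attr in attributes:
            self.seen[attr] += 1
            if scheduled:
                self.scheduled[attr] += 1

    def preference(self, attr: str) -> float:
        # Acceptance rate with a mild prior, so unseen attributes start neutral.
        return (self.scheduled[attr] + 1) / (self.seen[attr] + 2)

    def rerank(self, proposals: list[tuple[str, list[str]]]) -> list[str]:
        """proposals: (idea, attributes). Rank by mean preference over attributes."""
        def score(attrs: list[str]) -> float:
            return sum(self.preference(a) for a in attrs) / max(len(attrs), 1)
        return [idea for idea, attrs in sorted(proposals, key=lambda p: score(p[1]), reverse=True)]
```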
The result is a system that improves through use. Not by retraining the model each time, but by learning from the feedback signals generated during real workflows. The model is only one component in that loop. The real intelligence emerges from the interaction between the system and the people using it.
Building AI systems today is easier than it has ever been - frameworks are mature, models are accessible through APIs. The barrier to entry has collapsed. That’s a good thing. It means more people can experiment, build, and ship ideas. But it also means the hard part has shifted.
The challenge is no longer just connecting the model. The challenge is designing how the system should behave: what it should know, what it should optimise for, and how it should reason.
Those questions are less about engineering and more about scientific thinking (research papers are our best friends). Engineering builds the machine, whereas science decides what the machine should optimise for. As AI becomes embedded in more real-world workflows, that distinction will matter more and more. Because anyone can wire up a model, but designing systems that actually think well is a different problem.
Strategy without systems doesn’t scale. Systems without strategy don’t make sense.
We build, and think, at the intersection.