AI is the hottest tech right now. We are achieving AGI-level breakthroughs at least five times a month (if not a week)—or so people claim.
As an MIT student in my final semester, I find it impossible not to be asked for my take on AI every so often, or not to think about it when making decisions. But maintaining an informed, independent, and consistent perspective on AI is just hard when you have friends doing AI PhDs or joining OpenAI on one side, and friends who never use Copilot or call AI a bullshit artist on the other. They are all incredibly talented, and their arguments well founded—so what now?
It’s a fun and intellectually fruitful exercise to reconcile everyone’s positions against each other and come up with some “rational view on AI” I am happy with. However, the field is changing so fast that I need to do it over and over again, at which point it stops being fun and becomes an inefficient drain on my limited mental energy and time.
What can I do? I think it’s a good idea to take a snapshot of some of my thoughts on AI and write them down, so that in the future I can start from there instead of from scratch, and in three years we shall see how right or wrong I was.
So here we are at this post. There is no central thesis, just a collection of thoughts on some interesting problems I have been thinking about. Hopefully it’s not too confusing.
Are LLMs mere stochastic parrots?
LLM skeptics often make this argument, and they are apt to propose many variants of the same idea:
- Are LLMs simply giving the statistically most plausible answer to a prompt?
- Are LLMs just next-token predictors?
- Are LLMs basically pattern-matchers?
These opinions are not wrong per se. But upon close inspection, they offer weak support for what they are commonly used to argue: skepticism about LLMs’ intelligence or reliability.
LLMs are statistical models that work by next-token prediction. That’s a fact. However, consider:
An LLM is not just any statistical or probabilistic model. A simple n-gram model trained on trillions of tokens will arguably also give the “statistically most plausible answer” to our query, but I doubt it will be as good as an LLM. What do we mean by “statistically”? What do we mean by “plausible”? What distribution are we capturing and sampling from? Whether purely learned statistical models will take us to the AI we dream of is an entirely separate big question in its own right, but on this path, it’s fair to say that LLMs have done things “more correctly” than earlier methods.
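To make “statistically most plausible answer” concrete, here is a toy bigram sketch of my own (nothing from any real system): the most plausible next word is literally the one that most often followed the previous word in the training text. An LLM can be described with the same phrase, which is exactly why the phrase on its own says so little about capability.

```python
from collections import Counter, defaultdict

# Tiny training corpus; the "statistically most plausible" next word after a
# given word is simply the one that followed it most often in this text.
corpus = "the cat sat on the mat . the dog sat on the mat .".split()

bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def most_plausible_next(word: str) -> str:
    # argmax over the empirical next-word counts (ties broken by first occurrence)
    return bigrams[word].most_common(1)[0][0]

print(most_plausible_next("the"))  # -> 'mat' (it followed 'the' twice)
print(most_plausible_next("sat"))  # -> 'on'
```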
“Next-token prediction” can be a very hard task depending on what a token means and which context you are predicting in. To draw some analogies:
- In cryptography, being able to predict the next bit of a pseudo-random number generator (PRNG) output with probability non-negligibly better than a pure coin toss, in a reasonable amount of time, breaks the PRNG completely—and it is thus presumed infeasible for all serious PRNGs we use today.
- In finance, being able to predict price movement with even just 51% accuracy can be a big edge and land you (at least) a seven-figure job if you can do it reliably. Thousands of incredibly smart people in the industry are working day and night on this.
They are all “next-___ prediction” tasks, but they are by no means easy, so discrediting LLMs by calling them “next-token predictors” doesn’t work.
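To put a rough, hypothetical number on the finance analogy (my own toy setup, ignoring transaction costs and position sizing): a predictor that is right 51% of the time and bets one unit per call has an expected profit of only 0.02 units per bet, yet over a million independent bets that tiny edge adds up to something very real.

```python
import random

random.seed(0)
ACCURACY = 0.51   # hypothetical hit rate: barely better than a coin toss
N_BETS = 1_000_000
UNIT = 1.0        # profit or loss per bet, costs ignored

# Expected profit per bet: 0.51 * 1 + 0.49 * (-1) = 0.02 units
pnl = sum(UNIT if random.random() < ACCURACY else -UNIT for _ in range(N_BETS))
print(f"simulated PnL: {pnl:,.0f} units over {N_BETS:,} bets "
      f"(expectation ≈ {(2 * ACCURACY - 1) * N_BETS:,.0f})")
```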
These arguments have been around for years, tracing back at least to when language models were not so large, and LLMs prospered regardless. I am not dismissing them—chances are that, with the proper technical interpretation, these arguments predict a wall of LLM capability ahead of us. But this possibility can’t be meaningfully proven or disproven until we’ve hit the wall, or gone past it.
Generally, I think these arguments are all correct, but the devil is in the technical details, which, unfortunately, often get swept under the carpet in the contexts where these arguments are raised. Without the details, these arguments become over-simplifying trivializers—factually correct, rhetorically convincing, but void of meaningful insight.
Are LLMs AGI/ASI/human-level AI?
Now, coming to the other side of the debate. Will LLMs take us to the moon? I think this is a very overloaded question, and it’s better to disentangle it into smaller parts.
Are LLMs intelligent/sentient?
I don’t think this question, without context, has any substance outside the field of philosophy, simply because there isn’t a clear definition of intelligence or sentience, and there likely won’t be. A professor I highly respect once made the point that “intelligence” is a word we reserve for things we don’t fully understand. ELIZA, a rule-based computer therapist from the 60s, was once believed to be sentient by users who did not know how it worked, despite explanations from its programmer, who believed otherwise.[1] So I believe “intelligence” and “sentience” are really subjective. If objective definitions exist, they hinge on their own current undefinability and can only be incrementally refined through elimination over time.
That’s a lot of words spent digressing and dodging the question! Let me actually answer it.
At a very high level, modern LLM systems work by rolling a fixed “mindset” over a sliding window of “internal monologue” (see the sketch after this list). I believe this is different from what humans do:
- It’s still debatable if we truly think with internal monologues. Yann LeCun is famously against the idea.
- The split between fixed and variable state in LLMs is clear-cut: the weights are fixed, and the only variable state is the window of tokens the LLM operates on. A similar boundary, if it exists at all, is extremely blurred in human brains.
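Here is a minimal, runnable sketch of the loop I have in mind. The names `forward` and `sample` are placeholders of my own, not any real model API, and the “model” is just a uniform distribution over a toy vocabulary; the only point is the shape of the state: `weights` never changes inside the loop, while the token window is the only thing that evolves.

```python
import random

def forward(weights, window):
    # Stand-in for a real forward pass: returns a (here, uniform) distribution
    # over a tiny vocabulary. A real LLM would compute this from its fixed
    # weights and the current window of tokens.
    return {token: 1.0 for token in weights["vocab"]}

def sample(distribution):
    tokens, probs = zip(*distribution.items())
    return random.choices(tokens, weights=probs, k=1)[0]

def generate(weights, prompt_tokens, max_new_tokens, context_limit):
    # Fixed "mindset", sliding "internal monologue": `weights` is read-only
    # throughout; the growing token list is the only evolving state.
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        window = tokens[-context_limit:]
        next_token = sample(forward(weights, window))
        tokens.append(next_token)
    return tokens

print(generate({"vocab": ["a", "b", "c"]}, ["a"], max_new_tokens=5, context_limit=8))
```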
Hence, if “intelligent” means “intelligent as we humans are,” then I believe the answer is no. That’s still not to say that LLMs are not “intelligent” or “sentient”—they might just be different, which brings me to the second question in the series.
Do LLMs have super-human capabilities?
Absolutely yes! Consider:
- No human can comprehend millions of tokens worth of text in a fraction of a second.
- No human can have their mind snapshotted, cloned, and serving millions of users worldwide in parallel.
- No human can know a bit of everything on the Internet—or even hallucinate plausible answers to any random query you throw at them straightaway.
- No human can show you 100% transparently what’s going through their mind when they think.
I believe many of these super-human capabilities are rooted precisely in LLMs’ differences from human intelligence. Fixed weights admit high reproducibility and, more importantly, a high degree of data reuse when served at scale, making memory less of a bottleneck and improving hardware FLOPs utilization; LLMs working on explicit tokens, with a transparent chain of thought, allow their use to be audited and monitored. This reproducibility, efficiency at scale, and transparency would be hard to get from something that replicated human thinking 100%.
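As a hypothetical back-of-the-envelope sketch of the data-reuse point (my own toy model, ignoring activation and KV-cache traffic): each weight fetched from memory does more useful work the more tokens it is applied to, so FLOPs per byte of weight traffic grow roughly linearly with the number of tokens sharing those fixed weights.

```python
def flops_per_weight_byte(d: int, batch_tokens: int, bytes_per_param: int = 2) -> float:
    # One d x d weight matrix applied to `batch_tokens` token vectors:
    # ~2*d*d FLOPs per token (multiply-add), while the d*d weights are read
    # from memory once and reused across the whole batch.
    flops = 2 * d * d * batch_tokens
    weight_bytes = d * d * bytes_per_param
    return flops / weight_bytes

# With 2-byte parameters this works out to ~1 FLOP per weight byte per token
# of reuse, i.e. the ratio scales linearly with the batch.
for b in (1, 8, 64, 512):
    print(f"batch of {b:>3} tokens: ~{flops_per_weight_byte(4096, b):.0f} FLOPs per weight byte")
```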
I intentionally used the wording “super-human capabilities” to echo the now-frequent comparisons between AI and humans. These comparisons sound unprecedented and exciting under the now prevailing narrative that we are creating something that is going to beat humans at our own game. In that narrative, LLMs already demonstrating these super-human capabilities sounds like a big deal—has the time already come?
If we look closer, though, the super-human capabilities LLMs demonstrate are typical of technologies. Ordinary computers have “super-human capabilities” too—ridiculously fast number crunching, large and transparent memory, and cheap manufacturing at scale. These do not sound too different from the capabilities of LLMs. In some sense, I believe we are tempted to think that LLMs are unlike any other technology on Earth because earlier technologies like computers were never put under the same narrative. Or, if they were, it was too long ago for us Gen Z kids, and the technology has become so ubiquitous that we take it for granted.
So AGI/ASI/human-level AI when?
I am not a big fan of the prevailing narrative in which LLMs are compared with human intelligence in general, with the assumption that if LLMs take off, most if not all human intellectual work will be replaced. Calling everything “AI” has played a role in advancing this narrative too, which is why I have been careful in my choice of acronyms so far. “Artificial intelligence” sounds like a serious competitor to “natural/human intelligence” for intellectual work, whereas “large language model” sounds like just another new piece of tech people use. The former is obviously better for marketing purposes.
Business leaders and tech visionaries are pushing “AI”—which currently mostly means LLMs—into a somewhat religious position, a “God,” as Mistral’s CEO put it. AGI/ASI/(super-)human-level AI, whichever each company is heading toward, does not have a clear, stable definition. The vagueness is intentional, because
- tech leaders have great visions but the future is naturally unpredictable, and
- it helps their business. When people are presented with vague ideals without concrete technical details, they hallucinate[2] to their advantage, which means new use cases get optimistically evaluated and technological limitations get overlooked.
Both are totally reasonable points: the uncertainty and difficulty of technological innovation justify hype to attract capital so that all potential paths can be explored. But it also means we have to be cautious about taking these ideals at face value.
AI might be a God, but an LLM is a technology, however disruptive it is. It might be “intelligent,” and it surely beats humans in many areas, but it is a technology and bears every property we expect from a technology: it has strengths and limitations; there are use cases where it works great and use cases where it works not-so-great; it becomes cheaper and more accessible over time; it will hit a point of diminishing returns and plateau until the next technology comes…
With that in mind, I am not one of the most bullish AI hypers who believe AGI (whatever that means) will be achieved in the next three or five years. Many fervent believers in near-term AGI without a vested interest[3] whom I know of base their optimism on AI as a black box, and not much of that argument relates concretely to the specificities of LLMs. This implies that if they were thrown five decades back in history, into the midst of the wave of symbolic / expert-system AI, their optimism would still hold. People were indeed very optimistic about symbolic AI back then—until they weren’t, and the AI winter came.
The AI boom also reminds me of a boom in theoretical physics in the middle of the 20th century, when many new theories emerged and physicists were confident that we were on a fast track to a theory of everything. That did not happen, and the reason, to my limited physics knowledge, is a wall in the physicists’ version of the scaling law: the higher the energy level you can run your experiments at, the more “fundamental” the physics you can verify. Physicists today seem to have good hypotheses and can predict what energy level it takes to verify them—they just can’t build the experiments in the real world. So scaling laws don’t magically guarantee progress.
It’s possible that the industry ends up pivoting to something very different during this AI boom, turning the AI hypers’ optimism into reality. By the time that happens, we shall all look back and laugh at my short-sightedness. That said, one can always wishfully bet that “people will figure things out,” and be more often right than wrong as long as our civilization keeps progressing—but that doesn’t make them smarter. Optimists about LLMs should aim to specialize their arguments to the technology. For now, the generality of these claims and their disconnect from technical reality make them unconvincing to me.
I was about to write more into the technical details, but I just procrastinated too much on this post: I started writing on Feb 17, and it took 20 days for me to get to where we are. Claude 3.7 Sonnet and GPT 4.5 also dropped while I was writing, and I’m glad that they do not outright invalidate any of my arguments here.
Anyway, a snapshot is not meant to be a running record, so I am publishing what I have at hand for now; the rest will likely become a follow-up post.
[1] The professor whom I quoted was also around when ELIZA came out. His thoughts: “Joe [ELIZA’s creator] told me that what he really learned from the ELIZA experience is that the Turing test is too easy: people were too easily fooled by his really dumb program!”
[2] I do believe LLMs are not the only ones that hallucinate, and the meaning of the verb as applied to LLMs seems a great fit in this context.
[3] By no vested interest, I mean they are not Silicon Valley tech bros boosting their startups’ valuations. A good place to find many such people is r/singularity.