The breathtaking progress in deep learning and generative models has brought us to a kind of revolutionary precipice. Microsoft Research described GPT-4 as showing sparks of an early, yet still incomplete, version of an artificial general intelligence system. This raises the question: what can’t GPT-4 do yet, and, speculatively, what won’t a future GPT-n be able to do? In this blog post, we explore a framework from the psychology literature known as System 1 and System 2 thinking, and use it to understand the current weaknesses of LLMs and to lay out a path for beginning to address them.
GPT-4 and other state-of-the-art LLMs struggle with factual accuracy, logical reasoning, long-term planning, fairness and bias, and safety and harm. These issues have not gone unnoticed in the field, and there is a rich and growing body of ongoing research that seeks to tackle them. In humans, the corresponding abilities are achieved via a more deliberate kind of thinking that many psychologists refer to as “System 2.” Because of this parallel, it has become fashionable in the ML and AI research community to refer to the missing competencies of LLMs as “System 2.”
Yann LeCun recently gave a keynote talk, “Towards Machines that can Learn, Reason, and Plan,” on building System 2 capabilities at the Impact of GPT seminar at MIT. Similarly, Yoshua Bengio is leading a System 2 deep learning research program focused on building models that generalize better to novel situations. System 2 has become a catch-all phrase for these bold attempts to address the toughest problems facing today's AI. This is reminiscent of the AI effect: “AI is whatever computers can’t do yet.” Today’s insider version of the AI effect is: “System 2 is everything AI cannot do yet.”
It is important to note, however, that much of what has been attributed to System 2 is not actually consistent with the term’s definition as given by Daniel Kahneman in his 2011 book, Thinking, Fast and Slow. Kahneman’s System 2 comprises mechanisms that are conscious, effortful, and only infrequently engaged. Kahneman himself summarized this for AI researchers at the AAAI 2020 panel with Turing Award winners Geoff Hinton, Yoshua Bengio, and Yann LeCun, which is perhaps how the term made its way into today’s AI discourse. Paraphrasing Kahneman at that panel:
The Turing Award trinity were using System 1 and 2 inaccurately
System 2 describes serial mechanisms, invoked when System 1 cannot find an answer using parallel mechanisms
Since 2020, the hunger among AI scientists to understand and explore “everything AI cannot do yet” has grown exponentially; however, there have been no serious efforts to take on the program that Kahneman suggested three years ago. We believe that Kahneman’s System 2 is the best blueprint for the next generation of AI architectures. We further believe that to create this blueprint we must revisit the major conclusions drawn over decades of research by Tversky and Kahneman, and that there are lessons therein which can guide the engineering of tomorrow's systems.
What does System 2 mean in Psychology?
William James first hypothesized a dual-process theory in the late 19th century, positing two different kinds of thinking: associative and true reasoning. The concept was developed over the following century, came to be associated with unconscious and conscious thought, was sharpened by Wason and Evans (1974) and Stanovich and West (2000), and was finally popularized by Daniel Kahneman as System 1 and System 2.
From Kahneman’s book Thinking, Fast and Slow:
“System 1 and System 2 are so central to the story I tell in this book that I must make it absolutely clear that they are fictitious characters. Systems 1 and 2 are not systems in the standard sense of entities with interacting aspects or parts. And there is no one part of the brain that either of the systems would call home.”
[Image: Aesop's fable version of the slow and fast duality, courtesy of the Jackfield Tile Museum]
So this duality is a functional abstraction: a description of the activities, constraints, and properties of each system. In software engineering parlance, Kahneman goes further and provides us with the requirements (a code sketch of what they might look like follows the excerpt):
“System 1 runs automatically and System 2 is normally in a comfortable low-effort mode, in which only a fraction of its capacity is engaged. System 1 continuously generates suggestions for System 2: impressions, intuitions, intentions, and feelings. If endorsed by System 2, impressions and intuitions turn into beliefs, and impulses turn into voluntary actions. When all goes smoothly, which is most of the time, System 2 adopts the suggestions of System 1 with little or no modification.
When System 1 runs into difficulty, it calls on System 2 to support more detailed and specific processing that may solve the problem of the moment. System 2 is mobilized when a question arises for which System 1 does not offer an answer. System 2 is activated when an event is detected that violates the model of the world that System 1 maintains. In that world, lamps do not jump, cats do not bark, and gorillas do not cross basketball courts. System 2 is also credited with the continuous monitoring of your own behavior—the control that keeps you polite when you are angry, and alert when you are driving at night. System 2 is mobilized to increased effort when it detects an error about to be made. Remember a time when you almost blurted out an offensive remark and note how hard you worked to restore control. In summary, most of what you (your System 2) think and do originates in your System 1, but System 2 takes over when things get difficult, and it normally has the last word.
The division of labor between System 1 and System 2 is highly efficient: it minimizes effort and optimizes performance. The arrangement works well most of the time because System 1 is generally very good at what it does: its models of familiar situations are accurate, its short-term predictions are usually accurate as well, and its initial reactions to challenges are swift and generally appropriate. System 1 has biases, however, systematic errors that it is prone to make in specified circumstances. It sometimes answers easier questions than the one it was asked, and it has little understanding of logic and statistics.”
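Read as requirements, the passage above suggests a control loop: a cheap default path that runs on everything, and an expensive path that is triggered only by low confidence or surprise. The toy sketch below makes that control flow concrete; every name, class, and threshold in it is a hypothetical illustration, not an existing system or the architecture we are proposing.

```python
# A deliberately toy sketch of the division of labor described above.
# Every name here (Suggestion, System1, System2, answer, the thresholds)
# is a hypothetical illustration, not an existing library or architecture.

from dataclasses import dataclass


@dataclass
class Suggestion:
    answer: str
    confidence: float  # how strongly System 1 backs its own guess
    surprise: float    # how badly the input violates System 1's model of the world


class System1:
    """Fast, automatic, always on: produces a cheap guess for every query."""
    FAMILIAR = {"2 + 2": "4", "capital of France": "Paris"}

    def propose(self, query: str) -> Suggestion:
        # Stand-in for a learned model, e.g. a single LLM forward pass.
        if query in self.FAMILIAR:
            return Suggestion(self.FAMILIAR[query], confidence=0.95, surprise=0.0)
        return Suggestion("no idea", confidence=0.1, surprise=0.8)


class System2:
    """Slow, effortful, serial: mobilized only when System 1 is not enough."""
    def deliberate(self, query: str, hint: Suggestion) -> str:
        # Stand-in for explicit search, planning, or inference that can
        # check and override System 1's impression.
        return f"deliberated answer to {query!r} (System 1 suggested {hint.answer!r})"


def answer(query: str, s1: System1, s2: System2,
           min_confidence: float = 0.7, max_surprise: float = 0.3) -> str:
    suggestion = s1.propose(query)
    # "When all goes smoothly ... System 2 adopts the suggestions of System 1
    # with little or no modification."
    if suggestion.confidence >= min_confidence and suggestion.surprise <= max_surprise:
        return suggestion.answer
    # System 2 is mobilized when System 1 offers no good answer, or when an event
    # violates System 1's model of the world -- and it has the last word.
    return s2.deliberate(query, hint=suggestion)


print(answer("capital of France", System1(), System2()))              # fast path
print(answer("plan a 3-city trip on a $900 budget", System1(), System2()))  # System 2 engaged
```

The only point of the sketch is the shape of the `answer` function: System 1 runs on every query, and System 2 is mobilized, and gets the last word, only when System 1's suggestion is weak or the input is surprising.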
What could System 2 mean for AI research?
It is uncanny how well the characteristics of System 1 map to the strengths of today’s deep learning systems, and how well the characteristics of System 2 map to their weaknesses. This alignment points to an immense opportunity: leveraging System 2-style computation in the next generation of AI research. We believe that today’s ML has laid a strong foundation for System 1, but that to get to System 2 we must embrace and integrate additional computational approaches, including the use of symbols.
Before the deep learning revolution, attempts to build cognitive architectures that integrate different mechanisms put System 2-type competencies first. Indeed, 300 pages of Russell and Norvig’s well-regarded textbook are devoted to symbolic approaches to knowledge, reasoning, planning, and constraint satisfaction. The problem we face today is that, while many of these methods can provide sound inference, they are not easily mapped to language. These techniques evolved independently of current neural approaches and were built on entirely different assumptions, and hence cannot be readily leveraged by LLMs.
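To make the “sound inference” point concrete, here is a minimal forward-chaining engine over hand-written facts and rules; the predicates, facts, and rules are invented for this illustration. Within its own vocabulary it derives entailments and only entailments. What it cannot do, and what none of these classical techniques do on their own, is extract those facts from ordinary language.

```python
# A toy forward-chaining inference engine: sound over its own hand-written
# rules, but with no way to obtain those facts from free text.
# The predicates, facts, and rules below are invented for illustration.

facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

# Each rule: if every premise pattern matches, add the conclusion.
# Strings starting with "?" are variables.
rules = [
    ([("parent", "?x", "?y")], ("ancestor", "?x", "?y")),
    ([("parent", "?x", "?y"), ("ancestor", "?y", "?z")], ("ancestor", "?x", "?z")),
]


def match(pattern, fact, binding):
    """Try to match a premise pattern against a fact, extending the binding."""
    new = dict(binding)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if new.get(p, f) != f:
                return None
            new[p] = f
        elif p != f:
            return None
    return new


def substitute(term, binding):
    return tuple(binding.get(t, t) for t in term)


def forward_chain(facts, rules):
    """Apply the rules repeatedly until no new facts can be derived (a fixpoint)."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Find every consistent way to bind the rule's variables.
            bindings = [{}]
            for premise in premises:
                bindings = [b2 for b in bindings for fact in derived
                            if len(fact) == len(premise)
                            and (b2 := match(premise, fact, b)) is not None]
            for b in bindings:
                new_fact = substitute(conclusion, b)
                if new_fact not in derived:
                    derived.add(new_fact)
                    changed = True
    return derived


print(sorted(forward_chain(facts, rules)))
# adds ("ancestor", "alice", "bob"), ("ancestor", "bob", "carol"),
# and ("ancestor", "alice", "carol") -- every entailment, and only entailments.
```

The gap described above is precisely that nothing here knows how to turn a sentence into `("parent", "alice", "bob")`; supplying that translation step is what the hybrid efforts below are after.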
There is a fast-growing body of research on integrating neural and symbolic architectures. Stephen Wolfram is adding mathematical and physical reasoning to language models. There are multiple calls for integrating knowledge bases with language models: Denny Vrandečić has called for integrating knowledge graphs with language models, and Doug Lenat¹, founder of Cyc, argued in his last paper that we must educate AI with curated pieces of explicit knowledge and rules of thumb, enabling an inference engine to automatically deduce the logical entailments of all that knowledge, something language models are missing. Tom Dietterich, past president of AAAI and co-founder of the Journal of Machine Learning Research, in his keynote What is wrong about LLMs, and what we must be building instead, proposes a research arc that starts with integrating knowledge graphs and moves toward folding more sophisticated knowledge representation schemes into the language model machinery. AlphaGo, the poster child of DeepMind’s demonstrations, employs a hybrid architecture in which a deep policy network guides a tree search as it explores the game space. In research on self-driving cars, the need for precise, syntactically unambiguous representations of traffic laws has meant that every deployed self-driving car carries a symbolic representation of traffic laws on board. The common thrust of all this research is that symbolic representations and hybrid architectures are better suited to building System 2 competencies than System 1-style machinery alone. System 2 cannot be reduced to System 1.
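Different as these efforts are, many of them share one shape: a learned component proposes, and a symbolic component checks and has the last word. The sketch below shows just that propose-and-verify skeleton; `neural_propose` and `symbolic_check` are hypothetical stand-ins, not the APIs of any system named above.

```python
# A generic propose-and-verify skeleton shared by many neural-symbolic hybrids.
# `neural_propose` and `symbolic_check` are hypothetical stand-ins: the first
# might be sampled LLM outputs or a policy network, the second a knowledge-graph
# lookup, a theorem prover, or a formal encoding of traffic laws.

from typing import Callable, Iterable, Optional


def hybrid_answer(
    query: str,
    neural_propose: Callable[[str, int], Iterable[str]],
    symbolic_check: Callable[[str, str], bool],
    n_candidates: int = 5,
) -> Optional[str]:
    for candidate in neural_propose(query, n_candidates):
        if symbolic_check(query, candidate):  # the symbolic side has the last word
            return candidate
    return None  # escalate or abstain rather than guess


# Toy usage with stand-ins:
propose = lambda q, n: ["Lyon", "Paris"][:n]      # pretend: sampled model outputs
check = lambda q, a: a == "Paris"                 # pretend: a knowledge-graph check
print(hybrid_answer("What is the capital of France?", propose, check))  # -> Paris
```

AlphaGo's policy-guided tree search and the knowledge-graph proposals above are far richer than this, but they occupy the same slot: the neural part narrows the space of candidates, and the symbolic part guarantees that the surviving answer respects explicit constraints.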
Recent advances in neural networks, and in generative AI in particular, mean that right now is the first moment in the history of AI when we can integrate System 2 mechanisms with the flexible foundation of System 1. We are researchers and engineers who want to build bridges between these different paradigms in order to solve real-world problems that are beyond the reach of any one paradigm alone. This requires being brave, as it is much easier to conduct, evaluate, and publish research within a single paradigm. In the coming posts, we will flesh out our vision of this multi-paradigm approach to building System 2 competencies.
¹ Doug Lenat passed away on August 31, 2023, after a battle with cancer, while we were working on this essay. His quixotic pursuit of a “symbolic representation of all consensual knowledge” was one of the inspirations that led to billions-of-facts knowledge graphs becoming as widespread as they are today. We have only scratched the surface of how scalable and powerful his ideas are. Thank you, Doug, for teaching us by building.
“The problem we face today is that, while many of these methods can provide sound inference, they are not easily mapped to language.”
How about we model language? It is vastly more powerful than mathematical symbolism (“the river is rising and he can’t swim”); it integrates propositional, temporal, and existential logic and relations easily.
“its short-term predictions are usually accurate as well, and its initial reactions to challenges are swift and generally appropriate.”
Don’t think so – people can do amazingly stupid things.
“proposes a research arc starting from integrating knowledge graphs”
Knowledge graphs aren’t the answer – we need undirected active knowledge structures.
You praise System 2 without pointing out its severe deficiencies.
Our conscious mind has a Four Pieces Limit: it clumps things that are beyond that limit, and can make a whole string of mistakes because of it. Economists are the poster children for this; they fix on an answer that sort of works at the moment, but factors that were assumed constant drift with time. A shining example was working out whether inflation would be transitory or long-lasting: there were Nobel laureates on both sides, and it sounded more like “thoughts and prayers” than analysis. Another example is a group of specialists with a specification, say a lawyer, an avionics expert, and a logistics expert; they don’t understand what the others are saying, and make billion-dollar stuffups. We need a machine to hold a lot more live in its head than we can. So no, we shouldn’t try to model a machine on us.